[FlowScan] cpu utilisation on flowscan processes
Ross Tsolakidis
Ross.Tsolakidis at day3.com.au
Mon Feb 25 17:16:07 CST 2008
Forgot to add:
So I should never see another flowscan process ?
3449 ? Ss 0:01 /usr/local/netflow/bin/flow-capture -w
/var/netflow/ft 0/0/2055 -S
3453 pts/0 S 0:20 /usr/bin/perl bin/flowscan
3458 pts/0 R 10:09 /usr/bin/perl bin/flowscan
So the above should never happen unless I am starting it again ?
I start the processes like this at boot time:
lrwxrwxrwx 1 root root 21 2007-10-31 16:12 S96flowcapture ->
../init.d/flowcapture
lrwxrwxrwx 1 root root 18 2007-10-31 16:12 S97flowscan ->
../init.d/flowscan
==
d3-syd-equinix01:/etc/rc2.d# cat ../init.d/flowcapture
#!/bin/sh
# description: Start Flow-Capture
# chkconfig: 2345 95 00
case "$1" in
'start')
/usr/local/netflow/bin/flow-capture -w /var/netflow/ft 0/0/2055 -S5 -V5
-E1G -n 287 -N 0 -R /usr/local/netflow/bin/linkme
touch /var/lock/startflows
;;
'stop')
killall -9 /usr/local/netflow/bin/flow-capture
rm -f /var/lock/startflows
;;
*)
echo "Usage: $0 { start | stop }"
;;
esac
exit 0
==
d3-syd-equinix01:/etc/rc2.d# cat ../init.d/flowscan
#!/bin/sh
# description: Start Flowscan
# chkconfig: 2345 99 00
case "$1" in
'start')
cd /var/netflow/ ; bin/flowscan >>/var/log/flowscan 2>&1
</dev/null & >/dev/null
touch /var/lock/flowscan.1
;;
'stop')
killall -9 'flowscan'
rm -f /var/lock/flowscan.1
;;
*)
echo "Usage: $0 { start | stop }"
;;
esac
exit 0
==
Regards,
Ross.
-----Original Message-----
From: Ross Tsolakidis
Sent: Tuesday, 26 February 2008 10:11 AM
To: 'plonka at doit.wisc.edu'; flowscan at lists.wiscnet.net
Subject: RE: [FlowScan] cpu utilisation on flowscan processes
Hi Dave,
>>Did you schedule flowscan in cron or something? I've never seen anyone
>>do that... I suggest starting it once, via rc script set to launch at
>>boot time. (Those sample rc scripts come with flowscan in the rc/
>>sub-directory...)
I thought the same thing.
So what I did is kill the flowscan process and waited 1 hour to see if
any other processes were being launched.
Unfortunately no, it's not in cron as no other processes appeared after
I killed the original flowscan process.
>>> Flow-capture collects Netflow raw data from the Routers.
>>> Flowscan is launched every 5 mins to process this data from the tmp
file created from the above process.
>>No! FlowScan does not process the tmp file - it processed the ft-*
>>files (in timestamp order) *after* flow-capture renames them.
>>If yours is processing the tmp- file then your FlowFileGlob is
>>configured wrong in flowscan.cf (When using flowscan w/flow-tools
>>it should say something like "FlowFileGlob ft-*[0-9]").
Ok, I've checked the flowscan.cf file, looks fine.
# flowscan Configuration Directives
############################################
# FlowFileGlob (REQUIRED)
# use this glob (file pattern match) when looking for raw flow files to
be
# processed, e.g.:
# FlowFileGlob /var/local/flows/flows.*:*[0-9]
##FlowFileGlob flows.*:*[0-9]
FlowFileGlob /var/netflow/ft-v05.*
# ReportClasses (REQUIRED)
# a comma-seperated list of FlowScan report classes, e.g.:
# ReportClasses CampusIO
# ReportClasses SubNetIO
ReportClasses CUFlow
# WaitSeconds (OPTIONAL)
# This should be <= the "-s" value passed on the command-line to cflowd,
e.g.:
# WaitSeconds 300
WaitSeconds 30
# Verbose (OPTIONAL, non-zero = true)
Verbose 1
Question:
Basically flowscan looks for:
FlowFileGlob /var/netflow/ft-v05.*
What happens once it processes them ?
They are still there, could it be that flowscan is launching a process
for each of the ft files ?
That would explain the constant increase of processes when I delete all
the ft files for testing then re-launch flowscan ?
I also created a saved dir under the ft dir, but nothing is appearing in
here.
d3-syd-equinix01:/var/netflow# ls -al
total 376
drwxr-xr-x 6 root root 12288 2008-02-26 10:05 .
drwxr-xr-x 15 root root 4096 2007-10-31 15:54 ..
drwxr-xr-x 2 root root 4096 2008-02-26 09:55 bin
drwxr-xr-x 3 root root 4096 2008-02-26 10:05 ft
drwxr-xr-x 3 root root 352256 2008-02-26 10:08 rrds
drwxr-xr-x 122 root root 4096 2008-02-26 10:05 scoreboard
d3-syd-equinix01:/var/netflow/ft# ls -al
total 3360
drwxr-xr-x 3 root root 4096 2008-02-26 10:05 .
drwxr-xr-x 6 root root 12288 2008-02-26 10:05 ..
-rw-r--r-- 1 root root 412258 2008-02-26 10:00
ft-v05.2008-02-26.095718+1100
-rw-r--r-- 1 root root 705908 2008-02-26 10:05
ft-v05.2008-02-26.100000+1100
-rw-r--r-- 1 root root 1734788 2008-02-26 09:57
ft-v05.2008-02-26.194501+1100
drwxr-xr-x 2 root root 4096 2008-02-26 09:42 saved
-rw-r--r-- 1 root root 540764 2008-02-26 10:08
tmp-v05.2008-02-26.100500+1100
d3-syd-equinix01:/var/netflow/ft#
d3-syd-equinix01:/var/netflow/ft/saved# ls -al
total 8
drwxr-xr-x 2 root root 4096 2008-02-26 09:42 .
drwxr-xr-x 3 root root 4096 2008-02-26 10:05 ..
d3-syd-equinix01:/var/netflow/ft/saved#
Here we go again.
3449 ? Ss 0:01 /usr/local/netflow/bin/flow-capture -w
/var/netflow/ft 0/0/2055 -S
3453 pts/0 S 0:20 /usr/bin/perl bin/flowscan
3458 pts/0 R 10:09 /usr/bin/perl bin/flowscan
3469 pts/0 R 7:09 /usr/bin/perl bin/flowscan
3495 pts/0 R 2:47 /usr/bin/perl bin/flowscan
Any other ideas ?
For a laugh ;)
Got LOAD ?
top - 07:17:53 up 19:23, 2 users, load average: 65.78, 65.02, 63.38
Tasks: 150 total, 65 running, 85 sleeping, 0 stopped, 0 zombie
Cpu0 : 99.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 1.0%si,
0.0%st
Cpu1 : 99.0%us, 1.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si,
0.0%st
Mem: 1036428k total, 1024164k used, 12264k free, 5444k buffers
Swap: 5301368k total, 391384k used, 4909984k free, 76680k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ PPID P
SWAP TIME COMMAND
7838 root 18 0 28228 17m 1000 R 24 1.7 5:20.23 1 0
9m 5:20 flowscan
7893 root 18 0 28232 17m 1008 D 19 1.7 4:52.16 1 1
10m 4:52 flowscan
1824 root 18 0 28232 17m 984 R 13 1.8 18:50.10 1 0
9868 18:50 flowscan
32145 root 18 0 28516 18m 984 R 10 1.8 14:57.65 1 1
9648 14:57 flowscan
7114 root 18 0 28228 17m 984 R 10 1.7 12:07.85 1 0
9.9m 12:07 flowscan
7547 root 18 0 28228 17m 984 R 10 1.7 7:39.93 1 0
9.9m 7:39 flowscan
8208 root 18 0 28228 17m 1000 R 10 1.7 2:59.57 1 1
10m 2:59 flowscan
8361 root 18 0 28228 17m 1012 R 9 1.7 2:18.22 1 1
9m 2:18 flowscan
6922 root 18 0 28232 17m 984 D 8 1.7 9:58.69 1 1
10m 9:58 flowscan
8180 root 18 0 28232 17m 1000 R 8 1.7 3:29.79 1 1
10m 3:29 flowscan
8327 root 18 0 28232 17m 1004 R 8 1.7 2:16.36 1 1
10m 2:16 flowscan
6796 root 18 0 28228 17m 984 R 7 1.7 9:42.14 1 0
10m 9:42 flowscan
7652 root 18 0 28524 17m 992 R 7 1.7 5:52.40 1 0
10m 5:52 flowscan
8341 root 18 0 28228 17m 1000 R 7 1.7 2:17.70 1 0
10m 2:17 flowscan
8418 root 18 0 28232 17m 1004 R 7 1.7 1:59.96 1 0
10m 1:59 flowscan
7277 root 18 0 28228 17m 984 R 6 1.7 7:04.63 1 0
10m 7:04 flowscan
7770 root 18 0 28232 17m 984 R 6 1.7 5:37.58 1 0
10m 5:37 flowscan
8383 root 18 0 28228 17m 1000 R 6 1.7 2:04.62 1 0
10m 2:04 flowscan
8666 root 18 0 28228 17m 1116 R 6 1.7 0:23.39 1 0
9.9m 0:23 flowscan
8691 root 18 0 28228 17m 1140 R 5 1.8 0:15.31 1 0
9.8m 0:15 flowscan
6912 root 18 0 28228 16m 984 R 4 1.7 9:08.43 1 1
10m 9:08 flowscan
7790 root 18 0 28228 17m 992 R 4 1.7 5:32.02 1 0
10m 5:32 flowscan
7058 root 18 0 28228 17m 984 R 2 1.7 9:35.15 1 0
10m 9:35 flowscan
7307 root 18 0 28228 17m 984 R 2 1.7 6:57.43 1 1
10m 6:57 flowscan
8130 root 18 0 28228 17m 1000 R 2 1.7 3:38.12 1 0
10m 3:38 flowscan
2289 bind 25 0 75500 27m 1252 S 1 2.7 7:04.16 1 0
45m 7:04 named
7905 root 18 0 28228 17m 1000 R 1 1.7 4:36.73 1 0
10m 4:36 flowscan
7921 root 18 0 28232 17m 1008 R 1 1.7 4:28.58 1 0
10m 4:28 flowscan
8233 root 18 0 28228 17m 1000 R 1 1.8 3:46.63 1 0
9896 3:46 flowscan
8593 root 18 0 28228 17m 1088 R 1 1.8 0:29.11 1 0
9.8m 0:29 flowscan
8746 root 15 0 2356 1180 856 R 1 0.1 0:00.31 8726 1
1176 0:00 top
1 root 15 0 1944 416 392 S 0 0.0 0:01.10 0 1
1528 0:01 init
2 root RT 0 0 0 0 S 0 0.0 0:00.00 1 0
0 0:00 migration/0
3 root 34 19 0 0 0 R 0 0.0 0:00.01 1 0
0 0:00 ksoftirqd/0
4 root RT 0 0 0 0 S 0 0.0 0:00.00 1 1
0 0:00 migration/1
5 root 34 19 0 0 0 R 0 0.0 0:00.00 1 1
0 0:00 ksoftirqd/1
6 root 10 -5 0 0 0 S 0 0.0 0:00.02 1 0
0 0:00 events/0
Regards,
Ross.
From: Dave Plonka [mailto:plonka at doit.wisc.edu]
Sent: Tuesday, 26 February 2008 3:12 AM
To: flowscan at lists.wiscnet.net
Cc: Ross Tsolakidis
Subject: Re: [FlowScan] cpu utilisation on flowscan processes
Hi Ross,
[I removed the flow-tools lists, I don't think this is
flowscan-specific.]
I think I know what's going on... lots of details below...
>From what you've posted, you're initial performance results look good,
but then it starts taking longer and longer to process a 5 minute flow
file.
On Mon, Feb 25, 2008 at 01:52:07PM +1100, Ross Tsolakidis wrote:
> Just following on from this, I'm trying to narrow down where the issue
is.
>
> This is my understanding of how this all works.
>
> ==
> Flow-capture collects Netflow raw data from the Routers.
> Flowscan is launched every 5 mins to process this data from the tmp
file created from the above process.
No! FlowScan does not process the tmp file - it processed the ft-*
files (in timestamp order) *after* flow-capture renames them.
If yours is processing the tmp- file then your FlowFileGlob is
configured wrong in flowscan.cf (When using flowscan w/flow-tools
it should say something like "FlowFileGlob ft-*[0-9]").
Also, flowscan runs continuously (like a daemon)... after it finishes
processing the flow files that exist, it goes to sleep the number of
seconds you've configured before checking for new files (i.e. newly
renamed from tmp- to ft-) that should appear every 5 minutes.
> Here's an example of what I'm seeing in the logs.
> 2008/02/22 15:25:31 flowscan-1.020 CUFlow: Cflow::find took 4
wallclock secs ( 3.69 usr + 0.00 sys = 3.69 CPU) for 642543 flow file
bytes, flow hit ratio: 34353/35700
<snip>
OK, at this point things look great. flowscan got 3.69 seconds of
CPU in 4 wallclock seconds, and processed almost 9000 flows per second.
> As you can see it's taking longer and longer (wallclock secs).
>
> It will eventually get to this:
> 008/02/23 17:31:18 flowscan-1.020 CUFlow: Cflow::find took 254
wallclock secs ( 9.08 usr + 0.01 sys = 9.09 CPU) for 922132 flow file
bytes, flow hit ratio: 49702/51390
> 2008/02/23 17:35:39 flowscan-1.020 CUFlow: Cflow::find took 237
wallclock secs (11.15 usr + 0.00 sys = 11.15 CPU) for 1073584 flow file
bytes, flow hit ratio: 60328/62400
<snip>
> Very ugly.
Agreed. That's bad - performance has dropped to only 263 flows per
second.
OK, two thinkgs I notice, these files are larger, but not a lot larger.
Look at your walclock time vs. the CPU time... clearly the flowscan
process is not getting the CPU. For instance, above, we see 237
wallclock
seconds, but flowscan only got 11.15 seconds of CPU during that time.
It seems like FlowScan is competing with other processes.
It should not have to do this - that is, the only thing that should
be running on this server is flowscan.
> >From my understanding, using this example.
> 2008/02/22 16:20:38 flowscan-1.020 CUFlow: Cflow::find took 11
wallclock secs ( 5.82 usr + 0.00 sys = 5.82 CPU) for 626396 flow file
bytes, flow hit ratio: 32176/33780
> The flowscan process should be finished and be terminated.
No, it should keep running if WaitSeconds in flowscan.cf is non-zero.
If -w or WaitSeconds is non-zero (as I always use) flowscan stays
running, waits n seconds, then looks for new files to process.
> How does something taking ~10secs end up getting longer and longer ?
Perhaps you're starting your flowscan process over and over
while there is already one running? They're probably all bound
up competing for the CPU and disk.
> I did a test on the secondary machine up there.
> Killed flow-capture and flowscan.
> Deleted all files in the /ft folder.
>
> d3-syd-equinix01:/var/netflow/ft# ls -al
> total 792
> drwxr-xr-x 2 root root 794624 2008-02-25 13:38 .
> drwxr-xr-x 6 root root 12288 2008-02-25 13:37 ..
> d3-syd-equinix01:/var/netflow/ft#
>
> Then started flow-capture.
>
> 20241 ? Ss 0:00 /usr/local/netflow/bin/flow-capture -w /var/netflow/ft
0/0/2055 -S5 -V5 -E1G -n 287 -N 0 -R /usr/local/netflow/bin/linkme
>
> d3-syd-equinix01:/var/netflow/ft# ls -al
> total 848
> drwxr-xr-x 2 root root 794624 2008-02-25 13:40 .
> drwxr-xr-x 6 root root 12288 2008-02-25 13:37 ..
> -rw-r--r-- 1 root root 49244 2008-02-25 13:40
tmp-v05.2008-02-25.134041+1100
> d3-syd-equinix01:/var/netflow/ft#
>
> Now started flowscan.
>
> 20241 ? Ss 0:00 /usr/local/netflow/bin/flow-capture -w /var/netflow/ft
0/0/2055 -S5 -V5 -E1G -n 287 -N 0 -R /usr/local/netflow/bin/linkme
> 20253 pts/2 S 0:00 /usr/bin/perl bin/flowscan
>
> sleep 30...
> sleep 30...
OK, so your WaitSeconds is set to 30.
> So far so good.
<snip>
> Now here's the part I don't understand.
> Why is flowscan still running ?
> It finished didn't it ?
see above.
> Now on the next run (10min mark):
>
> 2008/02/25 13:50:09 working on file
/var/netflow/ft-v05.2008-02-25.134500+1100...
> 2008/02/25 13:50:15 flowscan-1.020 CUFlow: Cflow::find took 6
wallclock secs ( 6.24 usr + 0.01 sys = 6.25 CPU) for 734307 flow file
bytes, flow hit ratio: 40698/43290
> 2008/02/25 13:50:15 flowscan-1.020 CUFlow: report took 0 wallclock
secs ( 0.00 usr 0.01 sys + 0.11 cusr 0.03 csys = 0.15 CPU)
>
> 20241 ? Ss 0:01 /usr/local/netflow/bin/flow-capture -w /var/netflow/ft
0/0/2055 -S5 -V5 -E1G -n 287 -N 0 -R /usr/local/netflow/bin/linkme
> 20253 pts/2 S 0:10 /usr/bin/perl bin/flowscan
> 20265 pts/2 R 4:34 /usr/bin/perl bin/flowscan
> 20274 pts/2 R 0:35 /usr/bin/perl bin/flowscan
>
> Can anyone shed some light on this ?
Did you schedule flowscan in cron or something? I've never seen anyone
do that... I suggest starting it once, via rc script set to launch at
boot time. (Those sample rc scripts come with flowscan in the rc/
sub-directory...)
Dave
--
plonka at doit.wisc.edu http://net.doit.wisc.edu/~plonka/ Madison, WI
________________________________________
Total Control Panel
Login
To: ross.tsolakidis at day3.com.au
From: plonka at doit.wisc.edu
Message Score: 50
High (60): Pass
My Spam Blocking Level: Medium
Medium (75): Pass
Low (90): Pass
Block this sender / Block this sender enterprise-wide
Block doit.wisc.edu / Block doit.wisc.edu enterprise-wide
This message was delivered because the content filter score did not
exceed your filter level.
DISCLAIMER:
This e-mail and any files transmitted with it may be privileged and confidential, and are intended only for the use of the intended recipient. If you are not the intended recipient or responsible for delivering this e-mail to the intended recipient, any use, dissemination, forwarding, printing or copying of this e-mail and any attachments is strictly prohibited.
If you have received this e-mail in error, please REPLY TO the SENDER to advise the error AND then DELETE the e-mail from your system.
Any views expressed in this e-mail and any files transmitted with it are those of the individual sender, except where the sender specifically states them to be the views of our organisation.
Our organisation does not represent or warrant that the attached files are free from computer viruses or other defects.
26/2/2008
The user assumes all responsibility for any loss or damage resulting directly or indirectly from the use of the attached files. In any event, the liability to our organisation is limited to either the resupply of the attached files or the cost of having the attached files resupplied.
More information about the flowscan
mailing list