[FlowScan] cpu utilisation on flowscan processes

Ross Tsolakidis Ross.Tsolakidis at day3.com.au
Mon Feb 25 17:16:07 CST 2008


Forgot to add:

So I should never see another flowscan process ?
3449 ?        Ss     0:01 /usr/local/netflow/bin/flow-capture -w
/var/netflow/ft 0/0/2055 -S
 3453 pts/0    S      0:20 /usr/bin/perl bin/flowscan
 3458 pts/0    R     10:09 /usr/bin/perl bin/flowscan
So the above should never happen unless I am starting it again ?

I start the processes like this at boot time:
lrwxrwxrwx  1 root root   21 2007-10-31 16:12 S96flowcapture ->
../init.d/flowcapture
lrwxrwxrwx  1 root root   18 2007-10-31 16:12 S97flowscan ->
../init.d/flowscan

==

d3-syd-equinix01:/etc/rc2.d# cat  ../init.d/flowcapture
#!/bin/sh
# description: Start Flow-Capture
# chkconfig: 2345 95 00

case "$1" in
'start')

/usr/local/netflow/bin/flow-capture -w /var/netflow/ft 0/0/2055 -S5 -V5
-E1G -n 287 -N 0 -R /usr/local/netflow/bin/linkme
touch /var/lock/startflows
;;
'stop')

killall -9 /usr/local/netflow/bin/flow-capture
rm -f /var/lock/startflows
;;

*)

echo "Usage: $0 { start | stop }"
;;

esac
exit 0


==

d3-syd-equinix01:/etc/rc2.d# cat  ../init.d/flowscan
#!/bin/sh
# description: Start Flowscan
# chkconfig: 2345 99 00
 

case "$1" in
'start')
        cd /var/netflow/ ; bin/flowscan >>/var/log/flowscan 2>&1
</dev/null & >/dev/null
        touch /var/lock/flowscan.1
        ;;
'stop')
        killall -9 'flowscan'
        rm -f /var/lock/flowscan.1
        ;;
*)
        echo "Usage: $0 { start | stop }"
        ;;
esac
exit 0

==


Regards,
Ross.


-----Original Message-----
From: Ross Tsolakidis 
Sent: Tuesday, 26 February 2008 10:11 AM
To: 'plonka at doit.wisc.edu'; flowscan at lists.wiscnet.net
Subject: RE: [FlowScan] cpu utilisation on flowscan processes

Hi Dave,

>>Did you schedule flowscan in cron or something? I've never seen anyone
>>do that... I suggest starting it once, via rc script set to launch at
>>boot time. (Those sample rc scripts come with flowscan in the rc/
>>sub-directory...)

I thought the same thing.
So what I did is kill the flowscan process and waited 1 hour to see if
any other processes were being launched.
Unfortunately no, it's not in cron as no other processes appeared after
I killed the original flowscan process.


>>> Flow-capture collects Netflow raw data from the Routers.
>>> Flowscan is launched every 5 mins to process this data from the tmp
file created from the above process.

>>No! FlowScan does not process the tmp file - it processed the ft-*
>>files (in timestamp order) *after* flow-capture renames them.
>>If yours is processing the tmp- file then your FlowFileGlob is
>>configured wrong in flowscan.cf (When using flowscan w/flow-tools
>>it should say something like "FlowFileGlob ft-*[0-9]").

Ok, I've checked the flowscan.cf file, looks fine.
# flowscan Configuration Directives
############################################

# FlowFileGlob (REQUIRED)
# use this glob (file pattern match) when looking for raw flow files to
be
# processed, e.g.:
# FlowFileGlob /var/local/flows/flows.*:*[0-9]
##FlowFileGlob flows.*:*[0-9]
FlowFileGlob /var/netflow/ft-v05.*

# ReportClasses (REQUIRED)
# a comma-seperated list of FlowScan report classes, e.g.:
# ReportClasses CampusIO
# ReportClasses SubNetIO
ReportClasses CUFlow

# WaitSeconds (OPTIONAL)
# This should be <= the "-s" value passed on the command-line to cflowd,
e.g.:
# WaitSeconds 300
WaitSeconds 30

# Verbose (OPTIONAL, non-zero = true)
Verbose 1


Question:
Basically flowscan looks for:
FlowFileGlob /var/netflow/ft-v05.*

What happens once it processes them ?
They are still there, could it be that flowscan is launching a process
for each of the ft files ?
That would explain the constant increase of processes when I delete all
the ft files for testing then re-launch flowscan ?

I also created a saved dir under the ft dir, but nothing is appearing in
here.

d3-syd-equinix01:/var/netflow# ls -al
total 376
drwxr-xr-x   6 root root  12288 2008-02-26 10:05 .
drwxr-xr-x  15 root root   4096 2007-10-31 15:54 ..
drwxr-xr-x   2 root root   4096 2008-02-26 09:55 bin
drwxr-xr-x   3 root root   4096 2008-02-26 10:05 ft
drwxr-xr-x   3 root root 352256 2008-02-26 10:08 rrds
drwxr-xr-x 122 root root   4096 2008-02-26 10:05 scoreboard

d3-syd-equinix01:/var/netflow/ft# ls -al
total 3360
drwxr-xr-x 3 root root    4096 2008-02-26 10:05 .
drwxr-xr-x 6 root root   12288 2008-02-26 10:05 ..
-rw-r--r-- 1 root root  412258 2008-02-26 10:00
ft-v05.2008-02-26.095718+1100
-rw-r--r-- 1 root root  705908 2008-02-26 10:05
ft-v05.2008-02-26.100000+1100
-rw-r--r-- 1 root root 1734788 2008-02-26 09:57
ft-v05.2008-02-26.194501+1100
drwxr-xr-x 2 root root    4096 2008-02-26 09:42 saved
-rw-r--r-- 1 root root  540764 2008-02-26 10:08
tmp-v05.2008-02-26.100500+1100
d3-syd-equinix01:/var/netflow/ft#

d3-syd-equinix01:/var/netflow/ft/saved# ls -al
total 8
drwxr-xr-x 2 root root 4096 2008-02-26 09:42 .
drwxr-xr-x 3 root root 4096 2008-02-26 10:05 ..
d3-syd-equinix01:/var/netflow/ft/saved#


Here we go again.
 3449 ?        Ss     0:01 /usr/local/netflow/bin/flow-capture -w
/var/netflow/ft 0/0/2055 -S
 3453 pts/0    S      0:20 /usr/bin/perl bin/flowscan
 3458 pts/0    R     10:09 /usr/bin/perl bin/flowscan
 3469 pts/0    R      7:09 /usr/bin/perl bin/flowscan
 3495 pts/0    R      2:47 /usr/bin/perl bin/flowscan


Any other ideas ?

For a laugh ;)
Got LOAD ?

top - 07:17:53 up 19:23,  2 users,  load average: 65.78, 65.02, 63.38
Tasks: 150 total,  65 running,  85 sleeping,   0 stopped,   0 zombie
Cpu0  : 99.0%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  1.0%si,
0.0%st

Cpu1  : 99.0%us,  1.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,
0.0%st
Mem:   1036428k total,  1024164k used,    12264k free,     5444k buffers
Swap:  5301368k total,   391384k used,  4909984k free,    76680k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+   PPID P
SWAP   TIME COMMAND
 7838 root      18   0 28228  17m 1000 R   24  1.7   5:20.23     1 0
9m   5:20 flowscan
 7893 root      18   0 28232  17m 1008 D   19  1.7   4:52.16     1 1
10m   4:52 flowscan
 1824 root      18   0 28232  17m  984 R   13  1.8  18:50.10     1 0
9868  18:50 flowscan
32145 root      18   0 28516  18m  984 R   10  1.8  14:57.65     1 1
9648  14:57 flowscan
 7114 root      18   0 28228  17m  984 R   10  1.7  12:07.85     1 0
9.9m  12:07 flowscan
 7547 root      18   0 28228  17m  984 R   10  1.7   7:39.93     1 0
9.9m   7:39 flowscan
 8208 root      18   0 28228  17m 1000 R   10  1.7   2:59.57     1 1
10m   2:59 flowscan
 8361 root      18   0 28228  17m 1012 R    9  1.7   2:18.22     1 1
9m   2:18 flowscan
 6922 root      18   0 28232  17m  984 D    8  1.7   9:58.69     1 1
10m   9:58 flowscan
 8180 root      18   0 28232  17m 1000 R    8  1.7   3:29.79     1 1
10m   3:29 flowscan
 8327 root      18   0 28232  17m 1004 R    8  1.7   2:16.36     1 1
10m   2:16 flowscan
 6796 root      18   0 28228  17m  984 R    7  1.7   9:42.14     1 0
10m   9:42 flowscan
 7652 root      18   0 28524  17m  992 R    7  1.7   5:52.40     1 0
10m   5:52 flowscan
 8341 root      18   0 28228  17m 1000 R    7  1.7   2:17.70     1 0
10m   2:17 flowscan
 8418 root      18   0 28232  17m 1004 R    7  1.7   1:59.96     1 0
10m   1:59 flowscan
 7277 root      18   0 28228  17m  984 R    6  1.7   7:04.63     1 0
10m   7:04 flowscan
 7770 root      18   0 28232  17m  984 R    6  1.7   5:37.58     1 0
10m   5:37 flowscan
 8383 root      18   0 28228  17m 1000 R    6  1.7   2:04.62     1 0
10m   2:04 flowscan
 8666 root      18   0 28228  17m 1116 R    6  1.7   0:23.39     1 0
9.9m   0:23 flowscan
 8691 root      18   0 28228  17m 1140 R    5  1.8   0:15.31     1 0
9.8m   0:15 flowscan
 6912 root      18   0 28228  16m  984 R    4  1.7   9:08.43     1 1
10m   9:08 flowscan
 7790 root      18   0 28228  17m  992 R    4  1.7   5:32.02     1 0
10m   5:32 flowscan
 7058 root      18   0 28228  17m  984 R    2  1.7   9:35.15     1 0
10m   9:35 flowscan
 7307 root      18   0 28228  17m  984 R    2  1.7   6:57.43     1 1
10m   6:57 flowscan
 8130 root      18   0 28228  17m 1000 R    2  1.7   3:38.12     1 0
10m   3:38 flowscan
 2289 bind      25   0 75500  27m 1252 S    1  2.7   7:04.16     1 0
45m   7:04 named
 7905 root      18   0 28228  17m 1000 R    1  1.7   4:36.73     1 0
10m   4:36 flowscan
 7921 root      18   0 28232  17m 1008 R    1  1.7   4:28.58     1 0
10m   4:28 flowscan
 8233 root      18   0 28228  17m 1000 R    1  1.8   3:46.63     1 0
9896   3:46 flowscan
 8593 root      18   0 28228  17m 1088 R    1  1.8   0:29.11     1 0
9.8m   0:29 flowscan
 8746 root      15   0  2356 1180  856 R    1  0.1   0:00.31  8726 1
1176   0:00 top
    1 root      15   0  1944  416  392 S    0  0.0   0:01.10     0 1
1528   0:01 init
    2 root      RT   0     0    0    0 S    0  0.0   0:00.00     1 0
0   0:00 migration/0
    3 root      34  19     0    0    0 R    0  0.0   0:00.01     1 0
0   0:00 ksoftirqd/0
    4 root      RT   0     0    0    0 S    0  0.0   0:00.00     1 1
0   0:00 migration/1
    5 root      34  19     0    0    0 R    0  0.0   0:00.00     1 1
0   0:00 ksoftirqd/1
    6 root      10  -5     0    0    0 S    0  0.0   0:00.02     1 0
0   0:00 events/0


Regards,
Ross.




From: Dave Plonka [mailto:plonka at doit.wisc.edu] 
Sent: Tuesday, 26 February 2008 3:12 AM
To: flowscan at lists.wiscnet.net
Cc: Ross Tsolakidis
Subject: Re: [FlowScan] cpu utilisation on flowscan processes


Hi Ross,

[I removed the flow-tools lists, I don't think this is
flowscan-specific.]

I think I know what's going on... lots of details below...

>From what you've posted, you're initial performance results look good,
but then it starts taking longer and longer to process a 5 minute flow
file.

On Mon, Feb 25, 2008 at 01:52:07PM +1100, Ross Tsolakidis wrote:
> Just following on from this, I'm trying to narrow down where the issue
is.
> 
> This is my understanding of how this all works.
> 
> ==
> Flow-capture collects Netflow raw data from the Routers.
> Flowscan is launched every 5 mins to process this data from the tmp
file created from the above process.

No! FlowScan does not process the tmp file - it processed the ft-*
files (in timestamp order) *after* flow-capture renames them.
If yours is processing the tmp- file then your FlowFileGlob is
configured wrong in flowscan.cf (When using flowscan w/flow-tools
it should say something like "FlowFileGlob ft-*[0-9]").

Also, flowscan runs continuously (like a daemon)... after it finishes
processing the flow files that exist, it goes to sleep the number of
seconds you've configured before checking for new files (i.e. newly
renamed from tmp- to ft-) that should appear every 5 minutes.

> Here's an example of what I'm seeing in the logs.
> 2008/02/22 15:25:31 flowscan-1.020 CUFlow: Cflow::find took 4
wallclock secs ( 3.69 usr + 0.00 sys = 3.69 CPU) for 642543 flow file
bytes, flow hit ratio: 34353/35700
<snip>

OK, at this point things look great. flowscan got 3.69 seconds of
CPU in 4 wallclock seconds, and processed almost 9000 flows per second.

> As you can see it's taking longer and longer (wallclock secs).
> 
> It will eventually get to this:
> 008/02/23 17:31:18 flowscan-1.020 CUFlow: Cflow::find took 254
wallclock secs ( 9.08 usr + 0.01 sys = 9.09 CPU) for 922132 flow file
bytes, flow hit ratio: 49702/51390
> 2008/02/23 17:35:39 flowscan-1.020 CUFlow: Cflow::find took 237
wallclock secs (11.15 usr + 0.00 sys = 11.15 CPU) for 1073584 flow file
bytes, flow hit ratio: 60328/62400
<snip>
> Very ugly.

Agreed. That's bad - performance has dropped to only 263 flows per
second.

OK, two thinkgs I notice, these files are larger, but not a lot larger.
Look at your walclock time vs. the CPU time... clearly the flowscan
process is not getting the CPU. For instance, above, we see 237
wallclock
seconds, but flowscan only got 11.15 seconds of CPU during that time.

It seems like FlowScan is competing with other processes.
It should not have to do this - that is, the only thing that should
be running on this server is flowscan.

> >From my understanding, using this example.
> 2008/02/22 16:20:38 flowscan-1.020 CUFlow: Cflow::find took 11
wallclock secs ( 5.82 usr + 0.00 sys = 5.82 CPU) for 626396 flow file
bytes, flow hit ratio: 32176/33780
> The flowscan process should be finished and be terminated.

No, it should keep running if WaitSeconds in flowscan.cf is non-zero.
If -w or WaitSeconds is non-zero (as I always use) flowscan stays
running, waits n seconds, then looks for new files to process.

> How does something taking ~10secs end up getting longer and longer ?

Perhaps you're starting your flowscan process over and over
while there is already one running? They're probably all bound
up competing for the CPU and disk.

> I did a test on the secondary machine up there.
> Killed flow-capture and flowscan.
> Deleted all files in the /ft folder.
> 
> d3-syd-equinix01:/var/netflow/ft# ls -al
> total 792
> drwxr-xr-x 2 root root 794624 2008-02-25 13:38 .
> drwxr-xr-x 6 root root 12288 2008-02-25 13:37 ..
> d3-syd-equinix01:/var/netflow/ft#
> 
> Then started flow-capture.
> 
> 20241 ? Ss 0:00 /usr/local/netflow/bin/flow-capture -w /var/netflow/ft
0/0/2055 -S5 -V5 -E1G -n 287 -N 0 -R /usr/local/netflow/bin/linkme
> 
> d3-syd-equinix01:/var/netflow/ft# ls -al
> total 848
> drwxr-xr-x 2 root root 794624 2008-02-25 13:40 .
> drwxr-xr-x 6 root root 12288 2008-02-25 13:37 ..
> -rw-r--r-- 1 root root 49244 2008-02-25 13:40
tmp-v05.2008-02-25.134041+1100
> d3-syd-equinix01:/var/netflow/ft#
> 
> Now started flowscan.
> 
> 20241 ? Ss 0:00 /usr/local/netflow/bin/flow-capture -w /var/netflow/ft
0/0/2055 -S5 -V5 -E1G -n 287 -N 0 -R /usr/local/netflow/bin/linkme
> 20253 pts/2 S 0:00 /usr/bin/perl bin/flowscan
> 
> sleep 30...
> sleep 30...

OK, so your WaitSeconds is set to 30.

> So far so good.
<snip>
> Now here's the part I don't understand.
> Why is flowscan still running ?
> It finished didn't it ?

see above.

> Now on the next run (10min mark):
> 
> 2008/02/25 13:50:09 working on file
/var/netflow/ft-v05.2008-02-25.134500+1100...
> 2008/02/25 13:50:15 flowscan-1.020 CUFlow: Cflow::find took 6
wallclock secs ( 6.24 usr + 0.01 sys = 6.25 CPU) for 734307 flow file
bytes, flow hit ratio: 40698/43290
> 2008/02/25 13:50:15 flowscan-1.020 CUFlow: report took 0 wallclock
secs ( 0.00 usr 0.01 sys + 0.11 cusr 0.03 csys = 0.15 CPU)
> 
> 20241 ? Ss 0:01 /usr/local/netflow/bin/flow-capture -w /var/netflow/ft
0/0/2055 -S5 -V5 -E1G -n 287 -N 0 -R /usr/local/netflow/bin/linkme
> 20253 pts/2 S 0:10 /usr/bin/perl bin/flowscan
> 20265 pts/2 R 4:34 /usr/bin/perl bin/flowscan
> 20274 pts/2 R 0:35 /usr/bin/perl bin/flowscan
> 
> Can anyone shed some light on this ?

Did you schedule flowscan in cron or something? I've never seen anyone
do that... I suggest starting it once, via rc script set to launch at
boot time. (Those sample rc scripts come with flowscan in the rc/
sub-directory...)

Dave

-- 
plonka at doit.wisc.edu http://net.doit.wisc.edu/~plonka/ Madison, WI

________________________________________
Total Control Panel
Login

To: ross.tsolakidis at day3.com.au
From: plonka at doit.wisc.edu

Message Score: 50
High (60): Pass
My Spam Blocking Level: Medium
Medium (75): Pass

Low (90): Pass
Block this sender / Block this sender enterprise-wide

Block doit.wisc.edu / Block doit.wisc.edu enterprise-wide



This message was delivered because the content filter score did not
exceed your filter level.



DISCLAIMER: 
This e-mail and any files transmitted with it may be privileged and confidential, and are intended only for the use of the intended recipient. If you are not the intended recipient or responsible for delivering this e-mail to the intended recipient, any use, dissemination, forwarding, printing or copying of this e-mail and any attachments is strictly prohibited. 
If you have received this e-mail in error, please REPLY TO the SENDER to advise the error AND then DELETE the e-mail from your system.
Any views expressed in this e-mail and any files transmitted with it are those of the individual sender, except where the sender specifically states them to be the views of our organisation.
Our organisation does not represent or warrant that the attached files are free from computer viruses or other defects. 
26/2/2008
The user assumes all responsibility for any loss or damage resulting directly or indirectly from the use of the attached files. In any event, the liability to our organisation is limited to either the resupply of the attached files or the cost of having the attached files resupplied.




More information about the flowscan mailing list