Re: CPU utilization

From: Randall Stewart <rrs_at_cisco.com>
Date: Thu, 12 Apr 2007 07:51:42 -0400
Robert:

A few comments .. in line :-D

Robert Watson wrote:
> 
> On Thu, 12 Apr 2007, Randall Stewart wrote:
> 
>> I have probably an old question that has been asked.. but here goes 
>> anyway.
>>
>> I have three machines.
>>
>> 1) stewart - running 7.0 (2.8gig p4 dual core)
>> 2) bsd1    - running 7.0 (2.8gig Xeon Hyperthreaded)
>> 3) bsd2    - running 6.2 (2.4gig Xeon Hyperthreaded)
>>
>> Now if I run tests that max out cpu (at least I think they do).. I see 
>> <1> or <2> drag down to 1% idle, or even 0% idle.
>>
>> However <3> never drops below 50% idle.. it performs a lot slower 
>> too.. which I expect since it is somewhat of an older processor.. but 
>> in running say top -S
> 
> It strikes me that there are two possibilities here, and it could be 
> both are true:
> 
> (1) In 7.x, there are scheduling and accounting changes that could result in
>     both better utilization and different measurement.
> 
> (2) In 7.x, certain debugging features default to on (WITNESS, INVARIANTS,
>     user space malloc debugging) that add significant (!) overhead.

The first thing I do when I go to 7.0 is go edit out the malloc debug
:-D.. and I know these machines do NOT have WITNESS and INVARIANTS
on.. I have a separate build that I use for that one :-D
> 
> I'd confirm first that (2) isn't the cause of the change -- make sure 
> you have a kernel without debugging features turned on, and check the 
> man page on malloc.conf to make sure user debugging is turned off for 
> malloc.  Then let's revisit (1).
>
I have some interesting results here that I think indicate something..
not sure.... when playing with the hyperthreading switches :-D

Having hyperthreading OFF on the sender side (this is an SCTP test)
and having hyperthreading ON on the receiver side seems to
give me the best performance.

When the 7.0 machine is the sender and the 6.2 machine the
receiver in this config I get 930Mb (user data) on my gig link..
That's pretty good :-D

When I turn hyperthreading on in this layout for the sender we
drop to 600Mb.

Now reversing it.. the difference is not so dramatic. Having
hyperthreading OFF on the sender (the 2.4 Gig 6.2 machine) and
hyperthreading ON on the 2.8G 7.0 machine I see about 790Mb.
With hyperthreading on the sender as well, we drop to
around 690Mb.

Also interestingly, if the fast machine is receiving with no
hyperthreading.. I see around the same performance
as the above.. 790Mb

That does not happen when the 6.2/slow machine is the
receiver.. I see 780Mb vs the 930Mb
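
To put the sender-side numbers above side by side, here is a small
Python sketch that computes the relative drop when hyperthreading is
turned on at the sender. The table layout and the helper name
ht_sender_drop are made up for illustration, and it assumes the
receiver kept hyperthreading ON in the "sender HT on" runs, which is
how the text reads but is not stated outright.

```python
# Throughput figures (Mb, user data on a gigabit link) as reported above.
# Key: (sender OS version, sender HT, receiver HT).
# Assumption: receiver HT stayed "on" in the sender-HT-on runs.
results = {
    ("7.0", "off", "on"): 930,
    ("7.0", "on", "on"): 600,
    ("6.2", "off", "on"): 790,
    ("6.2", "on", "on"): 690,
}


def ht_sender_drop(sender):
    """Percent throughput lost when the given sender enables HT."""
    off = results[(sender, "off", "on")]
    on = results[(sender, "on", "on")]
    return 100.0 * (off - on) / off


for sender in ("7.0", "6.2"):
    print("%s sender: %.1f%% drop with sender HT on"
          % (sender, ht_sender_drop(sender)))
```

With these figures the 7.0 sender loses about 35% and the 6.2 sender
about 13%, so the penalty for sender-side hyperthreading shows up in
both directions but is much larger when the 7.0 machine sends.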

So.. I am thinking a couple of things about the
SCTP code...

a) I must have a lock contention issue on the sending
    side.

b) The receiver side code does not have this issue and
    appears to work well with the hyperthreading..

I need to also go check out what is going on with H-T in
the 7.0 as the sender and turn on mutex_profiling.. this
may confirm my thoughts on this :-D
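
As a rough user-space analogy for what mutex profiling would report,
the Python sketch below has several threads take one shared lock and
records how long each spent waiting to acquire it. This is only a toy:
it is not the kernel's profiling facility and has nothing to do with
the actual SCTP locking; run_contended and its parameters are
hypothetical names chosen for the example.

```python
import threading
import time


def run_contended(num_threads=4, iterations=1000):
    """Bump one counter from several threads under a single lock and
    record how long each thread spent waiting to acquire that lock."""
    lock = threading.Lock()
    state = {"count": 0}
    waited = [0.0] * num_threads  # per-thread seconds blocked on the lock

    def worker(idx):
        for _ in range(iterations):
            start = time.perf_counter()
            lock.acquire()
            waited[idx] += time.perf_counter() - start
            try:
                state["count"] += 1  # the critical section
            finally:
                lock.release()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"], waited


count, waited = run_contended()
print(count, ["%.6f" % w for w in waited])
```

The final count always equals num_threads * iterations because the
lock serializes the updates; the per-thread wait times vary from run
to run and are the kind of figure a lock profiler would surface.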

R


-- 
Randall Stewart
NSSTG - Cisco Systems Inc.
803-345-0369 <or> 803-317-4952 (cell)
Received on Thu Apr 12 2007 - 09:49:00 UTC