Some interesting plots

From: Randall Stewart <rrs_at_cisco.com>
Date: Fri, 29 Sep 2006 13:29:37 -0400
All:

As you all may know, I have been working on getting SCTP
into CURRENT.

I have been, of late, trying to tweak things to get
the BEST performance out of the implementation...

I have moved my testing to a pair of 7.0 machines, with the same
code base on each, updated Sep 25.

One is a 2.8GHz Dell SC1600 Xeon (hyper-threaded CPU).
The other is a P4D (2.8GHz, slightly faster, a true
dual-processor machine).

They are connected by two Intel em(4) server cards like so:


+------+                                  +------+
|  1   | em1 <---------------------> em0  |  2   |
|      | em0 <------local lan------> msk0 |      |
|      | dc0 <--Direct Inet               |      |
+------+                                  +------+


em1 has 10.1.2.12                   em0  has 10.1.2.21
em0 has 10.1.1.12                   msk0 has 10.1.1.21


Now I set up the association with a server on machine 1; the
conversation goes something like this:

                                             gettimeofday()
      <---------------Send me N bytes in X byte records----
     ------>send---->
     ------>send---->                          when N bytes are received
                                                   gettimeofday()/close()
     ...
     close()
                                             calculate and announce b/w
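
The test tool itself isn't shown here, but the timing side of that
exchange is simple enough to sketch. The following is a minimal C
illustration, not the actual tool; recv_record() is a hypothetical
blocking helper that reads one record:

#include <sys/types.h>
#include <sys/time.h>

/* Hypothetical helper: blocking read of up to len bytes of one record. */
extern ssize_t recv_record(int fd, void *buf, size_t len);

/*
 * Minimal sketch of the requesting side: time the receipt of nbytes
 * delivered in xbytes-sized records and return the bandwidth in Mbit/s.
 */
static double
measure_bw(int fd, size_t nbytes, size_t xbytes)
{
        struct timeval start, end;
        static char buf[65536];
        size_t got = 0;
        double secs;

        gettimeofday(&start, NULL);             /* before asking for data */
        while (got < nbytes) {
                size_t want = xbytes < sizeof(buf) ? xbytes : sizeof(buf);
                ssize_t n = recv_record(fd, buf, want);

                if (n <= 0)
                        break;
                got += (size_t)n;
        }
        gettimeofday(&end, NULL);               /* N bytes received */

        secs = (end.tv_sec - start.tv_sec) +
            (end.tv_usec - start.tv_usec) / 1e6;
        return ((got * 8.0) / (secs * 1e6));    /* announce b/w */
}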

Now the SCTP implementation has some interesting logging facilities:
various compile options let it log into an 80,000-entry memory
buffer that you can pull out with specific commands.

I have rigged the sender to log each sack that comes in and record,
on the sender side (1):

a) How much data is queued to the peer
b) Total flight size
c) rwnd advertised
d) the calculated rwnd.

On the receiver side (2), on both entry to and exit from
the receiving function in the kernel:
On entry:
a) read request size (uio_resid)
b) amount in queue on the sb to read
c) if blocking is allowed
d) How much has to be read total to get a window
    update sack out
On exit:
a) how much has been read since the last sack sent
b) how much is left un-read
c) the rwnd we are advertising
d) how much is still on the sb.
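
For reference, the kind of record this produces looks roughly like
the following. This is only an illustration of the fields listed
above, with made-up names, not the real structures in the SCTP code:

#include <stdint.h>

/* Illustrative only -- field names are made up, not the real SCTP ones. */
struct bw_log_rec {
        uint8_t  side;             /* 1 = sender, 2 = receiver */
        uint8_t  event;            /* sack-in, recv-entry or recv-exit */
        uint8_t  can_block;        /* recv entry: is blocking allowed */
        uint32_t send_queued;      /* sender: data queued to the peer */
        uint32_t flight_size;      /* sender: total flight size */
        uint32_t peer_rwnd;        /* sender: rwnd the peer advertised */
        uint32_t calc_rwnd;        /* sender: our calculated rwnd */
        uint32_t uio_resid;        /* recv entry: read request size */
        uint32_t sb_cc;            /* recv: amount queued on the sb */
        uint32_t sack_thresh;      /* recv entry: must read this much to
                                      trigger a window-update sack */
        uint32_t read_since_sack;  /* recv exit: read since last sack */
        uint32_t left_unread;      /* recv exit: amount left un-read */
        uint32_t adv_rwnd;         /* recv exit: rwnd we advertise */
};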

------------------------------------------------------------

The logging is nicely lightweight (as long as you don't roll the
log). That is because I use atomic operations to advance the log
index, and I have picked a size that won't roll it...
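
The atomic-index trick amounts to something like the sketch below.
This is a userland illustration using C11 atomics rather than the
kernel's own atomic primitives; the buffer is simply sized so a test
run never wraps it:

#include <stdatomic.h>

#define LOG_ENTRIES 80000                /* big enough that it never rolls */

struct log_slot {
        int event;
        unsigned int v1, v2, v3, v4;     /* e.g. the fields sketched above */
};

static struct log_slot log_buf[LOG_ENTRIES];
static atomic_uint log_idx;              /* next free slot, starts at 0 */

/*
 * One atomic fetch-add reserves a private slot, so the sender and
 * receiver paths can log concurrently with no lock and little overhead.
 */
static void
log_event(int event, unsigned int a, unsigned int b,
    unsigned int c, unsigned int d)
{
        unsigned int i = atomic_fetch_add(&log_idx, 1);

        if (i >= LOG_ENTRIES)
                return;                  /* full: we size it not to roll */
        log_buf[i].event = event;
        log_buf[i].v1 = a;
        log_buf[i].v2 = b;
        log_buf[i].v3 = c;
        log_buf[i].v4 = d;
}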

I have then played a bit with gnuplot to generate some
graphs of all this... fun fun ;-D

Now I have placed the plots at:

http://www.sctp.org/plots.tar.bz2

(they are about 2MB, and I have a 512k uplink, so patience is
  a must).

So what will you see?

Plots named *plotall.ps give you the entire transfer.

Ones named plot*.ps show everything:

a) when things are read, in red (the y axis is the amount unread)
b) what the sender has in flight, in green
c) what the a-rwnd is, in blue
d) the send-q size, in pink.

There is also a plot for each 100ms interval so you
can "zoom" in on things...
The 1.4-1.5 second range looks especially interesting.

Now, plots named cntplot* show:

a) flight
b) arwnd
c) outq
and
d) the number of sacks in each 400us period (multiplied by 1000, so
    if you had 10 sacks in a 400us period you would show
    it at 40000).

This too is interesting.

Finally, plots named lcntplot* show only the flight size vs.
the number of sacks every 500us (multiplied
by 1000)...

The interesting things that I have going on are:

1) The sender is running dry. You can see this in the graphs
    when the flight crashes down to almost nothing at, say,
    1.5 seconds or so... it is also seen whenever the blue line
    moves down to the green flight line.

2) Any time the send queue (blue line) is down at the flight (green),
    that is bad, since it means we are in danger of running dry.

3) The reader is completely keeping up, since it
    reads constantly and never has much of an sb_cc (the
    red line stays at the bottom of the page).

Now what's strange about this is that I am only getting 760 Megabits per
second (or so), and at that 760Mb/s I see processor <1> running about
60-70% idle, and processor <2> running 70-80% idle.

Now I am most puzzled about how to make this run faster...

a) What causes the spikes? (Ideas would be welcome.)
b) Why can't I get more CPU? I have separate locks that
    the receiver and the sender use, so there is
    minimal contention between the network side of things and
    the queuing and/or reading (see the sketch after this list)...
    In fact I just added the separate sender
    lock in the last day or so, because before that
    I was seeing not 2 or so but 8 dips in the "queue".

One other note: I see TCP only getting 250Mb/s or so on the
same test (the tool can run either protocol). It used to get close to
the full pipe (Gigabit), so is there some issue with the new code
that was recently submitted?

Have a look.. it's interesting. I am rather clueless,
though, on how to grab that CPU that top tells me is
there (if it's accurate :-D). Suggestions on where to look
or tuning I could do would be welcome.

Thanks


R
-- 
Randall Stewart
NSSTG - Cisco Systems Inc.
803-345-0369 <or> 815-342-5222 (cell)