On May 1, 2012, at 11:40, Luigi Rizzo wrote:

> On Tue, May 01, 2012 at 10:27:42AM -0400, George Neville-Neil wrote:
>>
>> On Apr 20, 2012, at 15:03, Luigi Rizzo wrote:
>>
>>> Continuing my profiling on network performance, another place
>>> where we waste a lot of time is if_ethersubr.c::ether_output()
>>>
>>> In particular, from the beginning of ether_output() to the
>>> final call to ether_output_frame() the code takes slightly
>>> more than 210ns on my i7-870 CPU running at 2.93 GHz + TurboBoost.
>>> In particular:
>>>
>>> - the route does not have a MAC address (lle) attached, which causes
>>>   arpresolve() to be called every time. This consumes about 100ns.
>>>   It happens also with locally sourced TCP.
>>>   Using the flowtable cuts this time down to about 30-40ns
>>>
>>> - another 100ns is spent copying the MAC header into the mbuf,
>>>   and then checking whether a local copy should be looped back.
>>>   Unfortunately the code here is a bit convoluted so the
>>>   header fields are copied twice, and using memcpy on the
>>>   individual pieces.
>>>
>>> Note that all the above happens not just with my udp flooding
>>> tests, but also with regular TCP traffic.
>>
>> Hi Luigi,
>>
>> I'm really glad you're working on this. I may have missed this in a thread
>> but are you tracking these somewhere so we can pick them up and fix them?
>>
>> Also, how are you doing the measurements?
>
> The measurements are done with tools/tools/netrate/netsend and
> kernel patches to return from sendto() at various places in the
> stack (from the syscall entry point down to the device driver).
> A patch is attached. You don't really need netmap to run it,
> it was just a convenient place to put the variables.
>
> I am not sure how much we can "fix", there are multiple expensive
> functions on the tx path, and probably also on the rx path.
>
> My hope at least for the tx path is that we can find a way to install a
> "fastpath" handler in the socket.
> When there is no handler installed (e.g. on the first packet or
> unsupported protocols/interfaces) everything works as usual. Then
> when the packet reaches the bottom of the stack, we try to update
> the socket with a copy of the headers generated in the process, and
> the name of the fastpath function to be called. Next transmissions
> will then be able to shortcut the stack and go straight to the
> device output routine.
>
> I don't have data on the receive path or good ideas on how to proceed -- the
> advantage of the tx path is that traffic is implicitly classified,
> whereas it might not be the case for incoming traffic, and classification
> might be the expensive step.
>
> Hopefully we'll have time to discuss this next week in Ottawa.

Yes, I think we should.

Best,
George

Received on Tue May 01 2012 - 15:34:11 UTC
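
The "fastpath" handler idea quoted above can be illustrated with a small userspace sketch. This is not the attached patch and not FreeBSD kernel code: every name here (struct fp_socket, fp_send, fp_slow_output, fp_fast_output, fp_dev_output, fp_invalidate) is hypothetical, the 14-byte header is a made-up Ethernet-style placeholder, and the invalidation hook is an assumption about what a real design would probably need when a route or ARP entry changes. It only shows the control flow: the first send walks the slow path and caches the headers plus a handler, and later sends prepend the cached headers and go straight to the output routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FP_MAX_HDR   64   /* room for cached link + network headers */
#define FP_MAX_FRAME 256  /* size of the fake "device" buffer */

struct fp_socket;
typedef int (*fp_output_fn)(struct fp_socket *so, const void *payload, size_t len);

/* Hypothetical socket state: a cached header and an optional fast handler. */
struct fp_socket {
    fp_output_fn  fastpath;            /* NULL until the slow path installs one */
    unsigned char hdr[FP_MAX_HDR];     /* headers generated by the slow path */
    size_t        hdr_len;
    unsigned      slow_calls;          /* counters, for illustration only */
    unsigned      fast_calls;
    unsigned char frame[FP_MAX_FRAME]; /* last frame handed to the "device" */
    size_t        frame_len;
};

/* Stand-in for the device output routine: header and payload, one copy each. */
static int fp_dev_output(struct fp_socket *so, const unsigned char *hdr,
                         size_t hdr_len, const void *payload, size_t len)
{
    if (hdr_len + len > sizeof(so->frame))
        return -1;
    memcpy(so->frame, hdr, hdr_len);
    memcpy(so->frame + hdr_len, payload, len);
    so->frame_len = hdr_len + len;
    return 0;
}

/* Fast path: reuse the cached headers, skip the rest of the stack. */
static int fp_fast_output(struct fp_socket *so, const void *payload, size_t len)
{
    so->fast_calls++;
    return fp_dev_output(so, so->hdr, so->hdr_len, payload, len);
}

/* Slow path: stands in for the full stack traversal (routing, address
 * resolution, header construction). On success it caches its result. */
static int fp_slow_output(struct fp_socket *so, const void *payload, size_t len)
{
    /* Placeholder header: broadcast dst, zero src, ethertype 0x0800. */
    unsigned char hdr[14] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
                              0, 0, 0, 0, 0, 0, 0x08, 0x00 };
    int error;

    so->slow_calls++;
    error = fp_dev_output(so, hdr, sizeof(hdr), payload, len);
    if (error == 0) {
        assert(sizeof(hdr) <= sizeof(so->hdr));
        memcpy(so->hdr, hdr, sizeof(hdr));
        so->hdr_len = sizeof(hdr);
        so->fastpath = fp_fast_output;  /* later sends take the shortcut */
    }
    return error;
}

/* Entry point: use the handler if one is installed, else work as usual. */
int fp_send(struct fp_socket *so, const void *payload, size_t len)
{
    if (so->fastpath != NULL)
        return so->fastpath(so, payload, len);
    return fp_slow_output(so, payload, len);
}

/* Assumed hook: drop the cache when the cached headers may be stale. */
void fp_invalidate(struct fp_socket *so)
{
    so->fastpath = NULL;
    so->hdr_len = 0;
}
```

With a zero-initialized struct fp_socket, the first fp_send() call takes the slow path and every later call takes the fast one until fp_invalidate() is called. A real implementation would also have to deal with locking, per-packet header fields such as lengths and checksums, and interfaces that cannot support the shortcut, none of which is modelled here.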