On Tue, 8 Apr 2008 13:28:18 +0100 (BST), "Robert Watson" <rwatson_at_FreeBSD.org> said:

> On Tue, 8 Apr 2008, Darren Reed wrote:
>
> > Is there a performance analysis of the copy vs zerocopy available?
> > (I don't see one in the paper, just a "to do" item.)
> >
> > The numbers I'm interested in seeing are how many Mb/s you can capture
> > before you start suffering packet loss. This needs to be done with
> > sequenced packets so that you can observe gaps in the sequence captured.
>
> We've done some analysis, and a couple of companies have the zero-copy BPF
> code deployed. I hope to generate a more detailed analysis before the
> developer summit so we can review it at BSDCan. The basic observation is
> that for quite a few types of network links, the win isn't in packet loss
> per se, but in reduced CPU use, freeing up CPU for other activities. There
> are a number of sources of win:
>
> - Reduced system call overhead -- as load increases, # system calls goes
>   down, especially if you get a two-CPU pipeline going.
>
> - Reduced memory access, especially for larger buffer sizes, avoids
>   filling the cache twice (first in copyout, then again in using the
>   buffer in userspace).
>
> - Reduced lock contention, as only a single thread, the device driver
>   ithread, is acquiring the bpf descriptor's lock, and it's no longer
>   contending with the user thread.
>
> One interesting, and in retrospect reasonable, side effect is that user
> CPU time goes up in the SMP scenario, as cache misses on the BPF buffer
> move from the read() system call to userspace. And, as you observe, you
> have to use somewhat larger buffer sizes, as in the previous scenario
> there were three buffers: two kernel buffers and a user buffer, and now
> there are simply two kernel buffers shared directly with user space.
>
> The original committed version has a problem in that it allows only one
> kernel buffer to be "owned" by userspace at a time, which can lead to
> excess calls to select(); this has now been corrected, so if people have
> run performance benchmarks, they should update to the new code and
> re-run them.
>
> I don't have numbers off-hand, but 5%-25% were numbers that appeared in
> some of the measurements, and I'd like to think that the recent fix will
> further improve that.

Out of curiosity, were those numbers for single cpu/core systems or systems
with more than one cpu/core active/available?

I know the testing I did was all single threaded, so moving time from
kernel to user couldn't be expected to make a large overall difference in
a non-SMP kernel (NetBSD-something at the time.)

Darren

--
Darren Reed
darrenr_at_fastmail.net
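
For readers curious what the shared-buffer model described above looks like
from userspace, here is a minimal sketch of a zero-copy BPF consumer loop.
It assumes the BIOCSETBUFMODE/BIOCGETZMAX/BIOCSETZBUF ioctls and the
bpf_zbuf_header generation counters from the zero-copy BPF work, roughly as
documented in bpf(4); the interface name "em0" is only a placeholder, error
handling is abbreviated, memory barriers are elided, and the code that walks
the captured packet records inside a buffer is omitted.

    /*
     * Sketch of a zero-copy BPF consumer: two shared buffers, ownership
     * tracked via kernel/user generation numbers in each buffer's header.
     */
    #include <sys/types.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <sys/socket.h>
    #include <net/if.h>
    #include <net/bpf.h>
    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int
    main(void)
    {
            struct bpf_zbuf zb;
            struct ifreq ifr;
            size_t buflen;
            u_int bufmode = BPF_BUFMODE_ZBUF;
            int fd;

            /* Open the cloning bpf device. */
            fd = open("/dev/bpf", O_RDWR);
            if (fd < 0)
                    err(1, "open");

            /* Switch the descriptor into zero-copy buffer mode. */
            if (ioctl(fd, BIOCSETBUFMODE, &bufmode) < 0)
                    err(1, "BIOCSETBUFMODE");

            /* Ask the kernel how large each shared buffer may be. */
            if (ioctl(fd, BIOCGETZMAX, &buflen) < 0)
                    err(1, "BIOCGETZMAX");

            /* Allocate two page-aligned buffers and hand them to the kernel. */
            memset(&zb, 0, sizeof(zb));
            zb.bz_buflen = buflen;
            zb.bz_bufa = mmap(NULL, buflen, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_SHARED, -1, 0);
            zb.bz_bufb = mmap(NULL, buflen, PROT_READ | PROT_WRITE,
                MAP_ANON | MAP_SHARED, -1, 0);
            if (zb.bz_bufa == MAP_FAILED || zb.bz_bufb == MAP_FAILED)
                    err(1, "mmap");
            if (ioctl(fd, BIOCSETZBUF, &zb) < 0)
                    err(1, "BIOCSETZBUF");

            /* Attach to an interface; "em0" is just a placeholder name. */
            memset(&ifr, 0, sizeof(ifr));
            strlcpy(ifr.ifr_name, "em0", sizeof(ifr.ifr_name));
            if (ioctl(fd, BIOCSETIF, &ifr) < 0)
                    err(1, "BIOCSETIF");

            for (;;) {
                    struct bpf_zbuf_header *bzh;
                    void *bufs[2] = { zb.bz_bufa, zb.bz_bufb };
                    int i;

                    /*
                     * Poll both shared buffers.  A buffer belongs to user
                     * space while the kernel and user generation numbers
                     * differ; writing the kernel generation back into the
                     * user generation field returns it to the kernel.
                     */
                    for (i = 0; i < 2; i++) {
                            bzh = bufs[i];
                            if (bzh->bzh_kernel_gen == bzh->bzh_user_gen)
                                    continue;
                            printf("buffer %d: %u bytes of packets\n",
                                i, bzh->bzh_kernel_len);
                            /* ... walk the bpf records following the header ... */
                            bzh->bzh_user_gen = bzh->bzh_kernel_gen;
                    }
            }
    }

A real consumer would block in select(), poll(), or kqueue() rather than
spinning as this sketch does, which is exactly where the single-owned-buffer
problem mentioned above would have shown up as excess wakeups and system
calls before the fix.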