The recent addition of TSO (TCP Segmentation Offload) has highlighted some
shortcomings in our sosend_*() kernel implementation. The current code uses
a sosend_copyin() function that loops over the supplied struct uio and
interleaves mbuf allocations with uiomove() calls.

I have rewritten m_getm() to be simpler and to allocate PAGE_SIZE sized
jumbo mbuf clusters (4k on most architectures), and m_uiotombuf() to use the
new m_getm() to obtain all mbuf space in one go. m_uiotombuf() then loops
over the mbuf chain and copies the data into the mbufs using uiomove().
sosend_dgram() and sosend_generic() are changed to use m_uiotombuf() instead
of sosend_copyin().

Looking at the benchmarks we see some very nice improvements (95% confidence):

 66% less cpu (or 2.9 times better) with new sosend vs. old sosend (non-TSO)
 65% less cpu (or 2.8 times better) with new sosend vs. old sosend (TSO)

The sender is an AMD Opteron 852 (2.6GHz) with an em(4) PCI-X-133 interface
and the receiver is a DELL PowerEdge SC1425 P-IV Xeon 3.2GHz with em(4) LOM,
connected back to back at 1000Base-TX full duplex.

The patch is available here:

 http://people.freebsd.org/~andre/sosend+m_uiotombuf-20060928.diff

Any testing and heavy (code) beating and reviews welcome.
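
To illustrate the "get all the space first, then copy" pattern described
above, here is a minimal userspace sketch. It is not the kernel code from the
patch: struct buf, CLUSTER_SIZE, chain_alloc(), chain_free() and
iov_to_chain() are hypothetical stand-ins for mbuf clusters, PAGE_SIZE,
m_getm() and m_uiotombuf(), and memcpy() over a struct iovec array stands in
for uiomove() over a struct uio.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>            /* struct iovec */

#define CLUSTER_SIZE 4096       /* assumed stand-in for PAGE_SIZE clusters */

/* Hypothetical userspace stand-in for an mbuf with an attached cluster. */
struct buf {
    struct buf *next;
    size_t      len;            /* bytes of valid data in data[] */
    char        data[CLUSTER_SIZE];
};

/* Release every buffer in a chain. */
static void chain_free(struct buf *b)
{
    while (b != NULL) {
        struct buf *n = b->next;
        free(b);
        b = n;
    }
}

/* Step 1: obtain buffer space for `total` bytes in a single pass.
 * Returns NULL if total is 0 or an allocation fails. */
static struct buf *chain_alloc(size_t total)
{
    struct buf *head = NULL, **tailp = &head;

    while (total > 0) {
        struct buf *b = calloc(1, sizeof(*b));
        if (b == NULL) {
            chain_free(head);
            return NULL;
        }
        *tailp = b;
        tailp = &b->next;
        total -= total < CLUSTER_SIZE ? total : CLUSTER_SIZE;
    }
    return head;
}

/* Step 2: walk the preallocated chain and copy the scattered source
 * data into it. No allocation happens inside the copy loop. */
static struct buf *iov_to_chain(const struct iovec *iov, int iovcnt)
{
    size_t total = 0;
    for (int i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;

    struct buf *head = chain_alloc(total);
    struct buf *b = head;
    int i = 0;
    size_t off = 0;             /* offset into the current iov entry */

    while (b != NULL && i < iovcnt) {
        size_t room = CLUSTER_SIZE - b->len;
        size_t left = iov[i].iov_len - off;
        size_t n = room < left ? room : left;

        memcpy(b->data + b->len, (const char *)iov[i].iov_base + off, n);
        b->len += n;
        off += n;
        assert(b->len <= CLUSTER_SIZE);
        if (off == iov[i].iov_len) {    /* source entry exhausted */
            i++;
            off = 0;
        }
        if (b->len == CLUSTER_SIZE)     /* buffer full, move on */
            b = b->next;
    }
    return head;
}
```

Calling iov_to_chain() with two source regions of 5000 and 3500 bytes yields a
chain of three buffers holding 4096, 4096 and 308 bytes. The caller releases
the chain with chain_free().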
-- 
Andre

Here are the raw numbers (netperf at 95% confidence, +-2.5% error margin,
the cpu load reported by netperf is different from the one reported by
time(1), all performance references are made based on time(1) output,
netperf 2.4.2 used):

 a) is old sosend kernel implementation
 b) is new sosend kernel implementation

 1) time ./netperf -H192.168.2.2,4 -tTCP_STREAM -C -c -F 6.2-BETA1-i386-disc1.iso -- -s32K -S32K    [non-TSO]
 2) time ./netperf -H192.168.2.2,4 -tTCP_STREAM -C -c -F 6.2-BETA1-i386-disc1.iso -- -s32K -S32K    [TSO]
 3) time ./netperf -H192.168.2.2,4 -tTCP_STREAM -C -c -F 6.2-BETA1-i386-disc1.iso -- -s64K -S64K    [non-TSO]
 4) time ./netperf -H192.168.2.2,4 -tTCP_STREAM -C -c -F 6.2-BETA1-i386-disc1.iso -- -s64K -S64K    [TSO]
 5) time ./netperf -H192.168.2.2,4 -tTCP_STREAM -C -c -F 6.2-BETA1-i386-disc1.iso -- -s128K -S128K  [non-TSO]
 6) time ./netperf -H192.168.2.2,4 -tTCP_STREAM -C -c -F 6.2-BETA1-i386-disc1.iso -- -s128K -S128K  [TSO]

     Recv   Send   Send                          Utilization     Service Demand
     Socket Socket Message Elapsed               Send    Recv    Send    Recv
     Size   Size   Size    Time     Throughput   local   remote  local   remote
     bytes  bytes  bytes   secs.    10^6bits/s   % C     % C     us/KB   us/KB

 1a) 32768  32768  32768   10.00    921.28       28.42   32.48   2.527   2.888
     0.007u 1.747s 0:10.00 17.4% 99+5252k 0+0io 0pf+0w
 1b) 32768  32768  32768   10.00    921.39       24.51   31.50   2.179   2.801
     0.028u 0.768s 0:10.00 7.8% 78+4210k 0+0io 0pf+0w

 2a) 32768  32768  32768   10.00    897.63       24.29   37.74   2.216   3.445
     0.000u 1.359s 0:10.02 13.4% 96+5152k 5+0io 3pf+0w
 2b) 32768  32768  32768   10.00    919.71       15.64   33.01   1.393   2.940
     0.008u 0.528s 0:10.00 5.2% 90+4830k 0+0io 0pf+0w

 3a) 65536  65536  65536   10.00    941.60       30.90   32.01   2.689   2.785
     0.000u 1.827s 0:10.00 18.2% 96+5180k 0+0io 0pf+0w
 3b) 65536  65536  65536   10.00    941.59       26.39   32.03   2.296   2.787
     0.014u 0.617s 0:10.00 6.2% 101+5362k 0+0io 0pf+0w

 4a) 65536  65536  65536   10.00    921.98       26.09   39.47   2.318   3.507
     0.000u 1.467s 0:10.02 14.5% 93+5028k 3+0io 0pf+0w
 4b) 65536  65536  65536   10.00    938.44       16.24   34.29   1.418   2.993
     0.000u 0.511s 0:10.00 5.1% 91+4851k 0+0io 0pf+0w

 5a) 131072 131072 131072  10.00    941.62       33.81   33.68   2.941   2.930
     0.000u 2.158s 0:10.00 21.5% 97+5247k 0+0io 0pf+0w
 5b) 131072 131072 131072  10.00    941.60       28.55   31.65   2.484   2.754
     0.000u 0.676s 0:10.00 6.7% 95+5132k 0+0io 0pf+0w

 6a) 131072 131072 131072  10.00    922.92       28.72   40.80   2.549   3.621
     0.000u 1.713s 0:10.00 17.1% 93+5016k 1+0io 0pf+0w
 6b) 131072 131072 131072  10.00    939.14       18.20   34.44   1.587   3.004
     0.000u 0.587s 0:10.00 5.8% 78+4197k 1+0io 0pf+0w

Received on Thu Sep 28 2006 - 20:10:27 UTC
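
For readers who want to check the "N% less cpu" and "N times better" figures
against the time(1) lines above, the following sketch shows one way to do the
arithmetic. It assumes the comparison is made on system time only (the
second field of each time(1) line); the post does not say exactly which
field was used. The names struct run, runs[], cpu_reduction_pct() and
speedup() are made up for this example.

```c
#include <stddef.h>

/* System time in seconds taken from the time(1) lines of each run.
 * old_sys is the a) kernel, new_sys is the b) kernel. */
struct run {
    const char *label;
    double      old_sys;
    double      new_sys;
};

static const struct run runs[] = {
    { "1) 32K  non-TSO", 1.747, 0.768 },
    { "2) 32K  TSO",     1.359, 0.528 },
    { "3) 64K  non-TSO", 1.827, 0.617 },
    { "4) 64K  TSO",     1.467, 0.511 },
    { "5) 128K non-TSO", 2.158, 0.676 },
    { "6) 128K TSO",     1.713, 0.587 },
};

static const size_t nruns = sizeof(runs) / sizeof(runs[0]);

/* Percentage of CPU time saved by the new kernel relative to the old. */
static double cpu_reduction_pct(double old_s, double new_s)
{
    return (1.0 - new_s / old_s) * 100.0;
}

/* How many times less CPU time the new kernel needs. */
static double speedup(double old_s, double new_s)
{
    return old_s / new_s;
}
```

Under that assumption the 64K runs work out to a reduction of roughly 66%
(non-TSO) and 65% (TSO). The 32K and 128K runs give somewhat different
values.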