Alexander V. Chernikov wrote this message on Tue, Jun 10, 2014 at 13:17 +0400:
> On 10.06.2014 07:03, Bryan Venteicher wrote:
> >Hi,
> >
> >----- Original Message -----
> >>So, after finding out that nc has a stupidly small buffer size (2k
> >>even though there is space for 16k), I was still not getting as good
> >>performance using nc between machines, so I decided to generate some
> >>flame graphs to try to identify issues...  (Thanks to whoever included
> >>a full set of modules, including dtraceall, on the memstick!)
> >>
> >>So, the first one is:
> >>https://www.funkthat.com/~jmg/em.stack.svg
> >>
> >>As I was browsing around, em_handle_que was consuming quite a bit
> >>of CPU for only doing ~50MB/sec over gige..  Running top -SH shows
> >>me that the taskqueue for em was consuming about 50% CPU...  Also
> >>pretty high for only 50MB/sec...  Looking closer, you'll see that
> >>bpf_mtap is consuming ~3.18% (under ether_nh_input)..  I know I'm not
> >>running tcpdump or anything, but I think dhclient uses bpf to be able
> >>to inject packets and listen in on them, so I kill off dhclient, and
> >>instantly the taskqueue thread for em drops down to 40% CPU...
> >>(transfer rate only marginally improves, if it does)
> >>
> >>I decide to run another flame graph w/o dhclient running:
> >>https://www.funkthat.com/~jmg/em.stack.nodhclient.svg
> >>
> >>and now _rxeof drops from 17.22% to 11.94%, pretty significant...
> >>
> >>So, if you care about performance, don't run dhclient...
> >>
> >Yes, I've noticed the same issue.  It can absolutely kill performance
> >in a VM guest.  It is much more pronounced on only some of my systems,
> >and I hadn't tracked it down yet.  I wonder if this is fallout from
> >the callout work, or if there was some bpf change.
> >
> >I've been using the kludgey workaround patch below.
> Hm, pretty interesting.
> dhclient should set up a proper filter (and it looks like it does so:
> 13:10 [0] m@ptichko s netstat -B
>   Pid  Netif   Flags      Recv      Drop     Match Sblen Hblen Command
>  1224    em0 -ifs--l  41225922         0        11     0     0 dhclient
> )
> see "match" count.
> And BPF itself adds the cost of a read rwlock (+ bpf_filter() calls
> for each consumer on the interface).
> It should not introduce significant performance penalties.

Don't forget that it has to process the returning ACKs too...  So,
you're looking at around 10k+ pps that you have to handle and pass
through the filter...  That's a lot of packets to process...

Just as a bit more of a double check, instead of using the HD as a
source, I used /dev/zero...  I ran netstat -w 1 -I em0 while running
the test, and I was getting ~50.7MiB/s with dhclient running; then I
killed dhclient, and it instantly jumped up to ~57.1MiB/s..  So I
launched dhclient again, and it dropped back to ~50MiB/s...  Some of
this slowness is also due to nc using small buffers, which I will fix
shortly..

And with WITNESS disabled, it goes from 58MiB/s to 65.7MiB/s..  In
both cases, that's a 13% performance improvement from running w/o
dhclient...
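To make the nc buffer complaint concrete, here is a minimal sketch, not
nc's actual source: the relay(), OLD_BUFSIZ, and NEW_BUFSIZ names are
illustrative, not nc identifiers.  The point is just that moving the
transfer loop from a 2k buffer to the 16k that is available cuts the
read()/write() round trips per megabyte roughly 8x:

    /*
     * Minimal sketch (not nc's actual code): copy data between two
     * descriptors using a 16k buffer instead of a 2k one, so a given
     * transfer needs far fewer syscalls.
     */
    #include <unistd.h>

    #define OLD_BUFSIZ  2048    /* the "stupidly small" size */
    #define NEW_BUFSIZ  16384   /* the space that is actually there */

    static void
    relay(int rfd, int wfd)
    {
            char buf[NEW_BUFSIZ];
            ssize_t n;

            while ((n = read(rfd, buf, sizeof(buf))) > 0)
                    if (write(wfd, buf, (size_t)n) != n)
                            break;
    }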
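And to make the per-packet BPF cost concrete: ~50MiB/s of data at
~1448 bytes per segment is roughly 36k data packets a second, with the
returning ACK stream on top of that, so the 10k+ pps figure above is
if anything conservative.  Below is a simplified paraphrase of the hot
path in bpf_mtap() (sys/net/bpf.c), not the verbatim kernel code; the
wrapper name is invented, counters are elided, and catchpacket()'s
real argument list is longer.  The shape is what matters: every packet
on a tapped interface pays the read rwlock plus one bpf_filter() call
per attached consumer, even when nothing matches:

    /*
     * Simplified paraphrase of bpf_mtap()'s per-packet work.  With
     * dhclient attached, every packet on em0 -- including each
     * returning ACK -- walks this loop.
     */
    void
    bpf_mtap_sketch(struct bpf_if *bp, struct mbuf *m)
    {
            struct bpf_d *d;
            u_int slen;

            BPFIF_RLOCK(bp);                /* read rwlock, per packet */
            LIST_FOREACH(d, &bp->bif_dlist, bd_next) {
                    /* one filter run per consumer (dhclient, tcpdump, ...) */
                    slen = bpf_filter(d->bd_rfilter, (u_char *)m,
                        m->m_pkthdr.len, 0);
                    if (slen != 0)
                            catchpacket(d, m, slen);  /* matched: copy out */
            }
            BPFIF_RUNLOCK(bp);
    }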
This is using the latest memstick image, r266655, on a Lenovo T61:

FreeBSD 11.0-CURRENT #0 r266655: Sun May 25 18:55:02 UTC 2014
    root@grind.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
WARNING: WITNESS option enabled, expect reduced performance.
CPU: Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz (1995.05-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x6fb  Family=0x6  Model=0xf  Stepping=11
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
TSC: P-state invariant, performance statistics
real memory  = 2147483648 (2048 MB)
avail memory = 2014019584 (1920 MB)

-- 
John-Mark Gurney                              Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."