Re: excessive TCP dulplicate acks revisted

From: Andre Oppermann <andre_at_freebsd.org>
Date: Sat, 10 Nov 2007 16:28:58 +0100
Gregory Wright wrote:
> (Note: long message)
> 
> Hi,
> 
> The tcp duplicate ACK attack is back.
> 
> Last March, there was a thread on duplicate TCP acks in -CURRENT.
> I have been able to reproduce the problem on 7.0-BETA2 (amd64)
> and have some new information that might help locate the bug.
> 
> Background:  I first noticed problems with tcp connections dropping
> when Bacula was running on our backup server.  The hardware was
> a dual Opteron 244, 2 GB RAM, running FreeBSD 6.2-RELEASE-p2.
> The ethernet NIC was a bge.
> 
> We went through an extensive process to rule out hardware and cabling
> problems:  the server hardware was replaced with a single Opteron 270
> (dual core) on a Tyan S2882-D motherboard.  The memory was replaced
> as well.  The NICs are still bge.  This machine is "hardtack".

This may be a TSO bug in the bge hardware.  Please repeat your tests in
the original setup with TSO disabled on the bge interfaces ("ifconfig bgeX -tso").
Second if you've got another network card with something else than bge
please put it into the same box and run the tests as well.

-- 
Andre

> We have another server in the same rack, a dual Opteron 2214, 4 GB RAM,
> running FreeBSD 6.2-RELEASE-p1.  It has never shown the "dropped
> connection" problem.  This machine is "greenhouse-george"
> 
> All of the machines are on a single LAN, with a Nortel BayStack 450-24T
> ethernet switch.
> 
> Experiments:
> 
> The first experiment was to use netperf to send a tcp stream from either
> a PowerBook G4 (OS X 10.4.10) or a dual Intel Clovertown box running
> FreeBSD 6.2-RELEASE-p8.  This is from the PowerBook:
> 
> crossroads-able> netperf -H greenhouse-george
> TCP STREAM TEST from localhost (0.0.0.0) port 0 AF_INET to greenhouse- 
> george.18clay.com (192.168.2.63) port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 
> 65536 262144 262144    10.02      93.77
> crossroads-able> netperf -H hardtack
> TCP STREAM TEST from localhost (0.0.0.0) port 0 AF_INET to hardtack. 
> 18clay.com (192.168.2.61) port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 
> 65536 262144 262144    10.10      29.24
> crossroads-able>
> 
> (That there is a problem in the second case is easy to see just
> from the ethernet switch: the activity lights blink on and off.  A  
> wireshark
> trace shows a duplicate ACK storm and gaps in the transmitted
> packet stream.)
> 
> Just to be certain that this isn't a weird cabling problem, I swapped  the
> cables to the two machines:
> 
> crossroads-able> netperf -H greenhouse-george
> TCP STREAM TEST from localhost (0.0.0.0) port 0 AF_INET to greenhouse- 
> george.18clay.com (192.168.2.63) port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 
> 65536 262144 262144    10.02      92.68
> crossroads-able> netperf -H hardtack
> TCP STREAM TEST from localhost (0.0.0.0) port 0 AF_INET to hardtack. 
> 18clay.com (192.168.2.61) port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 
> 65536 262144 262144    10.02      30.86
> crossroads-able>
> 
> The low throughput (and gaps in the tcp stream) stay with the
> machine rather than moving with the cable.
> 
> When I use my dual Clovertown box (FreeBSD 6.2-RELEASE-p8,
> em NIC), as a source, I get
> 
> ivy-mike# netperf -H greenhouse-george
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to greenhouse- 
> george.18clay.com (192.168.2.63) port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 
> 65536  32768  32768    10.58      94.14
> ivy-mike# netperf -H hardtack
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to hardtack. 
> 18clay.com (192.168.2.61) port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 
> 65536  32768  32768    10.57      89.15
> ivy-mike#
> 
> 
> FreeBSD seems not as disturbed by the extra ACKs as OS X,
> but I still see the activity lights on the switch blinking, indicating
> dropouts in the tcp stream.  Also, longer tests using either
> source machine almost always result in failure ("Broken Pipe")
> after a few minutes.
> 
> Now is where things get interesting.  If I connect the PowerBook
> to hardtack (the machine showing the bug) with just a cable,
> bypassing the switch, I get
> 
> crossroads-able> netperf -H 192.168.2.2 -l 300
> TCP STREAM TEST from (null) (0.0.0.0) port 0 AF_INET to (null)  
> (192.168.2.2) port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 
> 65536 262144 262144    300.54    330.45
> crossroads-able>
> 
> The higher throughput is because I have two GbE interfaces
> directly connected, instead of going through the 10/100 Mb
> switch.  The key thing is that I can run this test for hours without
> the connection dropping.  (I have run it for as long as 10000 seconds.)
> And the wireshark log shows no duplicate ACK storm.
> 
> Is there perhaps something wrong with the switch?  I swapped
> out our BayStack 450-24T for an identical unit with the same
> results.  I also have an unmanaged SMC EZ Switch 10/100, so
> I tried a minimal setup:
> 
> PowerBook <---->  SMC switch <----> FreeBSD 7.0-BETA2 server (hardtack)
> 
> Only the PowerBook and FreeBSD server are connected
> to the switch.  A netperf tcp streaming test gives:
> 
> crossroads-able> netperf -H 192.168.2.2
> TCP STREAM TEST from (null) (0.0.0.0) port 0 AF_INET to (null)  
> (192.168.2.2) port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 
> 65536 262144 262144    10.02      24.25
> crossroads-able>
> 
> The throughput is low, but most importantly, the wireshark
> log shows that the duplicate ACK storm is back.
> 
> So is there something coming _out_ of the switch that FreeBSD's network
> stack doesn't like?  No, this doesn't seem to be the case.  The old
> SMC switch doesn't even send out spanning tree packets.
> 
> Next experiment: netperf tcp stream from the PowerBook, but
> with the interface forced into 100TX full duplex.  Result:
> crossroads-able> netperf -H 192.168.2.2
> 
> TCP STREAM TEST from (null) (0.0.0.0) port 0 AF_INET to (null)  
> (192.168.2.2) port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 
> 65536 262144 262144    10.02      33.81
> crossroads-able>
> 
> So the throughput has dropped and the wireshark log shows
> duplicate ACK storms again.
> 
> If I force the PowerBook to 100TX half duplex the wireshark log
> still shows duplicate ACK storms. And at 10 Mb/s, full or half duplex,
> I see still bursts of duplicate ACKs.
> 
> One last thing to note is that I can run netperf tests _from_ the
> buggy machine to the laptop with no trouble.  Again forcing the
> PowerBook interface to 100TX full duplex,
> 
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.2.1  
> (192.168.2.1) port 0 AF_INET
> Recv   Send    Send
> Socket Socket  Message  Elapsed
> Size   Size    Size     Time     Throughput
> bytes  bytes   bytes    secs.    10^6bits/sec
> 
> 262144  32768  32768    10.02      93.03
> hardtack#
> 
> The wireshark logs show no duplicate ACKs.
> 
> One interesting thing is that the packet logs for the reverse
> (properly working) case show a more regular structure of
> data and ACK packets.  In the forward (buggy) direction,
> the data packets are burstier, and to my eye more irregular.
> This was mentioned in passing in the original thread, in which
> someone speculated that there might be a window problem.
> 
> So, in sum:
> 
> The duplicate ack problem happens on my FreeBSD 7.0-BETA2
> machine (single Opteron 270, Tyan S2882-D, 2 GB RAM) when the
> bge interface in operating in 10 or 100 Mb/s mode, but not in
> Gb mode.
> 
> It does not happen on my FreeBSD 6.2-RELEASE-p1 box
> (dual Opteron 2214, Tyan 3992G3NR, 4 GB RAM) which
> has a bge NIC as well.  Nor does it happen on my dual
> Clovertown box (dual E5345, Tyan S2696A2NRF, 4 GB)
> with em NIC.
> 
> The buggy machine is not much use to me until this problem can be
> fixed, so I can dedicate it to debugging for a while.  Any hints
> about where code might be instrumented to track the bug down or
> new experiments to perform are welcome.
> 
> I can provide wireshark logs; compressed they are not too
> large (a few MB).
> 
> Here is the dmesg of the buggy machine:
> 
> hardtack# dmesg
> Copyright (c) 1992-2007 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>         The Regents of the University of California. All rights  reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 7.0-BETA2 #0: Fri Nov  2 14:54:38 UTC 2007
>     root_at_myers.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Dual Core AMD Opteron(tm) Processor 270 HE (1993.41-MHz K8-class  CPU)
>   Origin = "AuthenticAMD"  Id = 0x20f12  Stepping = 2
>    
> Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE 
> ,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>   Features2=0x1<SSE3>
>   AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
>   AMD Features2=0x3<LAHF,CMP>
>   Cores per package: 2
> usable memory = 2134962176 (2036 MB)
> avail memory  = 2060251136 (1964 MB)
> ACPI APIC Table: <A M I  OEMAPIC >
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> cpu0 (BSP): APIC ID:  0
> cpu1 (AP): APIC ID:  1
> MADT: Forcing active-low polarity and level trigger for SCI
> ioapic0 <Version 1.1> irqs 0-23 on motherboard
> ioapic1 <Version 1.1> irqs 24-27 on motherboard
> ioapic2 <Version 1.1> irqs 28-31 on motherboard
> kbd1 at kbdmux0
> ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413,  RF5413)
> acpi0: <A M I OEMXSDT> on motherboard
> acpi0: [ITHREAD]
> acpi0: Power Button (fixed)
> acpi0: reservation of 0, a0000 (3) failed
> acpi0: reservation of 100000, 7ff00000 (3) failed
> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
> acpi_hpet0: <High Precision Event Timer> iomem 0xfec01000-0xfec013ff  on 
> acpi0
> Timecounter "HPET" frequency 14318180 Hz quality 900
> cpu0: <ACPI CPU> on acpi0
> acpi_throttle0: <ACPI CPU Throttling> on cpu0
> powernow0: <Cool`n'Quiet K8> on cpu0
> cpu1: <ACPI CPU> on acpi0
> powernow1: <Cool`n'Quiet K8> on cpu1
> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> pci0: <ACPI PCI bus> on pcib0
> pcib1: <ACPI PCI-PCI bridge> at device 6.0 on pci0
> pci3: <ACPI PCI bus> on pcib1
> ohci0: <OHCI (generic) USB controller> mem 0xfeafb000-0xfeafbfff irq  19 
> at device 0.0 on pci3
> ohci0: [GIANT-LOCKED]
> ohci0: [ITHREAD]
> usb0: OHCI version 1.0, legacy support
> usb0: <OHCI (generic) USB controller> on ohci0
> usb0: USB revision 1.0
> uhub0: <AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
> uhub0: 3 ports with 3 removable, self powered
> ohci1: <OHCI (generic) USB controller> mem 0xfeafc000-0xfeafcfff irq  19 
> at device 0.1 on pci3
> ohci1: [GIANT-LOCKED]
> ohci1: [ITHREAD]
> usb1: OHCI version 1.0, legacy support
> usb1: <OHCI (generic) USB controller> on ohci1
> usb1: USB revision 1.0
> uhub1: <AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb1
> uhub1: 3 ports with 3 removable, self powered
> atapci0: <SiI 3114 SATA150 controller> port  
> 0xbc00-0xbc07,0xb880-0xb883,0xb800-0xb807,0xac00-0xac03,0xa880-0xa88f  
> mem 0xfeafec00-0xfeafefff irq 19 at device 5.0 on pci3
> atapci0: [ITHREAD]
> ata2: <ATA channel 0> on atapci0
> ata2: [ITHREAD]
> ata3: <ATA channel 1> on atapci0
> ata3: [ITHREAD]
> ata4: <ATA channel 2> on atapci0
> ata4: [ITHREAD]
> ata5: <ATA channel 3> on atapci0
> ata5: [ITHREAD]
> vgapci0: <VGA-compatible display> port 0xb000-0xb0ff mem  
> 0xfd000000-0xfdffffff,0xfeaff000-0xfeafffff irq 18 at device 6.0 on pci3
> fxp0: <Intel 82551 Pro/100 Ethernet> port 0xa800-0xa83f mem  
> 0xfeafa000-0xfeafafff,0xfeaa0000-0xfeabffff irq 18 at device 8.0 on pci3
> miibus0: <MII bus> on fxp0
> inphy0: <i82555 10/100 media interface> PHY 1 on miibus0
> inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> fxp0: Ethernet address: 00:e0:81:4b:3e:39
> fxp0: [ITHREAD]
> isab0: <PCI-ISA bridge> at device 7.0 on pci0
> isa0: <ISA bus> on isab0
> atapci1: <AMD 8111 UDMA133 controller> port  
> 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0
> ata0: <ATA channel 0> on atapci1
> ata0: [ITHREAD]
> ata1: <ATA channel 1> on atapci1
> ata1: [ITHREAD]
> pci0: <serial bus, SMBus> at device 7.2 (no driver attached)
> pci0: <bridge> at device 7.3 (no driver attached)
> pcib2: <ACPI PCI-PCI bridge> at device 10.0 on pci0
> pci2: <ACPI PCI bus> on pcib2
> ahc0: <Adaptec 3960D Ultra160 SCSI adapter> port 0x8400-0x84ff mem  
> 0xfc8fd000-0xfc8fdfff irq 27 at device 3.0 on pci2
> ahc0: [ITHREAD]
> aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
> ahc1: <Adaptec 3960D Ultra160 SCSI adapter> port 0x8800-0x88ff mem  
> 0xfc8fe000-0xfc8fefff irq 24 at device 3.1 on pci2
> ahc1: [ITHREAD]
> aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
> pci0:2:9:0: bad VPD cksum, remain 72
> bge0: <Broadcom Gigabit Ethernet Controller, ASIC rev. 0x2100> mem  
> 0xfc870000-0xfc87ffff irq 24 at device 9.0 on pci2
> miibus1: <MII bus> on bge0
> brgphy0: <BCM5704 10/100/1000baseTX PHY> PHY 1 on miibus1
> brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,  
> 1000baseT-FDX, auto
> bge0: Ethernet address: 00:e0:81:4b:3e:ae
> bge0: [ITHREAD]
> pci0:2:9:1: bad VPD cksum, remain 72
> bge1: <Broadcom Gigabit Ethernet Controller, ASIC rev. 0x2100> mem  
> 0xfc890000-0xfc89ffff irq 25 at device 9.1 on pci2
> miibus2: <MII bus> on bge1
> brgphy1: <BCM5704 10/100/1000baseTX PHY> PHY 1 on miibus2
> brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,  
> 1000baseT-FDX, auto
> bge1: Ethernet address: 00:e0:81:4b:3e:af
> bge1: [ITHREAD]
> pcib3: <ACPI PCI-PCI bridge> at device 11.0 on pci0
> pci1: <ACPI PCI bus> on pcib3
> acpi_button0: <Power Button> on acpi0
> atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
> atkbd0: <AT Keyboard> irq 1 on atkbdc0
> kbd0 at atkbd0
> atkbd0: [GIANT-LOCKED]
> atkbd0: [ITHREAD]
> psm0: <PS/2 Mouse> irq 12 on atkbdc0
> psm0: [GIANT-LOCKED]
> psm0: [ITHREAD]
> psm0: model IntelliMouse, device ID 3
> sio0: configured irq 4 not in bitmap of probed irqs 0
> sio0: port may not be enabled
> sio0: configured irq 4 not in bitmap of probed irqs 0
> sio0: port may not be enabled
> sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10  on 
> acpi0
> sio0: type 16550A
> sio0: [FILTER]
> sio1: configured irq 3 not in bitmap of probed irqs 0
> sio1: port may not be enabled
> sio1: configured irq 3 not in bitmap of probed irqs 0
> sio1: port may not be enabled
> sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
> sio1: type 16550A
> sio1: [FILTER]
> fdc0: <floppy drive controller (FDE)> port 0x3f0-0x3f5,0x3f7 irq 6  drq 
> 2 on acpi0
> fdc0: [FILTER]
> fd0: <1440-KB 3.5" drive> on fdc0 drive 0
> orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcc7ff on isa0
> ppc0: cannot reserve I/O port range
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on  isa0
> Timecounters tick every 1.000 msec
> acd0: DVDROM <ATAPI DVD D DH16D2P/HP56> at ata1-master UDMA33
> ad4: 305245MB <Seagate ST3320620AS 3.AAE> at ata2-master SATA150
> Waiting 5 seconds for SCSI devices to settle
> sa0 at ahc1 bus 0 target 6 lun 0
> sa0: <IBM ULTRIUM-TD2 3AYD> Removable Sequential Access SCSI-3 device
> sa0: 160.000MB/s transfers (80.000MHz DT, offset 31, 16bit)
> SMP: AP CPU #1 Launched!
> Trying to mount root from ufs:/dev/ad4s1a
> WARNING: / was not properly dismounted
> ch0 at ahc1 bus 0 target 5 lun 0
> ch0: <DELL PV-122T K17r> Removable Changer SCSI-2 device
> ch0: 3.300MB/s transfers
> ch0: 8 slots, 1 drive, 1 picker, 0 portals
> bge1: link state changed to DOWN
> bge1: link state changed to UP
> bge1: promiscuous mode enabled
> bge1: promiscuous mode disabled
> bge1: link state changed to DOWN
> bge1: link state changed to UP
> bge1: link state changed to DOWN
> bge1: link state changed to UP
> bge1: link state changed to DOWN
> bge1: link state changed to UP
> bge1: link state changed to DOWN
> bge1: link state changed to UP
> bge1: link state changed to DOWN
> bge1: link state changed to UP
> bge1: link state changed to DOWN
> bge1: link state changed to UP
> hardtack#
> 
> 
> Please let me know what I can do to track this down.
> 
> Best Wishes,
> Greg
> 
> Gregory Wright
> Antiope Associates LLC
> 18 Clay Street
> Fair Haven, New Jersey 07704
> USA
> 
> 1 (732) 924-4549
> 1 (732) 345-8378 [fax]
> 
> 
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
> 
> 
Received on Sat Nov 10 2007 - 14:55:54 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:21 UTC