I need advice hunting down a network problem which I suspect to be a bug in the vge(4) driver. After spending a lot of time on investigation, I'm out of ideas.

My recently built home server running FreeBSD 8.0-CURRENT as of 2009-06-07 on a VIA ARTiGO A2000 [1] exhibits network problems when sending more than a few dozen kilobytes of TCP traffic. The server application is the "Dovecot" [2] Secure IMAP server. The client application is "Thunderbird" [3] running on Windows XP.

The high-level view of the problem is that the client seems to stall while downloading messages or even a complex structure of IMAP folder names. When using STARTTLS the client often prints the infamous generic and misleading error "Thunderbird received a message with incorrect Message Authentication Code. If the error occurs frequently, contact the website administrator". The origin of this message is the SSL library that ships with Thunderbird. The same library is used for Firefox, where the hint might actually make sense when the user is attempting to access a broken HTTPS server. After lots of debugging I found out that the same error is printed not only for TLS/SSL issues but also for plain broken TCP streams, be it wrong TCP checksums or a server process dumping core. So I tried IMAP without TLS, only to see the same issue with the misleading SSL error replaced by an application hang. I ran truss(1) against Dovecot, placed Thunderbird in debug mode [4] and found out that during a stall the server did write(2) all the data to the TCP socket but some data did not arrive at the client.

The low-level view of the problem is that Wireshark on the client side sooner or later - not within the first few dozen packets - sees a packet with an incorrect TCP checksum. Usually the next packet is from the server again, continuing the stream.
What follows is the expected but fruitless sequence: the client sends duplicate ACKs for the last good packet, but the server retransmits more TCP packets with bad checksums. To me it sounds like a broken implementation of hardware-generated checksums. Setting the "-tso", "-lro", "-txcsum" and "-rxcsum" options and using the "polling" option on the server-side network interface did not help. So either something deeper is broken or maybe just the ability to disable these features needs fixing. Btw, the client is using the "VMware Accelerated AMD PCNet Adapter" driver with "TCP/IP Offload=off" and "TsoEnable=0".

Sorry to bother you with more details but here's why I believe it's a hardware/driver issue. Before I purchased the hardware I did a dry run. I installed FreeBSD 7.1-RELEASE as a VM guest, then upgraded to FreeBSD 8.0-CURRENT using the FreeBSD Administration Toolkit [5]. Built OS and apps from source, loaded my data - worked! I used the same client that has problems with the real hardware today. Then I used that VM as build host to create the NanoBSD [6] Flash image for the ARTiGO. Both use exactly the same sources. The VM works, the metal is broken. One of the few differences is the NIC and its driver. As a workaround I copied the VM to an ordinary PC equipped with a fxp(4) NIC - worked! So it really looks like an OS/HW compatibility issue on the ARTiGO.

In case you are considering a hardware defect, please note that before I loaded the OS, apps and my data onto this new hardware I thoroughly tested what I could:

- One week filling the disks to the max using repeated copies of a file created from /dev/random and, after manually breaking and rebuilding the ZFS mirror, checking data integrity using message digests. No problems with the disks, albeit poor SATA performance, but that's another story.
- One day running memtest86 [7]. No problems with memory.
- One hour NIC test copying /dev/zero to /dev/null over the wire using "scp -o compression=no". No hangs or hiccups here.
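For anyone who wants to double-check the bad-checksum verdict independently of Wireshark, here is a minimal sketch of the standard TCP checksum calculation (16-bit one's-complement sum over the IPv4 pseudo-header plus the TCP segment). It is only an illustration: the function names are made up for this example, and it assumes IPv4 and that you have already extracted the raw TCP segment (header plus payload) and both IP addresses from the capture.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum; odd-length input is zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum of an IPv4 TCP segment whose checksum field is zeroed."""
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment))
    )
    return ~ones_complement_sum(pseudo_header + segment) & 0xFFFF


def tcp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Compare the checksum carried in bytes 16-17 with a recomputation."""
    carried = struct.unpack("!H", segment[16:18])[0]
    zeroed = segment[:16] + b"\x00\x00" + segment[18:]
    return carried == tcp_checksum(src_ip, dst_ip, zeroed)
```

Feeding it the segment bytes of a suspect packet together with the server and client addresses tells you whether the checksum on the wire matches the data that arrived. Note that this is only meaningful for a capture taken on the receiving side, as here: a capture on the sending host with TX checksum offload enabled commonly shows "bad" checksums simply because the NIC fills them in after the capture point.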
Hope you can help me.

**** manually trimmed/shaped server details ****

# uname -a
FreeBSD [...] 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Sun Jun 7 13:09:44 CEST 2009 root@[...]:/usr/obj/nanobsd/usr/src/sys/VIAARTIGOA2000 i386

# dmesg
CPU: VIA C7-D Processor 1500MHz (1499.85-MHz 686-class CPU)
  Origin = "CentaurHauls"  Id = 0x6d0  Stepping = 0
  Features=0xa7c9bbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,CMOV,PAT,CLFLUSH,ACPI,MMX,FXSR,SSE,SSE2,TM,PBE>
  Features2=0x4001<SSE3,xTPR>
  VIA Padlock Features=0xffcc<RNG,AES,AES-CTR,SHA1,SHA256,RSA>
real memory  = 2147483648 (2048 MB)
avail memory = 2031333376 (1937 MB)
ACPI APIC Table: <VX800 AWRDACPI>
ioapic0 <Version 0.3> irqs 0-23 on motherboard
ioapic1 <Version 0.3> irqs 24-47 on motherboard
acpi0: <VX800 AWRDACPI> on motherboard
pci0: <ACPI PCI bus> on pcib0
vgapci0: <VGA-compatible display> mem 0xd8000000-0xdbffffff,0xde000000-0xdeffffff,0xc0000000-0xcfffffff at device 1.0 on pci0
pcib1: <ACPI PCI-PCI bridge> irq 27 at device 2.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> irq 31 at device 3.0 on pci0
pci2: <ACPI PCI bus> on pcib2
vge0: <VIA Networking Gigabit Ethernet> port 0xec00-0xecff mem 0xdf7ff000-0xdf7ff0ff irq 28 at device 0.0 on pci2
miibus0: <MII bus> on vge0
ip1000phy0: <IC Plus IP1001 10/100/1000 media interface> PHY 22 on miibus0
ip1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
vge0: WARNING: using obsoleted if_watchdog interface
vge0: Ethernet address: 00:40:63:xx:xx:xx

# after boot
# ifconfig vge0
vge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=1b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING>
	ether 00:40:63:xx:xx:xx
	inet [...]
	media: Ethernet autoselect (1000baseT <full-duplex,flag0,flag1,flag2>)
	status: active

# after adding options "-tso" "-lro" "-txcsum" "-rxcsum" "polling" and trying after each one the final result is
# ifconfig vge0
vge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=18<VLAN_MTU,VLAN_HWTAGGING>
	ether 00:40:63:xx:xx:xx
	inet [...]
	media: Ethernet autoselect (1000baseT <full-duplex,flag0,flag1,flag2>)
	status: active

# pciconf -lbv
vge0@pci0:2:0:0: class=0x020000 card=0x01101106 chip=0x31191106 rev=0x82 hdr=0x00
    vendor     = 'VIA Technologies Inc'
    device     = ''Velocity' Gigabit Ethernet Controllers (VT6120/VT6121/VT6122)'
    class      = network
    subclass   = ethernet
    bar   [10] = type I/O Port, range 32, base 0xec00, size 256, enabled
    bar   [14] = type Memory, range 64, base 0xdf7ff000, size 256, enabled

# vmstat -i
interrupt                          total       rate
irq28: vge0                       328436         23

**** references ****

[1] VIA ARTiGO A2000 is a storage-oriented compact barebone PC
    -> http://www.via.com.tw/en/products/embedded/artigo/a2000/
[2] Dovecot Secure IMAP server, version 1.1.15
    -> http://www.dovecot.org/
[3] Mozilla's Thunderbird email application, version 2.0.0.21 (20090302)
    -> http://www.mozillamessaging.com/en-US/thunderbird/
[4] run Thunderbird in debug mode
    set NSPR_LOG_MODULES=IMAP:5
    set NSPR_LOG_FILE=C:\thunderbird.txt
    start /d "C:\Program Files\Mozilla Thunderbird\" thunderbird.exe
    -> http://wiki.Dovecot.org/Debugging/Thunderbird
[5] Convenient FreeBSD Administration Toolkit
    -> http://people.freebsd.org/~rse/adm/
[6] NanoBSD Howto
    -> http://www.freebsd.org/doc/en_US.ISO8859-1/articles/nanobsd/
[7] Memory Diagnostic
    -> http://www.memtest86.com/memtest86-3.5.iso.zip

**** related ****

No 1000baseTX on VIA Artigo A2000 ->
http://apps.sourceforge.net/phpbb/freenas/viewtopic.php?f=9&t=851

kern/130846: [vge] vge0 not autonegotiating to 1000baseTX full duplex in 7.1 ->
http://www.freebsd.org/cgi/query-pr.cgi?pr=130846

FreeNAS on the ARTiGO A2000 ->
http://www.logicsupply.com/blog/2008/12/29/freenas-on-the-artigo-a2000/

--
http://thomas.lotterer.net

Received on Tue Jun 09 2009 - 23:39:55 UTC