I've been experiencing problems with the bge device for several weeks.
In this time, I've tried tuning every parameter that I could find.
There appear to be several related problems:

node10:kargl[203] netstat -I bge1
Name   Mtu Network       Address              Ipkts  Ierrs    Opkts Oerrs  Coll
bge1  9000 <Link#2>      00:e0:81:40:48:93 81505160 238721 81933513     9     0
bge1  9000 192.168.0.0   node10            81504878      - 81933689     -     -

Notice that the Ierrs value grows continuously while my MPI application
runs. In /var/log/messages one finds:

Jun 20 23:20:42 node10 kernel: bge1: watchdog timeout -- resetting
Jun 20 23:20:42 node10 kernel: bge1: link state changed to DOWN
Jun 20 23:20:46 node10 kernel: bge1: link state changed to UP

This DOWN/UP cycle breaks the MPI application and leads to several
additional messages of the form:

Jun 20 23:22:33 node10 kernel: TCP: [10.208.78.111]:54801 to [10.208.78.111]:49376 tcpflags 0x10<ACK>; syncache_expand: Segment failed SYNCOOKIE authentication, segment rejected (probably spoofed)

So, I plan to replace all of the bge devices with a reliable, robust
GigE NIC. Does anyone have a suggestion for such a card?

-- 
Steve

Received on Thu Jun 21 2007 - 14:09:07 UTC