On Thu, Jun 21, 2007 at 06:47:37PM +0200, Sameh Ghane wrote:
> Le (On) Thu, Jun 21, 2007 at 09:07:43AM -0700, Steve Kargl ecrivit (wrote):
> >
> > Jun 20 23:22:33 node10 kernel: TCP: [10.208.78.111]:54801 to
> > [10.208.78.111]:49376 tcpflags 0x10<ACK>; syncache_expand: Segment failed
> > SYNCOOKIE authentication, segment rejected (probably spoofed)
>
> How does a local communication get affected by your NIC's behavior !?

It is an application that uses the Message Passing Interface.
There are 4 processes running on node16 and 4 processes on node10.
All processes are communicating with each other; when the link
goes down/up, the processes stop talking. The processes on node10
are trying to send/receive data from the now non-existent processes
on node16. I'm assuming that communication between the processes
on node10 gets out of sync and the above message appears.

>
> You seem to use Jumbo frames, maybe the link loss is switch related ?

Same problem with jumbo frames or good old mtu 1500 frames.

>
> > So, I plan to replace all of the bge devices with a reliable,
> > robust GigE NIC. Anyone have a suggestion for such a cards?
>
> I would go for em(4) because the driver works really fine, for
> quite some time.

How does em(4) compare to msk(4)?

> Polling support is really good, and helps reducing interrupts.

Tried that. Too much latency. Too many dropped packets. The
execution time of the app is doubled, if not tripled.

Thanks for the info. I'll investigate the em(4).

-- 
Steve

Received on Thu Jun 21 2007 - 15:02:46 UTC