Re: 6-CURRENT Network stack issues w/SMP? (Was: Re: TreeListfailed: Network write failure: ChannelMux.ProtocolError)

From: Robert Watson <rwatson_at_freebsd.org>
Date: Sun, 12 Sep 2004 16:10:29 -0400 (EDT)
On Sun, 12 Sep 2004, Andre Guibert de Bruet wrote:

> Using an rl-based network card, I am able to transfer data without any
> problems. Any idea who the nge maintainer is? 

I'm not sure we have an nge maintainer, but I'm also not sure it has needed
much maintenance (until now, perhaps).  I believe Bill Paul wrote it,
however.  I'm thinking there are a couple of things we should try doing:

- First, we should confirm that Giant really is properly held in some
  strategic places in the driver.  I.e., slap down GIANT_REQUIRED in a
  bunch of interesting-looking places (perhaps at the head of most of the
  functions).  We could be entering the ioctl code, or the watchdog,
  without Giant.

- Attempt to identify whether or not the corruption corresponds with other
  failure modes that may be present, such as packet loss.  Perhaps we're
  looking at a problem with reassembly and/or retransmission.  It would be
  useful to know, for example, if the counters relating to TCP packet loss
  go up at about the time corruption occurs.

- We should probably build a test tool to characterize the corruption a
  bit better.  We could potentially start out just by dd'ing a big file of
  zeros through netcat between two hosts using if_nge, and confirm that
  the zeros get there in one piece, and then try with more complex data
  patterns that would reveal improper ordering, etc.

- For grins, could you try running the same software with TCP SACK turned
  off and confirm that the problem is still present?
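
As a rough sketch of the "more complex data patterns" idea in the third
item: the Python below stamps every fixed-size block with its own sequence
number, so the receiving side can tell a block that arrived in the wrong
position from one whose bytes were damaged.  This is only an illustration,
not an existing tool; the names (make_block, generate, verify), the 512-byte
block size, and the use of Python 3 on both hosts are all assumptions.

```python
import struct

BLOCK_SIZE = 512              # bytes per block; arbitrary choice for this sketch
HEADER = struct.Struct("!Q")  # 8-byte big-endian block sequence number


def make_block(seq, block_size=BLOCK_SIZE):
    """Return one block: its sequence number, then filler derived from it."""
    filler_len = block_size - HEADER.size
    filler = bytes((seq + i) & 0xFF for i in range(filler_len))
    return HEADER.pack(seq) + filler


def generate(num_blocks, block_size=BLOCK_SIZE):
    """Yield num_blocks consecutive pattern blocks, starting at sequence 0."""
    for seq in range(num_blocks):
        yield make_block(seq, block_size)


def verify(data, block_size=BLOCK_SIZE):
    """Compare received bytes against the expected pattern.

    Returns a list of (block_index, kind, detail) tuples, where kind is:
      "misplaced" - an intact block in the wrong position (detail = its seq)
      "corrupt"   - a block matching no expected pattern (detail = None)
      "truncated" - a trailing partial block (detail = number of bytes seen)
    An empty list means the data arrived intact and in order.
    """
    problems = []
    full_blocks, leftover = divmod(len(data), block_size)
    for index in range(full_blocks):
        block = data[index * block_size:(index + 1) * block_size]
        if block == make_block(index, block_size):
            continue
        (seen,) = HEADER.unpack_from(block)
        if seen != index and block == make_block(seen, block_size):
            problems.append((index, "misplaced", seen))
        else:
            problems.append((index, "corrupt", None))
    if leftover:
        problems.append((full_blocks, "truncated", leftover))
    return problems
```

The sender would write the output of generate() to a file or pipe and push
it through netcat, as with the zeros test; the receiver would capture the
stream and hand it to verify().  A run of "misplaced" entries would point
at ordering, while "corrupt" entries would point at damaged payload.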

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Principal Research Scientist, McAfee Research
Received on Sun Sep 12 2004 - 18:10:43 UTC
