Re: re(4) driver dropping packets when reading NFS files

From: Pyun YongHyeon <pyunyh_at_gmail.com> Date: Sun, 7 Nov 2010 16:26:01 -0800

On Sun, Nov 07, 2010 at 07:06:44PM -0500, Rick Macklem wrote:
> > 
> > I highly doubt it could be hardware issue.
> > 
> Looks like the hardware guys may be off the hook. See below.
> > 
> > It's the job of bus_dma(9), and I don't think barrier instructions
> > would be helpful here, as I don't see out-of-order execution in the
> > RX handler.
> > 
> My current hunch is that something that changed between June 7 and
> June 15 in head/sys has caused the chip to have difficulties doing
> DMA, resulting in the Fifo overflows and approx. 10% "missed frames".
> 
> > 
> > Let's kill the driver bug. No one has reported this kind of issue so
> > far, and I guess most users took the poor performance for granted
> > because they are using a low-end consumer-grade controller.
> >
> I think your driver is off the hook, too.
> 
> > 
> > > re0 statistics:
> > > Transmit good frames : 101346
> > > Receive good frames : 133390
> > > Tx errors : 0
> > > Rx errors : 0
> > > Rx missed frames : 14394
> > > Rx frame alignment errs : 0
> > > Tx single collisions : 0
> > > Tx multiple collisions : 0
> > > Rx unicast frames : 133378
> > > Rx broadcast frames : 0
> > > Rx multicast frames : 12
> > > Tx aborts : 0
> > > Tx underruns : 0
> > > rxe did 0: 14359
> > 
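As a rough check on the "approx. 10% missed frames" figure, here is a
minimal sketch using only the two counters quoted above. It assumes the
frames offered to the chip are the good frames plus the missed frames;
that denominator is an assumption, not something the statistics state.

```python
# Counters copied from the quoted re0 statistics.
rx_good = 133390
rx_missed = 14394


def missed_fraction(good, missed):
    """Fraction of offered frames that were missed.

    Assumption: offered frames = good frames + missed frames.
    """
    total = good + missed
    return missed / total if total else 0.0


frac = missed_fraction(rx_good, rx_missed)
print(f"Rx missed: {frac:.1%} of offered frames")
```

Taking the ratio against good frames alone instead gives a slightly
higher number; either way it is in the neighbourhood of the 10% quoted.
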
> Seeing that someone thought it had worked ok a while back, I decided to
> try some old kernels I had lying about from head/-current. I found that
> the one I svn'd on June 7 works well (about 7Mbytes per sec read rate) whereas one
> svn'd on June 15 had the problem (about 500Kbytes per sec read rate).
> 
> So what is different between these kernels:
> - if_re.c is identical
> - subr_dma.c has a simple change and porting the June 7 one over didn't make
>   the June 15 one work better
> - amd64's busdma_machdep.c is identical
> 
> so it must be something else. There are a bunch of changes to amd64's pmap.c,
> which is why I've cc'd Alan, in case he might know if those changes could affect
> PCIe DMA or similar.
> 
> Other than that, maybe someone else familiar with the PCIe DMA could look and see
> if a change done to head between June 7 and 15 might explain it. (and it could
> be something else, a DMA problem for the chip is just a guess)
> 

If that made a difference, all other ethernet controllers would have
suffered from similar issues.

> rick
> ps: Unfortunately I'll be on the road for the next month, so I won't be able
>     to test patches until early Dec.

If you have some spare time, please try the attached one. I guess the
fast ethernet controller has a smaller FIFO than the GigE controller,
so the issue is triggered more frequently on fast ethernet controllers
than on GigE controllers. I still suspect there are cases where an
interrupt is not served correctly, so that the driver misses a lot of
frames.