Re: re(4) driver dropping packets when reading NFS files

From: Pyun YongHyeon <pyunyh_at_gmail.com>
Date: Thu, 4 Nov 2010 19:31:53 -0700
On Thu, Nov 04, 2010 at 09:31:30PM -0400, Rick Macklem wrote:
> > 
> > If the counter was not wrapped, it seems you lost more than 10% of the
> > total RX frames. That is a lot of loss, and there should be a way to
> > mitigate it.
> > 
> I've attached a patch (to the if_re.c in head, not your patched variant)
> that works a lot better (about 5Mbytes/sec read rate). To get that, I
> had to disable MSI and not clear the RL_IMR register in re_intr(). I
> suspect that a packet would be received between when the bits in RL_IMR
> were cleared and when they were set at the end of re_int_task() and those
> were getting lost.
> 
> This patch doesn't completely fix the problem. (I added your stats
> collecting stuff to the if_re.c in head and attached the result, which
> still shows some lost packets.) One thought is to clear the bits in
> RL_ISR in re_intr() instead of re_int_task(), but then I don't see a
> good way to pass the old value of the status register through to
> re_int_task().
> 

Hmm, I still don't understand how the patch mitigates the issue. :-(
The patch does not disable interrupts in the interrupt handler, so the
taskqueue runs with interrupts enabled. This may ensure no interrupts
are lost, but it may also generate many more interrupts. By chance, did
you check the number of interrupts generated with and without your patch?

The only guess I have at the moment is that writing RL_IMR in the
interrupt handler and at the end of the taskqueue might not be reflected
immediately, so the controller can lose interrupts during that window.
Would you try the attached patch and let me know if it makes any difference?
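To make the window discussed above concrete, here is a toy model in C. It is not code from if_re.c, and every name in it is hypothetical. It assumes that the RX status bit latches even while interrupts are masked, but that an interrupt is delivered only for frames that arrive while the mask is open (an edge-triggered assumption, loosely like MSI). Under that assumption, a frame arriving after the task's last look at the status bit but before the mask is restored is left in the ring with nothing to announce it; re-reading the status after unmasking is one way to close that gap, in this model only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical names, not if_re.c). Assumption: the status bit
 * latches on every received frame, but an interrupt is delivered only if
 * the mask is open when the frame arrives (edge-triggered, like MSI). */
struct nic {
    bool status;    /* latched "RX done" status bit */
    bool mask_open; /* true while interrupts are enabled */
    int pending;    /* frames waiting in the RX ring */
    int handled;    /* frames the driver has processed */
    int irqs;       /* interrupts actually delivered */
};

static void frame_arrives(struct nic *n) {
    n->pending++;
    n->status = true;
    if (n->mask_open)
        n->irqs++;  /* the host only hears about frames seen while unmasked */
}

/* Acknowledge the status bit and process everything in the ring. */
static void drain(struct nic *n) {
    n->status = false;
    n->handled += n->pending;
    n->pending = 0;
}

/* Deferred task. A scripted frame lands after the drain but before the
 * mask is reopened. With recheck, the status bit is read again after
 * unmasking and the ring is drained again if anything latched. */
static void int_task(struct nic *n, bool recheck) {
    n->mask_open = false;      /* the interrupt handler masked interrupts */
    drain(n);
    frame_arrives(n);          /* arrives while masked: no interrupt */
    n->mask_open = true;       /* restore the mask at the end of the task */
    while (recheck && n->status) {
        n->mask_open = false;  /* something latched meanwhile: go again */
        drain(n);
        n->mask_open = true;
    }
}

/* Runs the scenario; returns the frames left in the ring with no interrupt
 * pending to announce them, and stores the processed count in *handled. */
static int run_scenario(bool recheck, int *handled) {
    struct nic n = { false, true, 0, 0, 0 };
    assert(handled != NULL);
    frame_arrives(&n);         /* first frame arrives with the mask open */
    int_task(&n, recheck);
    *handled = n.handled;
    return n.pending;
}
```

With `int h;`, `run_scenario(false, &h)` returns 1 (one frame stranded) with `h == 1`, while `run_scenario(true, &h)` returns 0 with `h == 2`. The model says nothing about whether the 8168 actually behaves this way.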

> The patch doesn't help when MSI is enabled, and when I played with your
> patched variant, I got it to hang when RL_IMR wasn't cleared.
> 
> I've attached the patch and stats.
> 
> I might play around with it some more tomorrow, rick

Thanks for your work.

> ps: If you have hardware to test re with, you want to do an NFS mount
>     and then read a large file when nothing else is happening on the
>     machine, to see if you can reproduce the problem.

It seems I'm not able to reproduce it on my box (8168B GigE).

> pss: All tests done with a kernel that does not have option DEVICE_POLLING.

OK, I'll have to commit the statistics counter patch, since it seems to
help narrow down driver issues.

> re0 statistics:
> Transmit good frames : 83320
> Receive good frames : 136158
> Tx errors : 0
> Rx errors : 0
> Rx missed frames : 2666
> Rx frame alignment errs : 0
> Tx single collisions : 0
> Tx multiple collisions : 0
> Rx unicast frames : 136157
> Rx broadcast frames : 0
> Rx multicast frames : 1
> Tx aborts : 0
> Tx underruns : 0
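The counters above can be turned into a loss rate comparable with the "more than 10%" estimate at the top of the thread. This is a minimal sketch with a hypothetical helper name; it assumes the denominator is good frames plus missed frames (the frames that reached the controller) and that neither counter has wrapped.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage of frames dropped by the controller, computed from the
 * "Receive good frames" and "Rx missed frames" counters. Assumes the
 * counters have not wrapped and that a missed frame is never also
 * counted as a good one. */
static double rx_loss_percent(uint64_t good, uint64_t missed) {
    uint64_t total = good + missed;
    assert(total > 0);
    return 100.0 * (double)missed / (double)total;
}
```

For the statistics above, `rx_loss_percent(136158, 2666)` comes out at roughly 1.92%, compared with the more than 10% estimated from the earlier run.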


Received on Fri Nov 05 2010 - 01:32:01 UTC
