Re: [PATCH] nve(4) locking cleanup

From: Matthew Dillon <dillon_at_apollo.backplane.com>
Date: Thu, 17 Nov 2005 15:43:14 -0800 (PST)

:
:
:I have a pair of DFI Nforce-4 based "NF4 ultra" boards, where the
:FreeBSD driver will never pass any traffic at all (and never has).
:
:Matthew Dillon writes:
: >     At this point I believe that the remaining problems are entirely within
: >     Nvidia's nvnet object module.  I don't think there is anything we can do
: >     about it short of NVidia coming out with an update (which isn't likely).
:
:At least on my boards, the Solaris "nfo" driver from
:http://homepage2.nifty.com/mrym3/taiyodo/eng works flawlessly.
:The object file they use has the same checksum as the one used
:by the FreeBSD driver.
:
:Note that this is at 100Mb/s speeds, and is used for NFS (client),
:and ssh sessions only.  I haven't tried really hard to beat the
:snot out of it, but it has worked for months without me seeing
:a problem in daily use.

    All my testing and comments are at GigE speeds.  I haven't actually
    tried 100BaseT speeds with all the most recent fixes in place.  I will
    do that to see if I get the same hardware death issue.

    p.s. The MII interface will support GigE speeds if you use the extra
    speed selector bit that a number of manufacturers are now using to extend
    the MII specification.
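
    A minimal sketch of what that looks like in code, assuming the bit
    meant here is the speed-selection MSB (bit 6) of the MII basic control
    register, paired with the original speed-selection bit (bit 13) as
    IEEE 802.3 clause 22 lays it out.  The macro and function names are
    made up for illustration and are not taken from the nve driver.  It
    only decodes the forced-speed bits; with autonegotiation enabled the
    negotiated speed has to be read from other registers.

```c
#include <stdint.h>

/* Speed-selection bits in the MII basic control register (register 0). */
#define BMCR_SPEED_LSB 0x2000u  /* bit 13: the original 10/100 selector */
#define BMCR_SPEED_MSB 0x0040u  /* bit 6: the extra selector bit (assumed) */

/*
 * Hypothetical helper: decode the forced speed, in Mb/s, from a raw
 * control register value.  Returns -1 for the reserved combination
 * where both selector bits are set.
 */
int
bmcr_forced_speed_mbps(uint16_t bmcr)
{
	int msb = (bmcr & BMCR_SPEED_MSB) != 0;
	int lsb = (bmcr & BMCR_SPEED_LSB) != 0;

	switch ((msb << 1) | lsb) {
	case 0:
		return 10;	/* MSB=0, LSB=0 */
	case 1:
		return 100;	/* MSB=0, LSB=1 */
	case 2:
		return 1000;	/* MSB=1, LSB=0 */
	default:
		return -1;	/* MSB=1, LSB=1 is reserved */
	}
}
```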

: >     Now, linux *has* a native implementation of this driver that does not
: >     use the Nvidia module, and I have gotten reports that it does not suffer
: >     from the same problems. 
:
:I'm working on a linux driver right now, and have enabled the
:linux slab debugging stuff (similar to the type of malloc debugging we
:get with INVARIANTS).  At boot (before I even load my driver), the
:forcedeth driver from 2.6.13.1 will receive corrupted frames.  Anybody
:porting that driver should look out for buffer overflow/underflow issues
:in it...
:
:Drew

    I recall there being a comment in the linux source code regarding
    the corrupt frame issue.  The main issue for me with nvnet is this
    hardware lockup I am getting that requires physically pulling the power
    cord to fix.  That reeks of a 'hardware bug' or 'BIOS misprogramming'
    issue.  A corrupt frame problem could be a latency or FIFO issue, or
    another HyperTransport issue.

    I also suspect that the nvnet driver has an interrupt race somewhere
    in its blackboxed object module.  I've gone over the driver tooth and
    nail and can't come up with any reason why after successfully
    transferring gigabytes and gigabytes of data it would suddenly stop
    generating transmit interrupts.  I'm gonna play with it a bit more...
    the only thing left that I haven't investigated is the transmit and
    receive ring management.

					-Matt
					Matthew Dillon 
					<dillon_at_backplane.com>