:Ok, now that the first set of locking overhaul is in the tree, can folks with
:working nve(4) adapters test the patch referenced below and make sure there
:are no regressions. Having the IFF_UP fiddling turned off may or may not
:help folks getting the TX timeouts as well, btw, so if people are feeling
:brave they can try this patch as well. Note it is only applicable to recent
:current.
:
:http://www.FreeBSD.org/~jhb/patches/nve_locking.patch
:
:--
:John Baldwin <jhb_at_FreeBSD.org> <>< http://www.FreeBSD.org/~jhb/
:"Power Users Use the Power to Serve" = http://www.FreeBSD.org

The reason I set sc->pending_txs to 0 in DFly after the reinit is that when a watchdog timeout occurs and you reset the device, *ALL* mbufs still sitting in the transmit ring are lost. They will never be acknowledged, ever. So pending_txs will never drop back to 0 on its own. This is what led to continuous watchdog timeout reports when, in fact, only one timeout actually occurred.

The FreeBSD code does set pending_txs to 0 in nve_stop(). I'm not sure this is correct, however, unless the pfnStop() ABI call cleans out pending mbufs in the transmit ring (which seems unlikely). The count would wind up going negative.

Another problem that neither of us has dealt with yet is recovery of dead transmit mbufs. Right now that only occurs in nve_ospackettx(), but nve_ospackettx() is only called by the Nvidia code during normal operation. ABI calls to e.g. reset the Nvidia device will *NOT* clean out the transmit ring and call nve_ospackettx(), so we lose track of all the mbufs that were sitting in there at the time of a reinit.

But, of course, the biggest problem is simply the fact that the Nvidia ABI library seems to be rather broken. On my nForce4-based boxes the DFly driver can recover from numerous watchdog timeouts (and they occur quite often, even when the network load is virtually nil), but after an hour or two of testing at GigE speeds the hardware itself stops working entirely, to the point where I have to physically unplug and replug the machine's power cord before the hardware starts working again.

-Matt
Matthew Dillon <dillon_at_backplane.com>

Received on Thu Nov 24 2005 - 22:29:41 UTC
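
For readers following the pending_txs discussion above, here is a minimal toy model of the accounting, not the actual nve(4) or DragonFly code. Every name in it (toy_softc, toy_start, toy_tx_done, toy_watchdog_reset, TOY_TX_RING_SIZE) is hypothetical, and malloc()/free() stand in for mbuf allocation. It assumes the driver itself has to walk the transmit ring on a reset, because the completion callback will never fire for packets that were in flight.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_TX_RING_SIZE 8      /* hypothetical ring size */

/* Hypothetical stand-in for the driver softc. */
struct toy_softc {
    void *tx_ring[TOY_TX_RING_SIZE]; /* stand-ins for mbuf pointers */
    int   pending_txs;               /* packets handed to the hardware */
};

/* Queue one packet: take a free slot and bump the pending count. */
static int
toy_start(struct toy_softc *sc)
{
    for (int i = 0; i < TOY_TX_RING_SIZE; i++) {
        if (sc->tx_ring[i] == NULL) {
            sc->tx_ring[i] = malloc(64);
            if (sc->tx_ring[i] == NULL)
                return -1;
            sc->pending_txs++;
            return 0;
        }
    }
    return -1;                  /* ring full */
}

/*
 * Transmit-complete callback (the role nve_ospackettx() plays in the
 * post). A completion for a slot that was already reclaimed is ignored,
 * so the count cannot be driven below zero.
 */
static void
toy_tx_done(struct toy_softc *sc, int slot)
{
    if (sc->tx_ring[slot] == NULL)
        return;
    free(sc->tx_ring[slot]);
    sc->tx_ring[slot] = NULL;
    sc->pending_txs--;
    assert(sc->pending_txs >= 0);
}

/*
 * Watchdog reset: the in-flight packets will never be acknowledged, so
 * free them here and then zero the count. Returns how many were reclaimed.
 */
static int
toy_watchdog_reset(struct toy_softc *sc)
{
    int reclaimed = 0;

    for (int i = 0; i < TOY_TX_RING_SIZE; i++) {
        if (sc->tx_ring[i] != NULL) {
            free(sc->tx_ring[i]);
            sc->tx_ring[i] = NULL;
            reclaimed++;
        }
    }
    sc->pending_txs = 0;
    return reclaimed;
}
```

In this sketch the reset both frees the stranded buffers and zeroes the counter in one place, so a later watchdog tick sees a pending count of 0 and a stale completion is simply dropped. Whether the real ABI library ever delivers such late completions is not something the model tries to answer.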
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:48 UTC