Re: nve related LOR triggered by lots of small packets, and a hard hang

From: John Baldwin <jhb_at_freebsd.org> Date: Wed, 10 Jan 2007 09:10:12 -0500

On Wednesday 10 January 2007 07:07, Sergey Zaharchenko wrote:
> Hello -current,
> 
> While chasing that smbfs recursive locking thing, I decided to try
> copying a large number of small files (/usr/src actually) to an SMB
> share to which I am connected by an NVIDIA nForce MCP2 card. I have come
> across a lock order reversal which seems related to the card. First,
> some files are copied, then I see the following kernel messages, some
> more files are copied, and then the system hangs without responding to
> the keyboard or anything.
> 
> : lock order reversal:
> :  1st 0xc3629f00 inp (tcpinp) @ /src/usr.src/sys/netinet/tcp_usrreq.c:801
> :  2nd 0xc0a9feec tcp (tcp) @ /src/usr.src/sys/netinet/tcp_input.c:626
> : KDB: stack backtrace:
> : db_trace_self_wrapper(c0950c60) at db_trace_self_wrapper+0x25
> : kdb_backtrace(0,ffffffff,c0a612a8,c0a612d0,c09f8e84,...) at kdb_backtrace+0x29
> : witness_checkorder(c0a9feec,9,c095ec63,272) at witness_checkorder+0x586
> : _mtx_lock_flags(c0a9feec,0,c095ec63,272,0,...) at _mtx_lock_flags+0x84
> : tcp_input(c32df800,14,c3300800,100a8c0,0,...) at tcp_input+0x432
> : ip_input(c32df800) at ip_input+0x5a6
> : netisr_dispatch(2,c32df800,0,c32c5000,c3300800,...) at netisr_dispatch+0x58
> : ether_demux(c32c5000,c32df800,c32caed8,c32df800,dd1757d4,...) at ether_demux+0x28a
> : ether_input(c32c5000,c32df800,c32caed8,0,c0970133,...) at ether_input+0x202
> : nve_ospacketrx(c32cae00,dd175810,1,0,0,...) at nve_ospacketrx+0xd9
> : UpdateReceiveDescRingData(c08981a4,c08981c4,c0898260,c089828c,c08982a4,...) at UpdateReceiveDescRingData+0x2f8
> : nve_osalloc(c32cb200,dd391010,c32cae00,c0898108,c08981a4,...) at nve_osalloc
> : _end(c33a5c00,c0a9e784,3065766e,0,0,...) at 0xc32aa600
> : _end(c32cb200,dd391010,c32cae00,c0898108,c08981a4,...) at 0xc3327680
> : _end(c33a5c00,c0a9e784,3065766e,0,0,...) at 0xc32aa600
> : _end(c32cb200,dd391010,c32cae00,c0898108,c08981a4,...) at 0xc3327680
> 
> The last two lines repeat many times (kdb seems to have a
> limit of 1024 stack trace lines, which came in very helpful). There is
> no info about the actual hang... The LOR looks like #009
> (http://sources.zabbadoz.net/freebsd/lor/009.html), but is actually
> different. Any ideas? BTW, what is _end?

_end may hint at the code being out in a kernel module, though ddb usually
handles those fine.  I think your stack is busted somehow, though, as
nve_osalloc() doesn't call UpdateReceiveDescRingData(), and the first lock
is acquired in tcp_usr_send() (userland is sending data on a TCP socket).
Somehow the nve driver has decided to handle receiving a packet and
re-enter the stack, leading to the LOR.  Have you tried using nfe(4)? :)
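
To illustrate what the LOR report is complaining about, here is a rough
userland sketch of the kind of bookkeeping an order checker does. It is not
the witness(4) implementation; every name in it (toy_acquire, held_set,
LOCK_TCP, LOCK_INP, MAX_LOCKS) is made up for the example. It also assumes
the order "tcp before inp" was recorded earlier, as one would expect from
the receive path; the report itself only shows the reversed pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8                 /* hypothetical limit for this toy checker */

enum { LOCK_TCP = 0, LOCK_INP = 1 };  /* hypothetical ids for the two locks */

/* seen_before[a][b] is true once lock a has been held while acquiring b. */
static bool seen_before[MAX_LOCKS][MAX_LOCKS];

/* Locks currently held by one simulated thread of execution. */
struct held_set {
    int ids[MAX_LOCKS];
    int count;
};

/* Forget all recorded orderings. */
void order_reset(void)
{
    memset(seen_before, 0, sizeof(seen_before));
}

/*
 * Note that context h acquires lock id.
 * Returns true if this reverses an order that was recorded earlier.
 */
bool toy_acquire(struct held_set *h, int id)
{
    bool reversal = false;

    assert(id >= 0 && id < MAX_LOCKS);
    assert(h->count < MAX_LOCKS);

    for (int i = 0; i < h->count; i++) {
        int held = h->ids[i];

        if (seen_before[id][held])   /* id -> held was seen before */
            reversal = true;
        seen_before[held][id] = true;  /* remember held -> id */
    }
    h->ids[h->count++] = id;
    return reversal;
}

/* Release the most recently acquired lock of context h. */
void toy_release(struct held_set *h)
{
    assert(h->count > 0);
    h->count--;
}
```

If one context takes LOCK_TCP and then LOCK_INP, and a second context later
holds LOCK_INP and asks for LOCK_TCP, the second toy_acquire() call returns
true. That is the shape of the report above: inp was already held from the
send side when the receive path asked for tcp.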

-- 
John Baldwin