On Wednesday 10 January 2007 07:07, Sergey Zaharchenko wrote:
> Hello -current,
>
> While chasing that smbfs recursive locking thing, I decided to try
> copying a large amount of small files (/usr/src actually) to an SMB
> share to which I am connected by an NVIDIA nForce MCP2 card. I have come
> across a lock order reversal which seems related to the card. First,
> some files are copied, then I see the following kernel messages, some
> more files are copied, and then the system hangs without responding to
> the keyboard or anything.
>
> : lock order reversal:
> : 1st 0xc3629f00 inp (tcpinp) @ /src/usr.src/sys/netinet/tcp_usrreq.c:801
> : 2nd 0xc0a9feec tcp (tcp) @ /src/usr.src/sys/netinet/tcp_input.c:626
> : KDB: stack backtrace:
> : db_trace_self_wrapper(c0950c60) at db_trace_self_wrapper+0x25
> : kdb_backtrace(0,ffffffff,c0a612a8,c0a612d0,c09f8e84,...) at kdb_backtrace+0x29
> : witness_checkorder(c0a9feec,9,c095ec63,272) at witness_checkorder+0x586
> : _mtx_lock_flags(c0a9feec,0,c095ec63,272,0,...) at _mtx_lock_flags+0x84
> : tcp_input(c32df800,14,c3300800,100a8c0,0,...) at tcp_input+0x432
> : ip_input(c32df800) at ip_input+0x5a6
> : netisr_dispatch(2,c32df800,0,c32c5000,c3300800,...) at netisr_dispatch+0x58
> : ether_demux(c32c5000,c32df800,c32caed8,c32df800,dd1757d4,...) at ether_demux+0x28a
> : ether_input(c32c5000,c32df800,c32caed8,0,c0970133,...) at ether_input+0x202
> : nve_ospacketrx(c32cae00,dd175810,1,0,0,...) at nve_ospacketrx+0xd9
> : UpdateReceiveDescRingData(c08981a4,c08981c4,c0898260,c089828c,c08982a4,...) at UpdateReceiveDescRingData+0x2f8
> : nve_osalloc(c32cb200,dd391010,c32cae00,c0898108,c08981a4,...) at nve_osalloc
> : _end(c33a5c00,c0a9e784,3065766e,0,0,...) at 0xc32aa600
> : _end(c32cb200,dd391010,c32cae00,c0898108,c08981a4,...) at 0xc3327680
> : _end(c33a5c00,c0a9e784,3065766e,0,0,...) at 0xc32aa600
> : _end(c32cb200,dd391010,c32cae00,c0898108,c08981a4,...) at 0xc3327680
>
> The last 2 strings repeat themselves a lot of times (kdb seems to have a
> limit of 1024 stack trace strings, which came in very helpful). No info
> about the actual hang... The LOR looks like #009
> (http://sources.zabbadoz.net/freebsd/lor/009.html), but is different
> actually. Any ideas? BTW, what is _end?

_end may hint to being out in a kernel module, though ddb usually can handle
those fine. I think your stack is busted somehow though as nve_osalloc()
doesn't call UpdateReceiveDescRingData(), and the first lock is acquired in
tcp_usr_send() (userland is sending data on a tcp socket). Somehow the nve
driver has decided to handle receiving a packet and re-entering the stack
leading to the LOR. Have you tried using nfe(4)? :)

-- 
John Baldwin
Received on Wed Jan 10 2007 - 13:39:26 UTC
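
The reversal described in the reply can be illustrated with a toy model. The Python sketch below is not the kernel's witness(4) code; it only mimics the idea of remembering the order in which locks were taken and flagging a later acquisition in the opposite order. The class name LockOrderChecker, the function demo(), and the assumption that the order "tcp before inp" was recorded earlier are all hypothetical and chosen for illustration. The lock names "inp" and "tcp" are taken from the report above. Only direct pairwise reversals are detected, and a single thread is modelled.

```python
class LockOrderChecker:
    """Toy, single-threaded model of lock-order tracking (hypothetical)."""

    def __init__(self):
        self.order = set()    # (earlier, later) pairs seen so far
        self.held = []        # locks currently held, in acquisition order
        self.reversals = []   # (1st, 2nd) pairs that contradict a known order

    def acquire(self, name):
        for held_lock in self.held:
            if (name, held_lock) in self.order:
                # The opposite order was recorded earlier: report a reversal.
                self.reversals.append((held_lock, name))
            else:
                self.order.add((held_lock, name))
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


def demo():
    checker = LockOrderChecker()

    # Assumed earlier path: "tcp" is taken first, then "inp".
    checker.acquire("tcp")
    checker.acquire("inp")
    checker.release("inp")
    checker.release("tcp")

    # Path from the report: the send side holds "inp", then the receive
    # path is entered and tries to take "tcp" while "inp" is still held.
    checker.acquire("inp")
    checker.acquire("tcp")
    return checker.reversals


if __name__ == "__main__":
    for first, second in demo():
        print("lock order reversal: 1st", first, "2nd", second)
```

In this model the second path is flagged because "inp" is still held when "tcp" is requested, which matches the 1st/2nd lines in the kernel message. The model says nothing about why the driver re-entered the stack.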