Re: CURRENT freezes on Latitude D520

From: Robert Watson <rwatson_at_FreeBSD.org>
Date: Sun, 10 Dec 2006 12:57:14 +0000 (GMT)
On Sun, 10 Dec 2006, Maxim Konovalov wrote:

>>> I didn't suggest to turn off mpsafenet forever and forget, I just wanted 
>>> to check my guess.  I would like to help to debug the problem but I need 
>>> some initial instructions to start.  There is a firewire console.  What do 
>>> I need to check?
>>
>> Start with the information in my followup e-mail to Andrew:
>>
>> - Configure WITNESS and see if you get any console output regarding
>>   lock order problems.
>
> Yes, there is one:
>
> lock order reversal
> 1st 0xd0f277c8 inp (rawinp) @ /usr/src/sys/netinet/raw_ip.c
> 2nd 0xd0ecbb54 wi0 (network driver) @ /usr/src/sys/modules/wi/../../dev/wi/if_wi.c
> KDB
> db_trace_self_wrapper(ce626f9d) at db_trace_self_wrapper+0x25
> kdb_backtrace(ffffffff,ce6a6378,ce6a6b20,ce65bd24,ce6e4ed0,...) at kdb_backtrace+0x29
> witness_checkorder(d0ecbb54,9,d0e73d13,388) at witness_checkorder+0x4db
> _mtx_lock_flags(d0ecbb54,0,d0e73d13,388,ce4d8cdd,...) at _mtx_lock_flags+0x1e
> wi_start(d0e05800) at wi_start+0x32
> if_start(d0e05800) at if_start+0x53
> ether_output_frame(d0e05800,d0d18100,0,1,0,...) at ether_output_frame+0x180
> ether_output(d0e05800,d0d18100,d0e652b0,d0e61bb8,ce6e6b18,...) at ether_output+0x3c0
> ieee80211_output(d0e05800,d0d18100,d0e652b0,d0e61bb8,0,...) at ieee80211_output+0x33
> ip_output(d0d18100,0,e1afbb38,20,0,...) at ip_output+0x7f0
> rip_output(d0d18100,d102ee44,1d2722c3,2000,e1afbbf0,...) at rip_output+0x29b
> rip_send(d102ee44,0,d0d18100,0,0,...) at rip_send+0x4f
> sosend_generic(d102ee44,0,0,d0d18100,0,...) at sosend_generic+0x3e1
> sosend(d102ee44,0,0,d0d18100,0,...) at sosend+0x22
> ng_ksocket_rcvdata(d10ab280,d104f750,1,e1afbc78,0,...) at ng_ksocket_rcvdata+0xa3
> ng_apply_item(d10ab200,d104f750,0,0,d10ab200,...) at ng_apply_item+0xf8
> ngintr(0) at ngintr+0x13d
> swi_net(0) at swi_net+0xba
> ithread_execute_handlers(d09acb40,d09dba00) at ithread_execute_handlers+0xce
> ithread_loop(d09dc180,e1afbd38,ce697af0,0,ce622832,328) at ithread_loop+0x4f
> fork_exit(ce4cdf0c,d09dc180,e1afbd38) at fork_exit+0x68
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xe1afbd6c, ebp = 0 ---
>
> At this point ifconfig wlan0 hangs, reboot hangs.
>
>> - Try setting net.isr.direct=0 and see if the problem goes away.
>
> This indeed helps.  The LOR is gone and wireless works.
>
>> - Try removing options PREEMPTION and see if the problem goes away.
>
> Haven't tried.

As speculated by others, this is a bug in the if_wi driver, which improperly 
holds a device driver lock over a call into the network stack.  While this can 
result in a deadlock under other circumstances, net.isr.direct makes the 
chances of that deadlock much greater.  It appears also that you have netgraph 
in the mix somehow, which might well also increase the chances of the deadlock 
triggering.  Someone(tm) needs to fix if_wi to operate properly with respect 
to the network stack lock order; another feature likely to trigger the same 
device driver bug is IP fast forwarding from a wireless interface.  Sam has 
mentioned to me that this same bug exists in several wireless drivers.
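
To illustrate the pattern in question, here is a minimal userland sketch using pthreads, not kernel mtx(9) code and not the actual if_wi source.  All names (drv_lock, stack_lock, stack_input, drv_rx_buggy, drv_rx_fixed) are hypothetical stand-ins.  The buggy receive path calls up into the "stack" with the driver lock held; the other path drains the queue under the lock, drops it, and only then calls up.  This is one common way to avoid the reversal, assumed here for illustration, and not necessarily how if_wi will be fixed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not real if_wi or netinet symbols. */
static pthread_mutex_t drv_lock = PTHREAD_MUTEX_INITIALIZER;   /* plays "wi0" */
static pthread_mutex_t stack_lock = PTHREAD_MUTEX_INITIALIZER; /* plays "inp" */

#define RXQ_LEN 4
static int rxq[RXQ_LEN];        /* driver receive queue, protected by drv_lock */
static size_t rxq_count;
static int delivered;           /* sum of packets handed to the stack */
static int lock_held_in_stack;  /* upcalls made while drv_lock was held */

/* Stack input path: takes the stack lock, as protocol code would. */
static void stack_input(int pkt)
{
    /* Probe only: trylock fails if anyone, including us, holds drv_lock. */
    if (pthread_mutex_trylock(&drv_lock) == 0)
        pthread_mutex_unlock(&drv_lock);
    else
        lock_held_in_stack++;

    pthread_mutex_lock(&stack_lock);
    delivered += pkt;
    pthread_mutex_unlock(&stack_lock);
}

/* Queue a received packet; silently drops it when the queue is full. */
static void drv_enqueue(int pkt)
{
    pthread_mutex_lock(&drv_lock);
    if (rxq_count < RXQ_LEN)
        rxq[rxq_count++] = pkt;
    pthread_mutex_unlock(&drv_lock);
}

/* Buggy pattern: driver lock held across the call into the stack, giving
 * the order driver -> stack, the reverse of the transmit path's order. */
static void drv_rx_buggy(void)
{
    pthread_mutex_lock(&drv_lock);
    while (rxq_count > 0)
        stack_input(rxq[--rxq_count]);
    pthread_mutex_unlock(&drv_lock);
}

/* Safer pattern: copy packets out under the lock, drop it, then call up. */
static void drv_rx_fixed(void)
{
    int batch[RXQ_LEN];
    size_t n = 0;

    pthread_mutex_lock(&drv_lock);
    while (rxq_count > 0)
        batch[n++] = rxq[--rxq_count];
    pthread_mutex_unlock(&drv_lock);

    assert(n <= RXQ_LEN);
    for (size_t i = 0; i < n; i++)
        stack_input(batch[i]);
}
```

Calling drv_enqueue() a few times and then drv_rx_fixed() leaves lock_held_in_stack at zero, while drv_rx_buggy() increments it once per packet.  In a single thread the buggy version still completes; the deadlock needs a second thread taking the two locks in the opposite order, as the transmit path in the trace above does.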

Robert N M Watson
Computer Laboratory
University of Cambridge
Received on Sun Dec 10 2006 - 11:57:15 UTC
