Re: Deterministic panic due to non-sleepable lock with if_alc when reconfiguring interfaces

From: <mdf_at_FreeBSD.org> Date: Thu, 18 Aug 2011 21:31:28 -0700

On Thu, Aug 18, 2011 at 5:50 PM, Garrett Cooper <yanegomi_at_gmail.com> wrote:
>    When loading if_alc as a module on my netbook and running
> /etc/rc.d/netif restart, I can deterministically panic my netbook with
> the following message:
>
> ) at _bus_dmamap_sync+0x51
> alc_stop(c3dbb000,0,c0c51844,93a,80206910,...) at alc_stop+0x24e
> alc_ioctl(c3d07400,80206910,c40423c0,c06a7935,c0914e3c,...) at alc_ioctl+0x22e
> ifioctl(c45029c0,80206910,c40423c0,c40505c0,c4528c00,...) at ifioctl+0xc98
> soo_ioctl(c4574e00,80206910,c40423c0,c413e680,c40505c0,...) at soo_ioctl+0x401
> kern_ioctl(c40505c0,3,80206910,c40423c0,c40423c0,...) at kern_ioctl+0x1d7
> ioctl(c40505c0,e6ca3cec,e6ca3d28,c08e929d,0,...) at ioctl+0x118
> syscallenter(c40505c0,e6ca3ce4,e6ca3ce4,0,0,...) at syscallenter+0x23f
> syscall(e6ca3d28) at syscall+0x2e
> Xint0x80_syscall() at Xint0x80_syscall+0x21
> --- syscall (54kernel trap 12 with interrupts disabled
> Kernel page fault with the following non-sleepable locks held:
> exclusive sleep mutex alc0 (network driver) r = 0 (0xc3dbc608) locked
> @ /usr/src/sys/modules/alc/../../dev/alc/if_alc.c:2362
> KDB: stack backtrace:
> db_trace_self_wrapper(c08e727a,80,6e726500,74206c65,20706172,...) at
> db_trace_self_wrapper+0x26
> kdb_backtrace(93a,0,ffffffff,c0ad6114,e6ca323c,...) at kdb_backtrace+0x2a
> _witness_debugger(c08e9f67,e6ca3250,4,1,0,...) at _witness_debugger+0x1e
> witness_warn(5,0,c0924fe1,c097df50,c3e42b00,...) at witness_warn+0x1f1
> trap(e6ca32dc) at trap+0x15a
> calltrap() at calltrap+0x6
>
>    I tried to track down what the exact issue was, but I got lost
> (the locking sort of looks ok to me, but I'm still not an expert with
> mutex(9)).
>    I still have the vmcore and can provide more helpful details when requested.

The locking itself is almost certainly fine.  The error message is not
very helpful: the actual failure is the kernel page fault (trap 12),
not the lock.  You just happen to panic on the witness warning about
the held alc0 mutex before vm_fault can panic due to a bad address.
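
To make that ordering concrete, here is a toy model in C.  It is only a
sketch and not the FreeBSD trap code: every name in it (fault_ctx,
handle_page_fault, the outcome values) is made up for illustration.  It
assumes the fault handler checks for held non-sleepable locks first and
only afterwards tries to resolve the faulting address, which is why the
lock message is the one that ends up on the console.

```c
#include <stdbool.h>

/* Toy model only; these names are hypothetical, not kernel APIs. */
enum outcome {
    OUTCOME_OK,                 /* fault resolved, execution continues */
    OUTCOME_LOCK_WARNING_PANIC, /* panic reported as a lock problem */
    OUTCOME_BAD_ADDRESS_PANIC   /* panic reported as a bad address */
};

struct fault_ctx {
    bool nonsleepable_lock_held; /* e.g. a driver mutex is held */
    bool address_valid;          /* can the faulting address be resolved? */
};

/*
 * Model of the reporting order: the lock check runs first, so when a
 * lock is held its warning is reported and the bad-address check is
 * never reached.
 */
static enum outcome
handle_page_fault(const struct fault_ctx *ctx)
{
    if (ctx->nonsleepable_lock_held)
        return OUTCOME_LOCK_WARNING_PANIC;

    if (!ctx->address_valid)
        return OUTCOME_BAD_ADDRESS_PANIC;

    return OUTCOME_OK;
}
```

In this model, a context with both a held lock and an invalid address
yields OUTCOME_LOCK_WARNING_PANIC, so the bad address is the root cause
even though the lock is what gets reported.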

The alc(4) maintainer would probably like info on the trap (line of
code and where the bad pointer came from).

Cheers,
matthew