Re: Deterministic panic due to non-sleepable lock with if_alc when reconfiguring interfaces

From: YongHyeon PYUN <pyunyh_at_gmail.com> Date: Mon, 22 Aug 2011 13:40:54 -0700

On Sun, Aug 21, 2011 at 04:48:56PM -0700, YongHyeon PYUN wrote:
> On Fri, Aug 19, 2011 at 12:17:12AM -0700, Garrett Cooper wrote:
> > On Thu, Aug 18, 2011 at 9:31 PM,  <mdf_at_freebsd.org> wrote:
> > > On Thu, Aug 18, 2011 at 5:50 PM, Garrett Cooper <yanegomi_at_gmail.com> wrote:
> > >>     When loading if_alc as a module on my netbook and running
> > >> /etc/rc.d/netif restart, I can deterministically panic my netbook with
> > >> the following message:
> > 
> >     These repro steps were overly simplified. The complete steps are:
> > 
> > 1. Attach ethernet cable to alc(4) enabled NIC.
> > 2. Boot up machine.
> > 3. Login.
> > 4. Physically remove ethernet cable from alc(4) enabled NIC.
> > 5. Run `/etc/rc.d/netif restart' as root.
> > 
> 
> I can't reproduce this with an AR8151 sample board. Could you send me
> the dmesg output so I can see the exact controller revision?
> One issue I'm aware of is that link is not re-established after the
> controller firmware puts its PHY into deep sleep mode.  The firmware
> seems to activate deep sleep mode automatically when it detects no
> energy signal (i.e. cable unplugged), so I had to bring the interface
> down and up again to take the PHY out of sleep mode.
> 

The failure to re-establish link was fixed in r225088.  I'm not sure
whether it also fixes kern/148772, though. Since you seem to be
hitting the same issue as that PR, it would be good to know whether
it makes any difference.

> > >> ) at _bus_dmamap_sync+0x51
> > >> alc_stop(c3dbb000,0,c0c51844,93a,80206910,...) at alc_stop+0x24e
> > >> alc_ioctl(c3d07400,80206910,c40423c0,c06a7935,c0914e3c,...) at alc_ioctl+0x22e
> > >> ifioctl(c45029c0,80206910,c40423c0,c40505c0,c4528c00,...) at ifioctl+0xc98
> > >> soo_ioctl(c4574e00,80206910,c40423c0,c413e680,c40505c0,...) at soo_ioctl+0x401
> > >> kern_ioctl(c40505c0,3,80206910,c40423c0,c40423c0,...) at kern_ioctl+0x1d7
> > >> ioctl(c40505c0,e6ca3cec,e6ca3d28,c08e929d,0,...) at ioctl+0x118
> > >> syscallenter(c40505c0,e6ca3ce4,e6ca3ce4,0,0,...) at syscallenter+0x23f
> > >> syscall(e6ca3d28) at syscall+0x2e
> > >> Xint0x80_syscall() at Xint0x80_syscall+0x21
> > >> --- syscall (54kernel trap 12 with interrupts disabled
> > >> Kernel page fault with the following non-sleepable locks held:
> > >> exclusive sleep mutex alc0 (network driver) r = 0 (0xc3dbc608) locked
> > >> @ /usr/src/sys/modules/alc/../../dev/alc/if_alc.c:2362
> > >> KDB: stack backtrace:
> > >> db_trace_self_wrapper(c08e727a,80,6e726500,74206c65,20706172,...) at
> > >> db_trace_self_wrapper+0x26
> > >> kdb_backtrace(93a,0,ffffffff,c0ad6114,e6ca323c,...) at kdb_backtrace+0x2a
> > >> _witness_debugger(c08e9f67,e6ca3250,4,1,0,...) at _witness_debugger+0x1e
> > >> witness_warn(5,0,c0924fe1,c097df50,c3e42b00,...) at witness_warn+0x1f1
> > >> trap(e6ca32dc) at trap+0x15a
> > >> calltrap() at calltrap+0x6
> > >>
> > >>     I tried to track down what the exact issue was, but I got lost
> > >> (the locking sort of looks ok to me, but I'm still not an expert with
> > >> mutex(9)).
> > >>     I still have the vmcore and can provide more helpful details when requested.
> > >
> > > The locking itself is almost certainly fine.  The error message is not
> > > very helpful, but what went wrong was the page fault.  You just happen
> > > to panic on a witness warning before vm_fault can panic due to a bad
> > > address.
> > >
> > > The alc(4) maintainer would probably like info on the trap (line of
> > > code and where the bad pointer came from).
> > 
> >     I talked to Xin a bit and, as he noted, the panic is just a symptom
> > of the actual issue. I think the problem is that the rx ring's
> > rx_m value isn't set to NULL when an error occurs. As for the
> > exact failure, the following call is failing:
> > 
> >         if (bus_dmamap_load_mbuf_sg(sc->alc_cdata.alc_rx_tag, // <-- HERE
> >             sc->alc_cdata.alc_rx_sparemap, m, segs, &nsegs, 0) != 0) {
> >                 m_freem(m);
> >                 return (ENOBUFS);
> >         }
> > 
> >     It's failing with ENOMEM. Still trying to determine what the exact
> 
> Even if bus_dmamap_load_mbuf_sg(9) fails, the driver should not panic.
> Could you show me the full backtrace?
> 
> > reason for ENOMEM is from the x86 busdma code, though.
> > Thanks,
> > -Garrett
> >
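
The quoted hypothesis (a failed DMA load leaving the rx ring's rx_m in
a bad state, which alc_stop later trips over) can be modelled in
userland.  The sketch below is not the alc(4) code: struct buf,
struct rx_desc, struct rx_ring, fake_dmamap_load(), rx_newbuf() and
rx_stop() are hypothetical stand-ins, and the fail_load knob exists only
to simulate the ENOMEM case.  It shows the usual spare-map pattern, where
a failed load leaves the slot untouched and the stop path clears every
pointer it frees, so a later stop cannot touch a stale buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define RX_RING_CNT 4

/* Hypothetical stand-in for an mbuf; not the kernel structure. */
struct buf { int id; };

/* Hypothetical ring slot: the buffer it owns (or NULL) and its DMA map. */
struct rx_desc {
	struct buf *rx_m;
	int         map;
};

struct rx_ring {
	struct rx_desc desc[RX_RING_CNT];
	int            sparemap;   /* models a spare DMA map */
	int            fail_load;  /* test knob: force the load to fail */
};

/* Models a DMA load that can fail with ENOMEM (assumption for the sketch). */
static int
fake_dmamap_load(struct rx_ring *r, struct buf *m)
{
	(void)m;
	return (r->fail_load ? ENOMEM : 0);
}

/* Replace the buffer in one slot; on failure the slot is left as it was. */
static int
rx_newbuf(struct rx_ring *r, struct rx_desc *d)
{
	struct buf *m;
	int tmp;

	m = malloc(sizeof(*m));
	if (m == NULL)
		return (ENOBUFS);
	/* Load into the spare map first so the live map stays valid. */
	if (fake_dmamap_load(r, m) != 0) {
		free(m);
		return (ENOBUFS);
	}
	/* A driver would hand the old buffer up the stack; here it is freed. */
	free(d->rx_m);
	tmp = d->map;
	d->map = r->sparemap;
	r->sparemap = tmp;
	d->rx_m = m;
	return (0);
}

/* Stop path: free each buffer once and clear the pointer afterwards. */
static void
rx_stop(struct rx_ring *r)
{
	size_t i;

	for (i = 0; i < RX_RING_CNT; i++) {
		if (r->desc[i].rx_m != NULL) {
			free(r->desc[i].rx_m);
			r->desc[i].rx_m = NULL;
		}
	}
}
```

With this shape a failed load costs one dropped buffer, and calling
rx_stop() twice in a row is harmless because the second pass sees only
NULL pointers.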