On Sun, Aug 21, 2011 at 6:55 PM, YongHyeon PYUN <pyunyh_at_gmail.com> wrote:
> On Sun, Aug 21, 2011 at 06:26:45PM -0700, Garrett Cooper wrote:
>> On Sun, Aug 21, 2011 at 4:48 PM, YongHyeon PYUN <pyunyh_at_gmail.com> wrote:
>> > On Fri, Aug 19, 2011 at 12:17:12AM -0700, Garrett Cooper wrote:
>> >> On Thu, Aug 18, 2011 at 9:31 PM, <mdf_at_freebsd.org> wrote:
>> >> > On Thu, Aug 18, 2011 at 5:50 PM, Garrett Cooper <yanegomi_at_gmail.com> wrote:
>> >> >>    When loading if_alc as a module on my netbook and running
>> >> >> /etc/rc.d/netif restart, I can deterministically panic my netbook with
>> >> >> the following message:
>> >>
>> >>     These repro steps were overly simplified. The complete steps are:
>> >>
>> >> 1. Attach ethernet cable to alc(4) enabled NIC.
>> >> 2. Boot up machine.
>> >> 3. Login.
>> >> 4. Physically remove ethernet cable from alc(4) enabled NIC.
>> >> 5. Run `/etc/rc.d/netif restart' as root.
>> >>
>> >
>> > I can't reproduce this with AR8151 sample board. Could you give me
>> > dmesg output to know exact controller revision?
>> > One issue I'm aware of is lack of re-establishing link when
>> > controller firmware put its PHY to deep sleep mode. The deep sleep
>> > mode seems to be automatically activated by firmware when it
>> > detects no energy signal (i.e. cable unplugged) so I had to down and
>> > up the interface again to take the PHY out of the sleep mode.
>> >
>> >> >> ) at _bus_dmamap_sync+0x51
>> >> >> alc_stop(c3dbb000,0,c0c51844,93a,80206910,...) at alc_stop+0x24e
>> >> >> alc_ioctl(c3d07400,80206910,c40423c0,c06a7935,c0914e3c,...) at alc_ioctl+0x22e
>> >> >> ifioctl(c45029c0,80206910,c40423c0,c40505c0,c4528c00,...) at ifioctl+0xc98
>> >> >> soo_ioctl(c4574e00,80206910,c40423c0,c413e680,c40505c0,...) at soo_ioctl+0x401
>> >> >> kern_ioctl(c40505c0,3,80206910,c40423c0,c40423c0,...) at kern_ioctl+0x1d7
>> >> >> ioctl(c40505c0,e6ca3cec,e6ca3d28,c08e929d,0,...) at ioctl+0x118
>> >> >> syscallenter(c40505c0,e6ca3ce4,e6ca3ce4,0,0,...) at syscallenter+0x23f
>> >> >> syscall(e6ca3d28) at syscall+0x2e
>> >> >> Xint0x80_syscall() at Xint0x80_syscall+0x21
>> >> >> --- syscall (54kernel trap 12 with interrupts disabled
>> >> >> Kernel page fault with the following non-sleepable locks held:
>> >> >> exclusive sleep mutex alc0 (network driver) r = 0 (0xc3dbc608) locked
>> >> >> _at_ /usr/src/sys/modules/alc/../../dev/alc/if_alc.c:2362
>> >> >> KDB: stack backtrace:
>> >> >> db_trace_self_wrapper(c08e727a,80,6e726500,74206c65,20706172,...) at
>> >> >> db_trace_self_wrapper+0x26
>> >> >> kdb_backtrace(93a,0,ffffffff,c0ad6114,e6ca323c,...) at kdb_backtrace+0x2a
>> >> >> _witness_debugger(c08e9f67,e6ca3250,4,1,0,...) at _witness_debugger+0x1e
>> >> >> witness_warn(5,0,c0924fe1,c097df50,c3e42b00,...) at witness_warn+0x1f1
>> >> >> trap(e6ca32dc) at trap+0x15a
>> >> >> calltrap() at calltrap+0x6
>> >> >>
>> >> >>    I tried to track down what the exact issue was, but I got lost
>> >> >> (the locking sort of looks ok to me, but I'm still not an expert with
>> >> >> mutex(9)).
>> >> >>    I still have the vmcore and can provide more helpful details when requested.
>> >> >
>> >> > The locking itself is almost certainly fine. The error message is not
>> >> > very helpful, but what went wrong was the page fault. You just happen
>> >> > to panic on a witness warning before vm_fault can panic due to a bad
>> >> > address.
>> >> >
>> >> > The alc(4) maintainer would probably like info on the trap (line of
>> >> > code and where the bad pointer came from).
>> >>
>> >>     I talked to Xin a bit and as he noted the panic was just a symptom
>> >> of the actual issue at hand. I think the problem is that the rx ring's
>> >> rx_m value isn't set to NULL when an error occurred, but getting to
>> >> the exact problem at hand, the following call is failing:
>> >>
>> >>         if (bus_dmamap_load_mbuf_sg(sc->alc_cdata.alc_rx_tag, // <-- HERE
>> >>             sc->alc_cdata.alc_rx_sparemap, m, segs, &nsegs, 0) != 0) {
>> >>                 m_freem(m);
>> >>                 return (ENOBUFS);
>> >>         }
>> >>
>> >>     It's failing with ENOMEM. Still trying to determine what the exact
>> >
>> > Even if bus_dmamap_load_mbuf_sg(9) fails driver should not panic.
>> > Could you show me full back-trace?
>>
>> I tried to hack the kernel to get it to dump properly, but that
>> inevitably failed (one of the buffers or the stack data associated
>> probably got stomped on when the system panicked).
>> Here are some pics.
>
> Thanks a lot. I see that alc(4) failed to allocate RX buffers and
> it seems the panic happened in alc_stop(). But I can't understand
> how it could be triggered. When RX buffer allocation failed, the
> mbuf pointer would have been NULL such that bus_dmamap_sync(9)
> wouldn't be invoked in alc_stop().
> I also see you have wireless network setup in the back trace. Could
> you also reproduce alc(4) panic without wireless network
> configuration?

Unfortunately disabling wireless and if_ath still yields the panic.

-Garrett
Received on Mon Aug 22 2011 - 00:39:27 UTC
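The argument about why alc_stop() should never reach bus_dmamap_sync(9) after a failed RX allocation rests on a NULL-pointer guard. Below is a minimal userspace sketch of that invariant, assuming hypothetical stand-in types and helpers (fake_mbuf, rx_desc, fake_dma_load(), rx_newbuf(), rx_stop()) rather than the real alc(4), mbuf(9), or busdma(9) code. A failed DMA load frees the new buffer and leaves the descriptor untouched, and the stop path only syncs and unloads descriptors whose rx_m is non-NULL. If rx_m were ever left pointing at a buffer whose map was not loaded (the situation Garrett suspects), the stop path would act on an invalid map, which the assert models.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the real alc(4) or mbuf(9) structures. */
struct fake_mbuf {
	int id;
};

struct rx_desc {
	struct fake_mbuf *rx_m;	/* NULL means "no buffer attached" */
	int map_loaded;		/* models a loaded busdma map */
};

/* Models bus_dmamap_load_mbuf_sg(9) failing, e.g. with ENOMEM. */
static int
fake_dma_load(int should_fail)
{
	return (should_fail ? ENOMEM : 0);
}

/*
 * Attach a new buffer to rxd.  On a load failure the new buffer is freed
 * (like m_freem(m)) and rxd is left exactly as it was.
 */
static int
rx_newbuf(struct rx_desc *rxd, int id, int fail_load)
{
	struct fake_mbuf *m;

	m = malloc(sizeof(*m));
	if (m == NULL)
		return (ENOBUFS);
	m->id = id;
	if (fake_dma_load(fail_load) != 0) {
		free(m);
		return (ENOBUFS);
	}
	if (rxd->rx_m != NULL)	/* release the old buffer first */
		free(rxd->rx_m);
	rxd->rx_m = m;
	rxd->map_loaded = 1;
	return (0);
}

/*
 * Stop path: only descriptors that hold a buffer are synced and unloaded.
 * Returns the number of descriptors released.
 */
static int
rx_stop(struct rx_desc *ring, int n)
{
	int i, released = 0;

	for (i = 0; i < n; i++) {
		struct rx_desc *rxd = &ring[i];

		if (rxd->rx_m == NULL)	/* the NULL guard */
			continue;
		/* Models bus_dmamap_sync(9)/unload, which need a loaded map. */
		assert(rxd->map_loaded && "sync on a map that was never loaded");
		rxd->map_loaded = 0;
		free(rxd->rx_m);
		rxd->rx_m = NULL;
		released++;
	}
	return (released);
}
```

With four descriptors where the load is forced to fail for the last two, rx_newbuf() reports two ENOBUFS failures, those two descriptors keep rx_m == NULL, and rx_stop() releases exactly the two loaded ones without tripping the assert.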
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:17 UTC