Re: em stability issues + panic

From: Pyun YongHyeon <pyunyh_at_gmail.com> Date: Mon, 2 Oct 2006 16:55:05 +0900

On Mon, Oct 02, 2006 at 12:26:34AM -0700, John-Mark Gurney wrote:
 > Well, I will admit I have a bit older if_em.c, v1.147, but I haven't
 > been doing much w/ my em, probably not even passing close to 100mbit
 > of traffic (in gige mode)...  I recently obtained a crash dump from
 > em_txeof where the tx_buffer is NULL at line 2958:
 > 2958                    if (tx_buffer->m_head) {
 > 
 > If anyone wants some additional data, I can provide info from the
 > crash dump...   Just as a bit of trivia, I did load a few kld's..
 > bktr.ko, bktrau.ko and iic.ko (plus respective other kld's that got
 > auto loaded)...   It also seems that interactiveness is more likely
 > to hang em than other traffic...  I've been running the box as a nfs
 > server for a while w/o issues, but I log in and run ffmpeg, and it
 > almost immediately hangs requiring a down/up to bring back the
 > interface...
 > 
 > The panic was when I was bringing the interface back up... Though when
 > it panicked, I had down/up'd the interface a few times w/o success in
 > bringing it back...
 > 
 > Fatal trap 12: page fault while in kernel mode
 > cpuid = 0; apic id = 00
 > fault virtual address   = 0x0
 > fault code              = supervisor read, page not present
 > instruction pointer     = 0x20:0xc047155e
 > stack pointer           = 0x28:0xe1d1cc50
 > frame pointer           = 0x28:0xe1d1cc64
 > code segment            = base 0x0, limit 0xfffff, type 0x1b
 >                         = DPL 0, pres 1, def32 1, gran 1
 > processor eflags        = interrupt enabled, resume, IOPL = 0
 > current process         = 12 (swi4: clock sio)
 > Physical memory: 999 MB
 > Dumping 225 MB: 210 194 178 162 146 130 114 98 82 66 50 34 18 2
 > 
 > #9  0xc066f56a in calltrap () at ../../../i386/i386/exception.s:138
 > #10 0xc047155e in em_txeof (adapter=0xc34df800) at ../../../dev/em/if_em.c:2956
 > #11 0xc046e502 in em_watchdog (ifp=0xc3502400) at ../../../dev/em/if_em.c:963
 > #12 0xc0585b22 in if_slowtimo (arg=0x0) at ../../../net/if.c:1415
 > #13 0xc0529fa9 in softclock (dummy=0x0) at ../../../kern/kern_timeout.c:271
 > #14 0xc050a57a in ithread_execute_handlers (p=0xc33c38d0, ie=0xc341c580)
 >     at ../../../kern/kern_intr.c:662
 > #15 0xc050a673 in ithread_loop (arg=0xc33a2940)
 >     at ../../../kern/kern_intr.c:745
 > #16 0xc050981b in fork_exit (callout=0xc050a624 <ithread_loop>, 
 >     arg=0xc33a2940, frame=0xe1d1cd38) at ../../../kern/kern_fork.c:818
 > #17 0xc066f5cc in fork_trampoline () at ../../../i386/i386/exception.s:199
 > 

I think bringing the interface down while Rx is active may corrupt
internal hardware state, because em_rxeof() runs without the driver lock.
See http://lists.freebsd.org/pipermail/freebsd-current/2006-September/066203.html
You may need to protect em_rxeof() with the driver lock in em_handle_rxtx().
(Remember to drop the driver lock before invoking if_input in em_rxeof().)
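
To make the locking pattern concrete, here is a rough userspace sketch, not
a patch against if_em.c. Everything in it is a hypothetical stand-in: the
fake_* names, the struct, and the pthread mutex playing the role of the
driver lock are all assumptions made for illustration. It only shows the
shape of the idea: the Rx path holds the lock while it touches ring state,
and releases it around the call that hands a packet to the stack.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver softc; not the real struct adapter. */
struct fake_adapter {
	pthread_mutex_t mtx;      /* plays the role of the driver lock */
	int lock_held;            /* debug flag, written only with mtx held */
	int rx_pending;           /* packets waiting in the pretend Rx ring */
	int delivered;            /* packets handed to the pretend stack */
	int input_saw_lock_held;  /* set if input ever ran under the lock */
};

static void fake_lock(struct fake_adapter *sc)
{
	pthread_mutex_lock(&sc->mtx);
	sc->lock_held = 1;
}

static void fake_unlock(struct fake_adapter *sc)
{
	sc->lock_held = 0;
	pthread_mutex_unlock(&sc->mtx);
}

/* Stand-in for if_input: must be entered without the driver lock. */
static void fake_if_input(struct fake_adapter *sc)
{
	if (sc->lock_held)
		sc->input_saw_lock_held = 1;
	sc->delivered++;
}

/* Stand-in for the Rx cleanup routine: caller must hold the lock. */
static int fake_rxeof(struct fake_adapter *sc)
{
	int n = 0;

	assert(sc->lock_held);
	while (sc->rx_pending > 0) {
		sc->rx_pending--;      /* ring state changes only under the lock */
		fake_unlock(sc);       /* drop the lock before calling up the stack */
		fake_if_input(sc);
		fake_lock(sc);         /* retake it before touching the ring again */
		/* A real driver would probably recheck interface state here. */
		n++;
	}
	return n;
}

/* Stand-in for the rxtx handler: takes the lock around the Rx cleanup. */
static int fake_handle_rxtx(struct fake_adapter *sc)
{
	int n;

	fake_lock(sc);
	n = fake_rxeof(sc);
	fake_unlock(sc);
	return n;
}

/* Runs the pattern once; returns packets processed, or -1 on a violation. */
static int fake_demo(int pending)
{
	struct fake_adapter sc = { .rx_pending = pending };
	int n, ok;

	pthread_mutex_init(&sc.mtx, NULL);
	n = fake_handle_rxtx(&sc);
	ok = n == pending && sc.rx_pending == 0 &&
	    sc.delivered == pending && !sc.input_saw_lock_held;
	pthread_mutex_destroy(&sc.mtx);
	return ok ? n : -1;
}
```

Calling fake_demo(3) pushes three pretend packets through the handler and
returns -1 if the input hook ever ran with the lock held. Because the lock
is released inside the loop, other code can change the state while the
packet is being delivered, which is why the comment suggests rechecking
after the lock is retaken.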
-- 
Regards,
Pyun YongHyeon