r187880 causes fatal trap 30 when unloading drivers

From: Andrew Gallatin <gallatin_at_cs.duke.edu>
Date: Tue, 17 Feb 2009 14:26:15 -0500
I'm seeing a panic when I unload if_mxge.  I suspect it was caused by
the recent change to allocate APIC vectors on a per-CPU basis.

I see the panic only when running an SMP kernel, and only on our 8-way
Opterons (a dual-core Athlon64 is fine).  This is on a box with 2
NICs.  Every time I unload my driver, 2 CPUs panic at the same time,
shortly after the unload.  It occurs both with a single MSI and with
legacy interrupts.  Untangling the garbled gibberish, I see this on
the console:

Fatal trap 30: reserved (unknown) fault while in kernel mode
cpuid = 2; apic id = 02
instruction pointer     = 0x8:0xffffffff807ded46
stack pointer               = 0x10:0xfffffffe40063b70
frame pointer               = 0x10:0xfffffffe40063b80
code segment                = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, IOPL = 0
current process             = 11 (idle: cpu2)
trap number             = 30

Fatal trap 30: reserved (unknown) fault while in kernel mode
cpuid = 1; apic id = 01
instruction pointer     = 0x8:0xffffffff807ded46
stack pointer               = 0x10:0xfffffffe40068b70
frame pointer               = 0x10:0xfffffffe40068b80
code segment                = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, IOPL = 0
current process             = 11 (idle: cpu1)
trap number             = 30

I cannot get a dump, and ddb shows that the CPU is sitting in the
ACPI acpi_cpu_c1() function.  I saw a similar report a
little while back (http://lists.freebsd.org/pipermail/freebsd-current/2009-February/003141.html).
Following John's suggestion later in that thread, I tried backing
out r187880, and I can again unload drivers.

FWIW, I'm fairly certain the unhandled IRQ is not coming from the NIC.
The NIC will not generate interrupts when it is not ifconfig'ed up.
Given that, and the fact that kldunload usually finishes before the
panic happens, I wonder if it might be a clock interrupt that is
triggering the trap.


Drew
Received on Tue Feb 17 2009 - 21:50:23 UTC
