Fatal trap 12: page fault while in kernel mode -- Stopped at atomic_subtract_int+0x4

From: Fabian Keil <freebsd-listen_at_fabiankeil.de> Date: Tue, 27 Sep 2011 22:00:15 +0200

I can fairly reliably reproduce the following (hand-transcribed) panic
when sending a ZFS snapshot to a geli provider backed by a USB
stick that disappears (due to a bug, or because it's unplugged):

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x288
fault code	      = supervisor write data, page not present
instruction pointer   = 0x20:0xffffffff808e2984
stack pointer         = 0x28:0xffffff800023fba0
frame pointer         = 0x28:0xffffff800023fbb0
code segment          = base 0x0, limit 0xfffff, type 0x1b
                      = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags      = interrupt enabled, resume, IOPL = 0
current process       = 13 (g_up)
[ thread pid 13 tid 100010 ]
Stopped at    atomic_subtract_int+0x4: lock subl %esi,(%rdi)
db> where
Tracing pid 13 tid 100010 td 0xfffffe00027998c0
atomic_subtract_int() at atomic_subtract_int+0x4
g_io_schedule_up() at g_io_schedule_up+0xa6
g_up_procbody() at g_up_procbody+0x5c
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800023fd00, rbp = 0 ---

It seems to be important that ZFS is actually writing to the stick.
If the stick is unplugged while the operation is stalled for other
reasons, this particular panic doesn't seem to occur.

Although I end up in the debugger, dumping core doesn't work.
It produces a double fault and a bunch of duplicated
messages (again hand-transcribed):

db> dump
Dumping 443 out of 1974 MB: Dumping 443 out of 1974 MB

Fatal double fault
Fatal double fault
rip = 0xffffffff8066a9e0
rip = 0xffffffff8066a9e0
rsp = 0xffffff800023c000
rsp = 0xffffff800023c000
rbp = 0xffffff800023c040
rbp = 0xffffff800023c040
cpuid = 0; cpuid = 0; apic id = 00
apic id = 00
panic: double fault
panic: double fault
cpuid = 0
cpuid = 0
KDB: stack backtrace:
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x187
dblfault_handler() at dblfault_handler+0xa4
Xdblfault() at Xdblfault+0xa8
--- trap 0x17, rip = 0xffffffff8066a9e8, rsp = 0xffffffff80e56158, rbp = 0xffffff800023c040 ---
mi_switch() at mi_switch+0x270
critical_exit() at critical_exit+0x9b
spinlock_exit() at spinlock_exit+0x17
mi_switch() at mi_switch+0x275
critical_exit() at critical_exit+0x9b
spinlock_exit() at spinlock_exit+0x17
[several pages of the previous three lines skipped]
mi_switch() at mi_switch+0x275
critical_exit() at critical_exit+0x9b
spinlock_exit() at spinlock_exit+0x17
intr_event_schedule_thread() at intr_event_schedule_thread+0xbb
ahci_end_transaction() at ahci_end_transaction+0x398
ahci_ch_intr() at ahci_ch_intr+0x2b5
ahcipoll() at ahcipoll+0x15
xpt_polled_action() at xpt_polled_action+0xf7

I first noticed the problem with CURRENT from a week ago,
but given that USB sticks don't usually disappear for me
while sending snapshots to them, the problem might not
be new.

I'm using amd64. The panic above is from a custom kernel
without WITNESS and INVARIANTS, but enabling them doesn't
seem to affect the trace before the double fault.

I wasn't able to reproduce the panic by unplugging the stick
while writing to the pool using dd (but only tried once).

Fabian