I can fairly reliably reproduce the following (hand-transcribed) panic when sending a ZFS snapshot to a geli provider backed by a USB stick that disappears (due to a bug, or because it's unplugged):

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x288
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff808e2984
stack pointer = 0x28:0xffffff800023fba0
frame pointer = 0x28:0xffffff800023fbb0
code segment = base 0x0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 13 (g_up)
[ thread pid 13 tid 100010 ]
Stopped at atomic_subtract_int+0x4: lock subl %esi,(%rdi)
db> where
Tracing pid 13 tid 100010 td 0xfffffe00027998c0
atomic_subtract_int() at atomic_subtract_int+0x4
g_io_schedule_up() at g_io_schedule_up+0xa6
g_up_procbody() at g_up_procbody+0x5c
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800023fd00, rbp = 0 ---

It seems to matter that ZFS is actually writing to the stick at the time. If the stick is unplugged while the operation is stalled for other reasons, this particular panic doesn't seem to occur.
Although I end up in the debugger, dumping core doesn't work: it produces a double fault and a bunch of duplicated messages (again hand-transcribed):

db> dump
Dumping 443 out of 1974 MB:
Dumping 443 out of 1974 MB
Fatal double fault
Fatal double fault
rip = 0xffffffff8066a9e0
rip = 0xffffffff8066a9e0
rsp = 0xffffff800023c000
rsp = 0xffffff800023c000
rbp = 0xffffff800023c040
rbp = 0xffffff800023c040
cpuid = 0; cpuid = 0; apic id = 00 apic id = 00
panic: double fault
panic: double fault
cpuid = 0
cpuid = 0
KDB: stack backtrace:
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x187
dblfault_handler() at dblfault_handler+0xa4
Xdblfault() at Xdblfault+0xa8
--- trap 0x17, rip = 0xffffffff8066a9e8, rsp = 0xffffffff80e56158, rbp = 0xffffff800023c040 ---
mi_switch() at mi_switch+0x270
critical_exit() at critical_exit+0x9b
spinlock_exit() at spinlock_exit+0x17
mi_switch() at mi_switch+0x275
critical_exit() at critical_exit+0x9b
spinlock_exit() at spinlock_exit+0x17
[several pages of the previous three lines skipped]
mi_switch() at mi_switch+0x275
critical_exit() at critical_exit+0x9b
spinlock_exit() at spinlock_exit+0x17
intr_event_schedule_thread() at intr_event_schedule_thread+0xbb
ahci_end_transaction() at ahci_end_transaction+0x398
ahci_ch_intr() at ahci_ch_intr+0x2b5
ahcipoll() at ahcipoll+0x15
xpt_polled_action() at xpt_polled_action+0xf7

I first noticed the problem with a CURRENT from a week ago, but since USB sticks don't usually disappear while I'm sending snapshots to them, the problem might not be new.

I'm using amd64. The panic above is from a custom kernel without WITNESS and INVARIANTS; enabling them doesn't seem to affect the trace before the double fault.

I wasn't able to reproduce the panic by unplugging the stick while writing to the pool with dd (but I only tried once).

Fabian
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:18 UTC