panic in vdev_geom_io_intr

From: Ulrich Spoerlein <uspoerlein_at_gmail.com>
Date: Fri, 28 Mar 2008 18:43:08 +0100

(Sorry, Pawel, for CCing you directly, but you should be the guy most
familiar with the code in question.)

Hi folks,

I'm testing a DRBD-lookalike solution for FreeBSD involving ZFS and
GGATE. It is basically working, but quickly hangs or panics.

This is what I did (md1 is local, ggate1 is on another machine):

igor# zpool create tank mirror /dev/md1 /dev/ggate1

Then I wrote some MB to the device, and it quickly stalled, like so:

igor# zpool iostat tank 2
...
tank         128M  7.81G      0     10      0  1.31M
tank         128M  7.81G      0     10      0  1.31M
tank         128M  7.81G      0     10      0  1.31M
tank         128M  7.81G      0     10      0  1.31M
tank         128M  7.81G      0     10      0  1.31M
tank         128M  7.81G      0     10      0  1.31M
tank         128M  7.81G      6      9   892K  1.24M
tank         128M  7.81G     29      0  3.67M  63.7K
tank         128M  7.81G      0      0      0      0
tank         128M  7.81G      0      0  63.7K      0
tank         128M  7.81G      0      0      0      0
tank         128M  7.81G      0      0  63.7K      0
tank         128M  7.81G      0      0      0      0

I then became impatient and destroyed the ggate1 device, which was
working fine while there was no operation in progress. I guess GEOM does
not like it when devices disappear willy-nilly? (But then, how is
gmirror(8) supposed to give you redundancy? Or is this something very
special with ggate(8)?)

igor# ggatec destroy -f -u1

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x2c
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc0536f56
stack pointer           = 0x28:0xce6d7c58
frame pointer           = 0x28:0xce6d7c78
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 3 (g_up)
[thread pid 3 tid 100008 ]
Stopped at      _mtx_lock_flags+0x46:   movl    0x10(%ebx),%eax
db> where
Tracing pid 3 tid 100008 td 0xc1fbd000
_mtx_lock_flags(1c,0,c25e4c49,1d8,c251e240,...) at _mtx_lock_flags+0x46
vdev_geom_io_intr(c251e240,0,c07b4939,bbc,c251e240,...) at vdev_geom_io_intr+0x44
biodone(c251e240,c08306c8,24c,c07a291a,a,...) at biodone+0x99
g_io_schedule_up(c1fbd000,0,c07a4153,5d,0,...) at g_io_schedule_up+0xd7
g_up_procbody(0,ce6d7d38,c07a7c5e,30c,c1fba290,...) at g_up_procbody+0x98
fork_exit(c04f3f00,0,ce6d7d38) at fork_exit+0xc5
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xce6d7d70, ebp = 0 ---
db> show alllocks
Process 1257 (dd) thread 0xc26dc440 (100107)
exclusive lockmgr zfs r = 0 (0xc270f5c8) locked @ /vol/src/sys/kern/vfs_vnops.c:591
Process 1222 (ggatec) thread 0xc2717000 (100093)
exclusive sx so_rcv_sx r = 0 (0xc24eae2c) locked @ /vol/src/sys/kern/uipc_sockbuf.c:148
Process 25 (syncer) thread 0xc2163000 (100039)
shared lockmgr vfslock r = 0 (0xc21728b8) locked @ /vol/src/sys/kern/vfs_subr.c:364
exclusive lockmgr syncer r = 0 (0xc270f724) locked @ /vol/src/sys/kern/vfs_subr.c:1667
Process 2 (g_event) thread 0xc1fbd220 (100007)
exclusive sx GEOM topology r = 0 (0xc083072c) locked @ /vol/src/sys/geom/geom_event.c:185
db> trace 1257
Tracing pid 1257 tid 100107 td 0xc26dc440
sched_switch(c26dc440,0,1,176,ad3d77b9,...) at sched_switch+0x329
mi_switch(1,0,c07aedd4,1ca,0,...) at mi_switch+0x215
sleepq_switch(c26dc440,0,c07aedd4,239,c26dc440,...) at sleepq_switch+0x14d
sleepq_wait(c2722b24,0,c25e27c0,1,0,...) at sleepq_wait+0x63
_cv_wait(c2722b24,c2722aac,c25e26cb,19e,c2722b1c,...) at _cv_wait+0x210
txg_wait_open(c2722a00,1f,0,0,21a0000,...) at txg_wait_open+0xb3
dmu_tx_wait(c2547b00,2,0,21a0000,0,...) at dmu_tx_wait+0xed
zfs_freebsd_write(cf3e2bc4,c07d6450,0,0,cf3e2b3c,...) at zfs_freebsd_write+0x313
VOP_WRITE_APV(c25e6840,cf3e2bc4,c07b79c9,24f,0,...) at VOP_WRITE_APV+0x155
vn_write(c234c208,cf3e2c60,c2520500,0,c26dc440,...) at vn_write+0x1c4
dofilewrite(cf3e2c60,ffffffff,ffffffff,0,c234c208,...) at dofilewrite+0x95
kern_writev(c26dc440,4,cf3e2c60,82a0000,60000,...) at kern_writev+0x58
write(c26dc440,cf3e2cfc,c,c0795ff0,c07ebbc0,...) at write+0x4f
syscall(cf3e2d38) at syscall+0x2e3
Xint0x80_syscall() at Xint0x80_syscall+0x20
--- syscall (4, FreeBSD ELF32, write), eip = 0x2815ef13, esp = 0xbfbfebdc, ebp = 0xbfbfec08 ---
db> trace 2
Tracing pid 2 tid 100007 td 0xc1fbd220
sched_switch(c1fbd220,0,1,176,3d3f552e,...) at sched_switch+0x329
mi_switch(1,0,c07aedd4,1ca,0,...) at mi_switch+0x215
sleepq_switch(c1fbd220,0,c07aedd4,239,0,...) at sleepq_switch+0x14d
sleepq_wait(c3d1b304,0,c25e4c9a,0,0,...) at sleepq_wait+0x63
_sleep(c3d1b304,c3d1b31c,0,c25e4c9a,0,...) at _sleep+0x335
vdev_geom_release(c276ee40,ffffffff,ffffffff,ffffffff,c2098980,...) at vdev_geom_release+0x81
vdev_geom_orphan(c276ee40,c07a3c70,c25469d8,90,6,...) at vdev_geom_orphan+0x1cd
g_run_events(c0830760,0,4c,c07a291a,a,...) at g_run_events+0x1f9
g_event_procbody(0,ce6d4d38,c07a7c5e,30c,c1fba520,...) at g_event_procbody+0x95
fork_exit(c04f3fa0,0,ce6d4d38) at fork_exit+0xc5
fork_trampoline() at fork_trampoline+0x8
--- trap 0, eip = 0, esp = 0xce6d4d70, ebp = 0 ---
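
A quick sanity check on the trap report: the faulting instruction is
"movl 0x10(%ebx),%eax", the fault virtual address is 0x2c, and the first
argument shown for _mtx_lock_flags() is 0x1c. The small Python sketch
below only redoes that arithmetic with the values copied from the
output above; the helper name implied_base is made up for illustration.
If the numbers line up, that would be consistent with (but does not
prove) the mutex pointer having been computed from a NULL structure
pointer plus a small member offset. That reading is an assumption, not
something the trace confirms.

```python
# Values copied verbatim from the trap report and backtrace.
fault_address = 0x2C   # "fault virtual address"
mtx_argument = 0x1C    # first argument shown for _mtx_lock_flags()
displacement = 0x10    # from "movl 0x10(%ebx),%eax"


def implied_base(fault_addr: int, disp: int) -> int:
    """Return the base register value implied by a faulting base+disp load.

    Hypothetical helper, for illustration only.
    """
    return fault_addr - disp


base = implied_base(fault_address, displacement)
print(hex(base))             # prints 0x1c
print(base == mtx_argument)  # prints True
```

So %ebx held the same tiny value that was passed in as the lock
pointer, far too small to be a valid kernel address.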

Am I nuts in trying something like this, or is this just a genuine bug?

Cheers,
Ulrich Spoerlein
-- 
It is better to remain silent and be thought a fool,
than to speak, and remove all doubt.