Using a kernel built from FreeBSD 11.0-CURRENT r271182 I got the following panic yesterday: [...] Unread portion of the kernel message buffer: [6880] panic: deadlkres: possible deadlock detected for 0xfffff80015289490, blocked for 1800503 ticks [6880] [6880] cpuid = 1 [6880] KDB: stack backtrace: [6880] db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe009432e9e0 [6880] kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe009432ea90 [6880] panic() at panic+0x1c1/frame 0xfffffe009432eb50 [6880] deadlkres() at deadlkres+0x588/frame 0xfffffe009432ebb0 [6880] fork_exit() at fork_exit+0x9a/frame 0xfffffe009432ebf0 [6880] fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe009432ebf0 [6880] --- trap 0, rip = 0, rsp = 0xfffffe009432ecb0, rbp = 0 --- [6880] KDB: enter: panic [6880] Uptime: 1h54m40s [6880] Dumping 354 out of 1973 MB:..5%..14%..23%..32%..41%..55%..64%..73%..82%..91% Reading symbols from /boot/kernel/zfs.ko.symbols...done. [...] Loaded symbols for /boot/kernel/profile.ko.symbols #0 doadump (textdump=1) at pcpu.h:219 219 pcpu.h: No such file or directory. in pcpu.h (kgdb) where #0 doadump (textdump=1) at pcpu.h:219 #1 0xffffffff80597bad in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:447 #2 0xffffffff80598100 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:746 #3 0xffffffff80539b98 in deadlkres () at /usr/src/sys/kern/kern_clock.c:240 #4 0xffffffff8055e8da in fork_exit (callout=0xffffffff80539610 <deadlkres>, arg=0x0, frame=0xfffffe009432ec00) at /usr/src/sys/kern/kern_fork.c:977 #5 0xffffffff8083fb5e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:605 #6 0x0000000000000000 in ?? () Current language: auto; currently minimal (kgdb) set $td=(struct thread *)0xfffff80015289490 (kgdb) tid $td->td_tid [Switching to thread 1184 (Thread 101428)]#0 sched_switch (td=0xfffff80015289490, newtd=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932 1932 cpuid = PCPU_GET(cpuid); (kgdb) bt #0 sched_switch (td=0xfffff80015289490, newtd=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1932 #1 0xffffffff805a23c1 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:493 #2 0xffffffff805e4bca in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:631 #3 0xffffffff805a12b2 in _sx_xlock_hard (sx=0xfffff800062ed820, tid=18446735277971510416, opts=<value optimized out>, file=0x0, line=0) at /usr/src/sys/kern/kern_sx.c:676 #4 0xffffffff805a0add in _sx_xlock (sx=0x0, opts=0, file=<value optimized out>, line=0) at sx.h:154 #5 0xffffffff8114a691 in spa_get_errlog_size (spa=0xfffff800062ed000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa_errlog.c:142 #6 0xffffffff8113f549 in spa_get_stats (name=0xfffffe0006126000 "spaceloop", config=0xfffffe00950e58e8, altroot=0xfffffe0006126430 "", buflen=2048) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:3287 #7 0xffffffff81189a45 in zfs_ioc_pool_stats (zc=0xfffffe0006126000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:1656 #8 0xffffffff81187290 in zfsdev_ioctl (dev=<value optimized out>, zcmd=<value optimized out>, arg=<value optimized out>, flag=<value optimized out>, td=<value optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:6136 #9 0xffffffff80464a55 in devfs_ioctl_f (fp=0xfffff80017107f50, com=3222821381, data=0xfffff8004fb4b740, cred=<value optimized out>, td=0xfffff80015289490) at /usr/src/sys/fs/devfs/devfs_vnops.c:757 #10 0xffffffff805f3c3d in kern_ioctl (td=0xfffff80015289490, fd=<value optimized out>, com=0) at file.h:311 #11 0xffffffff805f381c in sys_ioctl (td=0xfffff80015289490, uap=0xfffffe00950e5b80) at /usr/src/sys/kern/sys_generic.c:702 #12 0xffffffff8085c2db in amd64_syscall (td=0xfffff80015289490, traced=0) at subr_syscall.c:133 #13 0xffffffff8083f90b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:390 #14 0x00000008019fc3da in ?? () (kgdb) f 3 #3 0xffffffff805a12b2 in _sx_xlock_hard (sx=0xfffff800062ed820, tid=18446735277971510416, opts=<value optimized out>, file=0x0, line=0) at /usr/src/sys/kern/kern_sx.c:676 676 sleepq_wait(&sx->lock_object, 0); (kgdb) p sx->lock_object $14 = {lo_name = 0xffffffff81202163 "spa->spa_errlog_lock", lo_flags = 40960000, lo_data = 0, lo_witness = 0x0} This happened several minutes after a couple of zpool processes stopped responding while accessing information about the following pool: fk_at_r500 ~ $zpool status -v spaceloop pool: spaceloop state: ONLINE status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. see: http://illumos.org/msg/ZFS-8000-9P scan: scrub repaired 2.98M in 16h27m with 0 errors on Sun Sep 7 12:25:36 2014 config: NAME STATE READ WRITE CKSUM spaceloop ONLINE 0 0 0 label/spaceloop.eli ONLINE 0 0 26 errors: No known data errors Dump time was 21:58:48, the vdev disappeared a while before that, therefore I believe the "possible" deadlock is a real one: Sep 6 21:18:09 r500 kernel: [4441] GEOM_ELI: g_eli_write_done() failed (error=5) label/spaceloop.eli[WRITE(offset=2235700736, length=16384)] Sep 6 21:18:09 r500 kernel: [4441] GEOM_ELI: g_eli_read_done() failed (error=5) label/spaceloop.eli[READ(offset=4008976384, length=8192)] Sep 6 21:18:09 r500 kernel: [4441] GEOM_ELI: g_eli_read_done() failed (error=5) label/spaceloop.eli[READ(offset=4009238528, length=8192)] Sep 6 21:18:09 r500 kernel: [4441] GEOM_ELI: g_eli_read_done() failed (error=5) label/spaceloop.eli[READ(offset=270336, length=8192)] Sep 6 21:18:27 r500 kernel: [4459] (da1:umass-sim1:1:0:0): WRITE(10). CDB: 2a 00 00 60 65 0c 00 00 40 00 Sep 6 21:18:27 r500 kernel: [4459] (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error Sep 6 21:18:27 r500 kernel: [4459] (da1:umass-sim1:1:0:0): Retrying command Sep 6 21:18:45 r500 kernel: [4477] (da1:umass-sim1:1:0:0): WRITE(10). CDB: 2a 00 00 60 65 0c 00 00 40 00 Sep 6 21:18:45 r500 kernel: [4477] (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error Sep 6 21:18:45 r500 kernel: [4477] (da1:umass-sim1:1:0:0): Retrying command Sep 6 21:19:03 r500 kernel: [4495] (da1:umass-sim1:1:0:0): WRITE(10). CDB: 2a 00 00 60 65 0c 00 00 40 00 Sep 6 21:19:03 r500 kernel: [4495] (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error Sep 6 21:19:03 r500 kernel: [4495] (da1:umass-sim1:1:0:0): Retrying command Sep 6 21:19:21 r500 kernel: [4514] (da1:umass-sim1:1:0:0): WRITE(10). CDB: 2a 00 00 60 65 0c 00 00 40 00 Sep 6 21:19:21 r500 kernel: [4514] (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error Sep 6 21:19:21 r500 kernel: [4514] (da1:umass-sim1:1:0:0): Retrying command Sep 6 21:19:40 r500 kernel: [4532] (da1:umass-sim1:1:0:0): WRITE(10). CDB: 2a 00 00 60 65 0c 00 00 40 00 Sep 6 21:19:40 r500 kernel: [4532] (da1:umass-sim1:1:0:0): CAM status: CCB request completed with an error Sep 6 21:19:40 r500 kernel: [4532] (da1:umass-sim1:1:0:0): Error 5, Retries exhausted Sep 6 21:19:40 r500 kernel: [4532] GEOM_ELI: g_eli_write_done() failed (error=5) label/spaceloop.eli[WRITE(offset=3234469888, length=32768)] [...] Sep 6 21:29:46 r500 kernel: [5138] GEOM_ELI: Device label/takems.eli destroyed. Sep 6 21:29:52 r500 kernel: [5144] ugen0.3: <SMI Corporation> at usbus0 (disconnected) Sep 6 21:29:52 r500 kernel: [5144] umass2: at uhub0, port 1, addr 3 (disconnected) Sep 6 21:29:52 r500 kernel: [5144] da2 at umass-sim2 bus 2 scbus4 target 0 lun 0 Sep 6 21:29:52 r500 kernel: [5144] da2: <takeMS USB Mini 1100> s/n AA04012900007508 detached Sep 6 21:29:52 r500 kernel: [5144] pass4 at umass-sim2 bus 2 scbus4 target 0 lun 0 Sep 6 21:29:52 r500 kernel: [5144] pass4: <takeMS USB Mini 1100> s/n AA04012900007508 detached Sep 6 21:29:52 r500 kernel: [5144] (pass4:umass-sim2:2:0:0): Periph destroyed Sep 6 21:29:52 r500 kernel: [5144] (da2:umass-sim2:2:0:0): Periph destroyed Sep 6 21:44:56 r500 kernel: [6049] umass1: at uhub1, port 1, addr 3 (disconnected) Sep 6 21:44:56 r500 kernel: [6049] da1 at umass-sim1 bus 1 scbus3 target 0 lun 0 Sep 6 21:44:56 r500 kernel: [6049] da1: <General USB Flash Disk 1.0> s/n 0000000000002511 detached Sep 6 21:44:56 r500 kernel: [6049] pass3 at umass-sim1 bus 1 scbus3 target 0 lun 0 Sep 6 21:44:56 r500 kernel: [6049] pass3: <General USB Flash Disk 1.0> s/n 0000000000002511 detached Sep 6 21:44:56 r500 kernel: [6049] (pass3:umass-sim1:1:0:0): Periph destroyed Sep 6 21:44:56 r500 kernel: [6049] GEOM_ELI: Device label/spaceloop.eli destroyed. Sep 6 21:44:56 r500 kernel: [6049] GEOM_ELI: Detached label/spaceloop.eli on last close. Sep 6 21:44:56 r500 kernel: [6049] (da1:umass-sim1:1:0:0): Periph destroyed Sep 6 21:44:57 r500 ZFS: vdev is removed, pool_guid=13593515477493551420 vdev_guid=2311923824790456307 Sep 6 21:44:57 r500 kernel: [6049] umass1: <General USB Flash Disk, class 0/0, rev 2.00/1.00, addr 3> on usbus1 Sep 6 21:44:57 r500 kernel: [6049] umass1: SCSI over Bulk-Only; quirks = 0x4101 Sep 6 21:44:57 r500 kernel: [6049] umass1:3:1: Attached to scbus3 Sep 6 21:46:10 r500 kernel: [6122] (probe0:umass-sim1:1:0:0): INQUIRY. CDB: 12 00 00 00 24 00 Sep 6 21:46:10 r500 kernel: [6122] (probe0:umass-sim1:1:0:0): CAM status: CCB request completed with an error Sep 6 21:46:10 r500 kernel: [6122] (probe0:umass-sim1:1:0:0): Retrying command Sep 6 21:46:45 r500 kernel: [6157] ugen1.3: <General> at usbus1 (disconnected) Sep 6 21:46:45 r500 kernel: [6157] umass1: at uhub1, port 1, addr 3 (disconnected) Sep 6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): INQUIRY. CDB: 12 00 00 00 24 00 Sep 6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): CAM status: CCB request completed with an error Sep 6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): Retrying command Sep 6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): INQUIRY. CDB: 12 00 00 00 24 00 Sep 6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): CAM status: CCB request completed with an error Sep 6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): Retrying command Sep 6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): INQUIRY. CDB: 12 00 00 00 24 00 Sep 6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): CAM status: CCB request completed with an error Sep 6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): Retrying command Sep 6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): INQUIRY. CDB: 12 00 00 00 24 00 Sep 6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): CAM status: CCB request completed with an error Sep 6 21:46:45 r500 kernel: [6157] (probe0:umass-sim1:1:0:0): Error 5, Retries exhausted Two other pools, one using USB as well, were unaffected until the panic was triggered. A pleasant surprise was that the system entered ddb from Xorg and dumping on /dev/ada0s1b worked despite /dev/ada0s1b.eli being configured as swap device. I wasn't aware that this was working now. Fabian
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:51 UTC