Re: ZFS-related panic: "possible" spa->spa_errlog_lock deadlock

From: Fabian Keil <freebsd-listen_at_fabiankeil.de>
Date: Sun, 7 Sep 2014 19:00:33 +0200
Xin Li <delphij_at_delphij.net> wrote:

> On 9/7/14 11:23 PM, Fabian Keil wrote:
> > Xin Li <delphij_at_delphij.net> wrote:
> > 
> >> On 9/7/14 9:02 PM, Fabian Keil wrote:
> >>> Using a kernel built from FreeBSD 11.0-CURRENT r271182 I got
> >>> the following panic yesterday:
> >>> 
> >>> [...] Unread portion of the kernel message buffer: [6880]
> >>> panic: deadlkres: possible deadlock detected for
> >>> 0xfffff80015289490, blocked for 1800503 ticks
> >> 
> >> Any chance to get all backtraces (e.g. thread apply all bt full
> >> 16)? I think a different thread that held the lock have been
> >> blocked, probably related to your disconnected vdev.
> > 
> > Output of "thread apply all bt full 16" is available at: 
> > http://www.fabiankeil.de/tmp/freebsd/kgdb-output-spa_errlog_lock-deadlock.txt
> >
> >  A lot of the backtraces prematurely end with "Cannot access memory
> > at address", therefore I also added "thread apply all bt" output.
> > 
> > Apparently there are at least two additional threads blocking below
> > spa_get_stats():
[...]
> Yes, thread 1182 owned the lock and is waiting for the zio be done.
> Other threads that wanted the lock would have to wait.
> 
> I don't have much clue why the system entered this state, however, as
> the operations should have errored out (the GELI device is gone on
> 21:44:56 based on your log, which suggests all references were closed)
> instead of waiting.

It occurred to me that I have relevant auth.log entries as well:

Sep  6 20:54:31 r500 sudo:       fk : TTY=pts/5 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/geli attach -j - /dev/label/spaceloop
Sep  6 20:54:44 r500 sudo:       fk : TTY=pts/5 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/geli attach -j - /dev/label/takems
Sep  6 20:54:51 r500 sudo:       fk : TTY=pts/5 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zpool import -d /dev/label takems
Sep  6 20:55:30 r500 sudo:       fk : TTY=pts/5 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zfs send -i _at_2014-08-13_23:10 tank/home/fk_at_2014-09-06_19:56
Sep  6 20:55:30 r500 sudo:       fk : TTY=pts/5 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zfs receive -v -u -F spaceloop/backup/r500/tank/home/fk
Sep  6 20:55:46 r500 sudo:       fk : TTY=pts/5 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zfs send -i _at_2014-08-13_23:10 tank/home/fk_at_2014-09-06_19:56
Sep  6 20:55:46 r500 sudo:       fk : TTY=pts/5 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zfs receive -v -u -F spaceloop/backup/r500/tank/home/fk
[...]
Sep  6 21:28:47 r500 sudo:       fk : TTY=pts/6 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zpool status spaceloop
Sep  6 21:29:43 r500 sudo:       fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zpool export takems
Sep  6 21:29:46 r500 sudo:       fk : TTY=pts/9 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/geli detach label/takems.eli
Sep  6 21:30:08 r500 sudo:       fk : TTY=pts/10 ; PWD=/home/fk ; USER=root ; COMMAND=/sbin/zpool clear spaceloop
Sep  6 21:44:16 r500 sudo:       fk : TTY=pts/11 ; PWD=/home/fk ; USER=root ; COMMAND=/usr/sbin/usbconfig
Sep  6 21:44:56 r500 sudo:       fk : TTY=pts/11 ; PWD=/home/fk ; USER=root ; COMMAND=/usr/sbin/usbconfig -d 1.3 reset
Sep  6 21:46:26 r500 sudo:       fk : TTY=pts/1 ; PWD=/home/fk ; USER=root ; COMMAND=/usr/sbin/usbconfig
Sep  6 22:03:33 r500 login: login on ttyv0 as fk

IIRC, I tried the USB reset because the "zfs receive ... spaceloop/*",
"zpool status spaceloop" and "zpool clear spaceloop" processes (and some
that weren't called with sudo and thus weren't logged) got stuck and while
it caused the kernel to close label/spaceloop.eli as intended, it did not
noticeable affect the deadlocked processes.

I don't remember exactly why the same ZFS stream was sent twice, but the most
likely explanation seems to be that I aborted the operation before it was done
and it's conceivable that this left the system in a state that caused the second
attempt to get stuck.

Fabian

Received on Sun Sep 07 2014 - 15:00:41 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:51 UTC