Re: ZFS/zpool command blocks ... locking up all terminals

From: O. Hartmann <ohartman_at_zedat.fu-berlin.de>
Date: Fri, 20 Dec 2013 20:02:55 +0100
On Fri, 20 Dec 2013 11:23:25 -0700
Alan Somers <asomers_at_freebsd.org> wrote:

> On Fri, Dec 20, 2013 at 3:55 AM, O. Hartmann
> <ohartman_at_zedat.fu-berlin.de> wrote:
> >
> > I have a faulty pool with an ambiguous label and I tried to resolve
> > that problem. ZFS is at the moment highly active copying data from
> > several volumes to another.
> >
> > Operating system:
> >
> > 11.0-CURRENT FreeBSD 11.0-CURRENT #1 r259522: Tue Dec 17 19:02:10
> > CET 2013 amd64
> >
> > In one terminal I exported the pool in question and tried to list it
> > via "zpool import". But this command sequence has locked up the
> > terminal for up to an hour!
> >
> > In another terminal I tried to issue the command "zpool status" to
> > watch the status of the pools (I have several). But this terminal
> > is also locked up right now!
> >
> > What is wrong here? I had such an issue in 10.0-CURRENT as well. It
> > seems ZFS is locking everything up and can only be brought back by a
> > hard reset! What is going on? Why does zpool lock up when trying to
> > display a label-scrambled pool, and why is "zpool status" then also
> > locked up, when the latter is only supposed to show the status of
> > the other, healthy pools? This reminds me of single-threaded tools
> > which lock up every operation issued after the blocking command.
> >
> > How is this to be solved?
> 
> Sounds like a deadlock.  Did the "zpool export" complete successfully?

No, it didn't; it has now been stuck for ~8 hours, as has
"zpool status".

>  Did the pool become suspended at any point?  Can you get to the

The two pools that are not exported are under heavy load at the
moment. I cannot check the status of the exported pool, since the
command is blocking.

> kernel debugger?  Most importantly, can you reproduce it?  If you can,
> you'll probably need a WITNESS enabled kernel to get any useful info.

Unfortunately, I have no debugging kernel on this machine. Whether the
problem is reproducible remains unanswered, since I have no chance at
the moment to try the procedure under the very same conditions. I saw
the same behaviour once in 10.0-CURRENT three months ago, but I do not
recall the exact conditions.

What I do recall is that after all operations on all pools had
finished, the "deadlock" was released. At the moment I am copying ~4 TB
of data from a pool (RAIDZ-0) to an external drive (via USB 3.0, also a
ZFS pool). That takes hours, and I suspect the deadlock will last until
the copying has finished.

But it is scary that a single faulty command can block all further
ZFS/zpool operations, even on different pools.

> When I find a deadlock, I usually go into the kernel debugger and
> issue the following commands.  It results in about a megabyte of
> output, so use screen or tmux or something to capture the output
> 
> x/s version
> show msginfo
> ps
> alltrace
> show alllocks  # You need witness for this one

I will try this later, after the backup has gone through. Thank you
very much.
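
On a machine without a debugging kernel, the kernel stacks of the
blocked commands may still be visible from userland. The following is
only a rough sketch, not something taken from this thread: it assumes
that "procstat -kk -a" is available on the system and that its output
has the usual PID / TID / COMM / TDNAME / KSTACK columns. The function
name parse_kstacks is made up for illustration.

```python
import subprocess


def parse_kstacks(procstat_output, command="zpool"):
    """Return (pid, tid, frames) for each thread of `command`.

    Assumes `procstat -kk` style lines: PID TID COMM TDNAME KSTACK...
    """
    results = []
    for line in procstat_output.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not a thread record.
        if len(fields) < 4 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        if fields[2] != command:
            continue
        # Stack frames look like "function+0xoffset"; keep only those tokens.
        frames = [f for f in fields[3:] if "+0x" in f]
        results.append((int(fields[0]), int(fields[1]), frames))
    return results


if __name__ == "__main__":
    # Assumption: run as root on the affected FreeBSD machine.
    out = subprocess.run(
        ["procstat", "-kk", "-a"],
        capture_output=True, text=True, check=True,
    ).stdout
    for pid, tid, frames in parse_kstacks(out):
        print(pid, tid, " <- ".join(frames))
```

This does not replace the debugger output asked for above (in
particular it shows no lock ownership), but the frames the zpool
threads are sleeping in could already hint at where they are stuck.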

Oliver

> 
> -Alan
> 
> >
> > Oliver



Received on Fri Dec 20 2013 - 18:03:04 UTC