Re: ZFS buggy in CURRENT? Stuck in [zio->io_cv] forever!

From: O. Hartmann <ohartman_at_zedat.fu-berlin.de>
Date: Sun, 27 Oct 2013 16:10:26 +0100
On Sun, 27 Oct 2013 13:40:39 +0100
"O. Hartmann" <ohartman_at_zedat.fu-berlin.de> wrote:

> 
> I have set up a RAIDZ pool comprising four 3TB HDDs. To maintain 4k
> block alignment, I followed the instructions given on several sites,
> and I'll sketch them here for the record.
> 
> The operating systems involved are 11.0-CURRENT and 10.0-BETA2.
> 
> Create a GPT scheme on each drive and add one partition covering the
> whole disk with
> 
> gpart add -t freebsd-zfs -b 1M -l disk0[0-3] ada[3-6]
> 
> gnop create -S4096 gpt/disk0[0-3]
> 
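
For completeness, the per-disk preparation sketched above boils down
to roughly the following (device names ada3-ada6 and labels
disk00-disk03 as used here; the gpart create step is only implied in
the text above):

gpart create -s gpt ada3
gpart add -t freebsd-zfs -b 1M -l disk00 ada3
gnop create -S4096 gpt/disk00

and the same, with the matching label, for ada4, ada5 and ada6. The
gnop overlay only exists to make the provider report 4k sectors so
that the pool gets created with ashift=12.
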
> Because I added a disk to an existing RAIDZ, I exported the former
> ZFS pool, deleted the partition on each disk and then destroyed the
> GPT scheme. The former pool had a ZIL and a CACHE residing on the
> same, partitioned SSD. I didn't kill or destroy the partitions on
> that SSD. To keep the 4k alignment, I also created the NOP overlays
> on the existing gpt/log00 and gpt/cache00 via
> 
> gnop create -S4096 gpt/log00|gpt/cache00
> 
> After that I created the new pool via
> 
> zpool create POOL gpt/disk0[0-3].nop log gpt/log00.nop cache
> gpt/cache00.nop

It is, of course, a "zpool create POOL raidz ..."
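
Spelled out with the raidz vdev and with the .nop device names from
above, that is roughly

zpool create POOL raidz gpt/disk00.nop gpt/disk01.nop gpt/disk02.nop \
    gpt/disk03.nop log gpt/log00.nop cache gpt/cache00.nop

Whether the 4k trick took effect can afterwards be checked with
something like

zdb -C POOL | grep ashift

which should report ashift: 12 for the raidz vdev.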


> 
> I "received" a snapshot taken and sent to another storage array, after
> I the newly created pool didn't show up any signs of illness or
> corruption.
> 
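
(The transfer itself was the usual send/receive pipe, roughly of the
shape

zfs send -R BACKUP00/somedataset@somesnap | zfs receive -dF POOL

where the dataset and snapshot names, and the exact flags, are only
placeholders here.)
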
> After ~10 hours of receiving the backup, I exported that pool along
> with the backup pool, destroyed the appropriate .nop device entries
> via 
> 
> gnop destroy gpt/disk0[0-3]
> 
> and the same for cache and log, and then tried to check via 
> 
> zpool import
> 
> whether my pool (as well as the backup pool) shows up. And here the
> nasty mess starts!
> 
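
(As an aside: "zpool import" without further arguments scans all of
/dev for pool members. The search can be narrowed to the labelled
partitions with the documented -d option, e.g.

zpool import -d /dev/gpt

which only considers providers under /dev/gpt. Whether that makes any
difference for the hang described below I cannot say.)
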
> The "zpool import" command issued on console is now stuck for hours
> and can not be interrupted via Ctrl-C! No pool shows up! Hitting
> Ctrl-T shows a state like
> 
> ... cmd: zpool 4317 [zio->io_cv]: 7345.34r 0.00 [...]
> 
> Looking with 
> 
> systat -vm 1
> 
> at the throughput of the CAM devices, I realise that two of the four
> drives comprising the RAIDZ show activity, with 7000 - 8000 tps and ~
> 30 MB/s bandwidth, while the other two show none!
> 
> And the pool is still inactive, the console is stuck.
> 
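
To see where exactly the stuck zpool process sits in the kernel, the
thread stacks of the hung PID can be dumped with

procstat -kk 4317

(4317 being the PID from the Ctrl-T output above). That should show
the kernel call chain it is blocked in and would probably be useful
for anyone trying to debug this.
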
> Well, this made my day! At this point, I am trying to understand
> what is going wrong and to recall what I did differently the last
> time, when the same procedure with three disks on the same hardware
> worked for me.
> 
> Now, after a 10-hour copy orgy and with the working array badly
> needed, I am starting to believe that using ZFS on FreeBSD is still
> peppered with too many development-like flaws, rendering it risky.
> Colleagues working with ZFS on Solaris whom I consulted have never
> seen the kind of stuck behaviour I am seeing at this moment.
> 
> I do not want to repeat the procedure again. There must be a way to
> import the pool - even the backup pool, which is working and was
> untouched by all of this, should be importable - but it isn't. While
> this crap "zpool import" command is still blocking the console, not
> willing to die even with "killall -9 zpool", I cannot import the
> backup pool via "zpool import BACKUP00" either. That console gets
> stuck immediately and for eternity, without any notice. Hitting
> Ctrl-T shows something like 
> 
> load: 3.59  cmd: zpool 46199 [spa_namespace_lock] 839.18r 0.00u 0.00s
> 0% 3036k
> 
> which means I cannot even import the backup facility, and that is
> really no fun.


