On Sun, 27 Oct 2013 16:32:13 -0000 "Steven Hartland" <killing_at_multiplay.co.uk> wrote:

Hello all,

after a third attempt I realised that some remnant labels seem to have
caused the problem. Those labels didn't go away with "zpool create -f" or
"zfs clearlabel provider"; I had to issue "zfs destroy -F provider" to make
sure everything was cleared out.

After the last unsuccessful attempt I waited 14 hours for the "busy" drives
as reported, and since they still hadn't stopped doing something after that
time, I rebooted the box.

Besides the confusion about how to use ZFS properly (I miss documentation
written for a normal user rather than for a core developer; several blogs
carry outdated information), there is still the issue of this nasty blocking
of the whole system, solvable only by a hard reset. After the pool had been
created successfully and a snapshot had been received via the -vdF option,
a reimport of the pool wasn't possible as described below, and any attempt
to list pools available for import (zpool import) ended up in a stuck
console that no kill or Ctrl-C could interrupt. The damaged pool's drives
showed some activity, but even the pools considered unharmed didn't show up.
This total blockade also prevented the system from rebooting properly - a
"shutdown -r" or "reboot" waited for eternity after the last block had been
synchronised - only a power-off or a full reset could bring the box back to
life. I think this is not intended and can be considered a bug?

Thanks for the patience.

oh

> ----- Original Message -----
> From: "O. Hartmann" <ohartman_at_zedat.fu-berlin.de>
>
> > I have set up a RAIDZ pool comprising four 3 TB HDDs. To maintain 4k
> > block alignment I followed the instructions given on several sites,
> > and I'll sketch them here for the record.
> >
> > The operating system is 11.0-CURRENT and 10.0-BETA2.
> >
> > Create a GPT scheme on each drive and add one partition covering the
> > whole disk:
> >
> > gpart add -t freebsd-zfs -b 1M -l disk0[0-3] ada[3-6]
> >
> > gnop create -S4096 gpt/disk0[0-3]
> >
> > Because I added a disk to an existing RAIDZ, I exported the former
> > ZFS pool, then deleted the partition on each disk and destroyed the
> > GPT scheme. The former pool had a ZIL and CACHE residing on the same
> > SSD, partitioned. I didn't kill or destroy the partitions on that
> > SSD. To align to 4k blocks, I also created NOP overlays on the
> > existing gpt/log00 and gpt/cache00 via
> >
> > gnop create -S4096 gpt/log00|gpt/cache00
> >
> > After that I created a new pool via
> >
> > zpool create POOL gpt/disk0[0-3].nop log gpt/log00.nop cache gpt/cache00.nop
>
> You don't need any of the nop hacks in 10 or 11 any more, as it has
> proper sector size detection. The caveat is a disk which advertises
> 512b sectors but is really 4k and for which we don't have a 4k quirk
> in the kernel yet.
>
> If anyone comes across a case of this, feel free to drop me the
> details from camcontrol <identify|inquiry> <device>
>
> If due to this you still need to use the gnop hack, you only need to
> apply it to one device, as zpool create uses the largest ashift of
> the disks.
>
> I would then, as the very first step, export and import, as at that
> point there is much less data on the devices to scan through - not
> that this should be needed, but...
>
> > I "received" a snapshot that had been taken and sent to another
> > storage array, after the newly created pool didn't show any signs
> > of illness or corruption.
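[Note from me (oh), for completeness: the receive step mentioned above was
essentially of the following form. The snapshot name is only a placeholder
here; BACKUP00 and POOL are the pool names used elsewhere in this thread.]

  # replicate the backed-up dataset from the backup pool into the freshly
  # created pool: -v = verbose, -d = derive the target name from the sent
  # path minus its pool name, -F = force a rollback of the target if needed
  zfs send -R BACKUP00/data@backup | zfs receive -vdF POOL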
> > After ~10 hours of receiving the backup, I exported that pool along
> > with the backup pool, destroyed the appropriate .nop device entries
> > via
> >
> > gnop destroy gpt/disk0[0-3]
> >
> > and the same for cache and log, and then tried to check via
> >
> > zpool import
> >
> > whether my pool (as well as the backup pool) shows up. And here the
> > nasty mess starts!
> >
> > The "zpool import" command issued on the console is now stuck for
> > hours and cannot be interrupted via Ctrl-C! No pool shows up!
> > Hitting Ctrl-T shows a state like
> >
> > ... cmd: zpool 4317 [zio->io_cv]: 7345.34r 0.00 [...]
> >
> > Looking with
> >
> > systat -vm 1
> >
> > at the throughput of the CAM devices, I realise that two of the four
> > drives comprising the RAIDZ show activity, with 7000 - 8000 tps and
> > ~30 MB/s bandwidth - the other two show zero!
> >
> > And the pool is still inactive, the console is stuck.
> >
> > Well, this made my day! At this point I am trying to understand
> > what's going wrong and to recall what I did differently the last
> > time, when the same procedure with three disks on the same hardware
> > worked for me.
> >
> > Now, after a 10-hour copy orgy and with the working array urgently
> > needed, I am starting to believe that ZFS on FreeBSD is still
> > peppered with too many development-stage flaws, rendering it risky.
> > Colleagues working with ZFS on Solaris whom I consulted have never
> > seen stuck behaviour like the one I am seeing at this moment.
>
> While we only run 8.3-RELEASE currently, as we've decided to skip 9.X
> and move straight to 10 once we've tested it, we've found ZFS is not
> only very stable but has now become critical to the way we run things.
>
> > I do not want to repeat the procedure again. There must be a way to
> > import the pool - even the backup pool, which is working and
> > untouched by all this, should be importable - but it isn't. While
> > this wretched "zpool import" command is still blocking the console,
> > not willing to die even with "killall -9 zpool", I cannot import the
> > backup pool via "zpool import BACKUP00". The console gets stuck
> > immediately and for eternity without any notice. Hitting Ctrl-T says
> > something like
> >
> > load: 3.59 cmd: zpool 46199 [spa_namespace_lock] 839.18r 0.00u
> > 0.00s 0% 3036k
> >
> > which means I cannot even import the backup facility, and that is
> > really no fun.
>
> I'm not sure there's enough information here to determine where the
> issue may lie, but as a guess it could be that ZFS is having trouble
> locating the changed devices and is scanning the entire disk to try
> and determine that. This would explain the I/O on the one device but
> not the others.
>
> Did you perchance have one of the disks in use for something else, so
> that it may have old label information on it that wasn't cleaned down?
>
> Regards
> Steve
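[P.S. from me (oh): in case someone else runs into leftover labels, the
following is roughly how stale ZFS label data on a provider can be inspected
and cleared on 10.0/11.0, per zdb(8) and zpool(8). The device name is just
one of the providers from above, and labelclear is destructive, so it should
only be run against a provider that no imported pool is using.]

  # show any ZFS vdev labels still present on the provider
  zdb -l /dev/gpt/disk00

  # wipe stale label information from the (unused) provider
  zpool labelclear -f /dev/gpt/disk00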