Re: zpool import hanging on unexpectedly-rebooted machine

From: Pawel Jakub Dawidek <pjd_at_FreeBSD.org>
Date: Wed, 20 Aug 2008 09:47:38 +0200
On Mon, Aug 18, 2008 at 04:26:39AM -0700, Colin Moller wrote:
> Hey all,
> 
> I've got an interestingly frustrating problem on my hands with our 
> 7.0-STABLE boxes running ZFS.  Sun X4500 box running amd64, 16GB of 
> RAM, 46x1TB disks in RAIDZ1 (the other two are for the OS).
> 
> Uname for the box is:
> FreeBSD sf-nas1-c160a.storefront.com 7.0-STABLE FreeBSD 7.0-STABLE #1: 
> Sat May 31 14:54:22 PDT 2008     
> root_at_sf-nas1-c160a.storefront.com:/usr/obj/usr/src/sys/X4500  amd64
> 
> The box has been running relatively reliably for some months now, but 
> our hosting provider decided to reboot it on us without asking.  After 
> the box came back, it had lost /boot/zfs/zpool.cache, so I needed to 
> reimport the only zpool on the machine (named zfsdata).
> 
> Running zpool import gives me the output I'm expecting, showing a single 
> zpool called zfsdata, status of ONLINE, and all the disks are showing up.
> 
> However, when I run zpool import -f <numerical_pool_id>, the zpool 
> command simply hangs up with no disk and no CPU activity.  I've run 
> truss on the zpool import, and the last thing I see happening is:
> 
> open("/dev/ad96",O_RDONLY,030115000)             = 6 (0x6)
> ioctl(6,DIOCGIDENT,0xffff9480)                   = 0 (0x0)
> close(6)                                         = 0 (0x0)
> 
> After turning on vfs.zfs.debug, I also see this on the console:
> 
> zfs_ereport_post:293[1]: time=1219057172.795893475 ereport_version=0 
> class=fs.zfs.checksum zfs_scheme_version=0 pool=zfsdata 
> pool_guid=316648131406719055 pool_context=2 
> vdev_guid=7326417523786577584 vdev_type=disk vdev_path=/dev/ad12 
> vdev_devid=ad:GTF000PAHX5TMF parent_guid=6708978418893991394 
> parent_type=raidz zio_err=0 zio_offset=89290496000 zio_size=512 
> zio_object=132 zio_level=0 zio_blkid=244

If I read this correctly, it reports a checksum error on disk /dev/ad12,
but because this is RAIDZ, it probably tries to self-heal, and maybe
something goes wrong there. I have never seen a similar problem, so I'm
not sure how to help you. Even if upgrading to -CURRENT is not an option
for you, maybe you can still install -CURRENT on a USB pendrive and
recompile it with the new patch? You may also try removing this disk
(ad12) to see whether it behaves any better. Anyway, please keep me
informed about what's going on.
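
For reference, the fields of an ereport line like the one quoted above
can be pulled apart mechanically, which makes it easier to spot the
affected device when many of these scroll by. A minimal sketch in Python,
assuming the wrapped console output has been joined back into a single
line of space-separated key=value pairs; the function name parse_ereport
is hypothetical and not part of any ZFS tool:

```python
def parse_ereport(line):
    """Split a zfs_ereport_post console line into a dict of key=value fields."""
    # Drop the "zfs_ereport_post:293[1]:" style prefix if it is present.
    _, sep, rest = line.partition("]: ")
    if not sep:
        rest = line
    fields = {}
    for token in rest.split():
        key, eq, value = token.partition("=")
        if eq:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


# The ereport from the original message, rejoined into one line.
report = parse_ereport(
    "zfs_ereport_post:293[1]: time=1219057172.795893475 ereport_version=0 "
    "class=fs.zfs.checksum zfs_scheme_version=0 pool=zfsdata "
    "pool_guid=316648131406719055 pool_context=2 "
    "vdev_guid=7326417523786577584 vdev_type=disk vdev_path=/dev/ad12 "
    "vdev_devid=ad:GTF000PAHX5TMF parent_guid=6708978418893991394 "
    "parent_type=raidz zio_err=0 zio_offset=89290496000 zio_size=512 "
    "zio_object=132 zio_level=0 zio_blkid=244"
)

# The class names the kind of error; vdev_path names the disk it was seen on.
print(report["class"], report["vdev_path"], report["parent_type"])
```

The values are kept as strings on purpose, since the GUIDs and offsets
are only being read here, not computed with.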

-- 
Pawel Jakub Dawidek                       http://www.wheel.pl
pjd_at_FreeBSD.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!

Received on Wed Aug 20 2008 - 05:47:33 UTC
