I did use whole disks for the raidz pool; the root disks (mirrored) are
sliced into boot, swap, and root:

zpool status
  pool: big
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        big         ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0

errors: No known data errors

  pool: rootzfs
 state: ONLINE
 scrub: none requested
config:

        NAME                 STATE     READ WRITE CKSUM
        rootzfs              ONLINE       0     0     0
          mirror             ONLINE       0     0     0
            label/sysdisk_A  ONLINE       0     0     0
            label/sysdisk_B  ONLINE       0     0     0

errors: No known data errors

'zpool import -f' didn't want to work. The problem with scrubbing is that the
pool needs to be online first (I think). I couldn't import the pool; it just
said that the metadata was corrupt. The frustrating thing is that the problem
came not from the disks but from the controller: upon reboot all the disks
were OK, but the metadata wasn't.
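
For reference, the recovery sequence I mean is roughly the following (only a
sketch, not a transcript; 'big' is the pool shown above, and the scrub is
exactly the step that can't run until the import succeeds):

    # see which pools ZFS considers importable
    zpool import

    # force the import even though the pool wasn't cleanly exported
    zpool import -f big      # this is the step that failed here,
                             # reporting corrupt metadata

    # a scrub can only be started once the pool is actually online
    zpool scrub big
    zpool status -v big      # watch progress and per-device errors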

If you're interested, from /var/log/messages:

root: ZFS: checksum mismatch, zpool=big path=/dev/da0 offset=469395789824 size=512
kernel: hptrr: start channel [0,2]
kernel: hptrr: channel [0,2] started successfully
kernel: hptrr: start channel [0,2]
kernel: hptrr: channel [0,2] started successfully
kernel: hptrr: start channel [0,0]
kernel: hptrr: start channel [0,2]
kernel: hptrr: channel [0,2] started successfully
kernel: hptrr: channel [0,0] started successfully
root: ZFS: checksum mismatch, zpool=big path=/dev/da0 offset=468971378176 size=512
root: ZFS: checksum mismatch, zpool=big path=/dev/da0 offset=468971382272 size=512
root: ZFS: checksum mismatch, zpool=big path=/dev/da0 offset=468971412480 size=512
kernel: hptrr: start channel [0,2]
kernel: hptrr: [0 2 ] failed to perform Soft Reset
kernel: hptrr: [0,2,0] device disconnected on channel
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=92035899904 size=512 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=277797371392 size=512 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=91641628160 size=1536 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=92035900928 size=4608 error=22
<100 lines snipped>
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=92035913216 size=1024 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=92035914240 size=512 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=191856965120 size=512 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=191856964608 size=512 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=277797383680 size=1536 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=277797384704 size=512 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=277797383680 size=1024 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=92035915264 size=512 error=22
kernel: (da0:hptrr0:0:0:0): Synchronize cache failed, status == 0x39, scsi status == 0x0
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=92035914752 size=512 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=92035915776 size=1024 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=191856965632 size=2560 error=22
<90 lines snipped>
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=277797380608 size=1024 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=277797385216 size=512 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=277797382656 size=512 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=277797386240 size=1024 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=455680 size=1024 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=193536 size=1024 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=500027814912 size=1024 error=22
root: ZFS: vdev I/O failure, zpool=big path=/dev/da0 offset=500028077056 size=1024 error=22
root: ZFS: vdev failure, zpool=big type=vdev.open_failed
kernel: hptrr: start channel [0,0]
kernel: hptrr: [0 0 ] failed to perform Soft Reset
kernel: hptrr: [0,0,0] device disconnected on channel
root: ZFS: vdev I/O failure, zpool=big path=/dev/da1 offset=56768110080 size=512 error=22
syslogd: kernel boot file is /boot/kernel/kernel
kernel: Copyright (c) 1992-2008 The FreeBSD Project.
kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
etc.

regards
Duncan

On Sat, 12 Jul 2008 10:57:37 am Xin LI wrote:
> Duncan Young wrote:
> | Be careful, I've just had a 6 disk raidz array die: a complete failure
> | which required a restore from backup. The controller card, which had
> | access to 4 of the disks, lost one disk, then a second, at which point
> | the machine panicked; upon reboot the raidz array was useless (metadata
> | corrupted). I'm also getting reasonably frequent machine lockups (panics)
> | in the zfs code. I'm going to start collecting crash dumps and see if
> | anyone can help in the next week or two.
>
> That's really unfortunate. Some sort of automated disk monitoring would be
> essential for RAID, and this includes RAID-Z. Did you use the whole disk
> dedicatedly for the pool, or (g)label it before adding it to the zpool?
> Did 'zpool import -f' help?
>
> | I guess what I'm trying to say is that you can still lose everything on
> | an entire pool, so backups are still essential, and a couple of smaller
> | pools is probably preferable to one big pool (restore time is less). zfs
> | is not 100% (yet?). The lack of any type of fsck still causes me concern.
>
> It's always true that backup is important if the data is valuable :)
> The benefit of a larger pool is that the administrator can use more disk
> space in one ZFS file system (which cannot cross a zpool boundary), but it
> is recommended that when creating the zpool we use smaller raid-z groups:
> e.g. don't use 48 disks within one raid-z group; a few disks (like 3-5)
> within one raid-z group would be fine.
>
> Regarding fsck, 'zpool scrub' is pretty much a fsck plus a data integrity
> check. According to some Sun sources it would, however, be almost
> impossible to recover data if the zpool is completely corrupt, but my
> experience with bad disks within raid-z has not left me in an
> unrecoverable state (yet).
>
> Cheers,
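
For anyone finding this thread later: the labelling and the smaller raid-z
groups suggested above look roughly like this at pool-creation time. This is
only a sketch; the device, label, and pool names are placeholders, not the
ones from my pools (the rootzfs mirror above uses label/ names in this style):

    # give each disk a stable /dev/label name, then build the pool on the labels
    glabel label -v disk_a /dev/ad10
    glabel label -v disk_b /dev/ad12
    zpool create tank mirror label/disk_a label/disk_b

    # several small raid-z groups in one pool, rather than one wide group
    zpool create tank2 raidz da10 da11 da12 raidz da13 da14 da15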