Re: ZFS pool corrupted on upgrade of -current (probably sata renaming)

From: Richard Todd <rmtodd_at_ichotolot.servalan.com>
Date: Thu, 16 Jul 2009 23:08:57 -0500
Louis Mamakos <louie_at_transsys.com> writes:

> On Wed, Jul 15, 2009 at 03:19:30PM -0700, Freddie Cash wrote:
>> 
>> Hrm, you might need to do this from single-user mode, without the ZFS
>> filesystems mounted, or the drives in use.  Or from a LiveFS CD, if /usr is
>> a ZFS filesystem.
>> 
>> On our ZFS hosts, / and /usr are on UFS (gmirror).
>
> I don't understand why you'd expect you could take an existing
> container on a disk, like a FreeBSD slice with some sort of live data
> within it, and just decide you're going to take away one or more
> blocks at the end to create a new container within it?

Well, technically, I don't think they were recommending taking the slice
with live data on it and labeling it in place; the idea is to detach that
slice from the mirror, label it, and reattach it, letting ZFS resilver all
the data onto that half of the mirror.  It turns out that reattaching a
chunk of disk that is one sector shorter will still usually work.
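For concreteness, the procedure being discussed looks roughly like this.
It's only a sketch: the pool name tank, the devices ad4s1d/ad6s1d, and the
label name disk1 are made-up placeholders, and the attach kicks off a
resilver that should be allowed to finish before touching the other half
of the mirror.

        zpool status tank                          # confirm both halves are healthy
        zpool detach tank ad4s1d                   # drop one half of the mirror
        glabel label -v disk1 /dev/ad4s1d          # the label takes the last sector
        zpool attach tank ad6s1d /dev/label/disk1  # reattach by label name; resilver starts
        zpool status tank                          # wait for the resilver to complete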

> If you look at page 7 of the ZFS on-disk format document that was
> recently mentioned, you'll see that ZFS stores 4 copies of its "Vdev
> label"; two at the front of the physical vdev and two at the end of
> the Vdev, each of them apparently 256 KB in length.  That's assuming
> that ZFS doesn't round down the size of the Vdev to some convenient
> boundary.  Is it going to get upset that the Vdev just shrunk out from
> under it?

I've been investigating this a bit (testing the glabel procedure on
some mdconfig'ed disks to see that it does indeed work, and reading
the ZFS source).  It turns out that ZFS *does* internally round the
size of each device down to a multiple of sizeof(vdev_label_t), at
this line of vdev.c:

        osize = P2ALIGN(osize, (uint64_t)sizeof (vdev_label_t));

vdev_label_t is 256K long.  So as long as your partitions are not an
*exact* multiple of 256K, losing one sector to the glabel metadata
doesn't change the rounded-down size, and you should be able to freely
detach, label, and reattach them.  If they *are* an exact multiple of
256K, the rounded-down size shrinks by a full 256K, the reattach step
fails, and you'll know you can't proceed; just remove the label from
the disk chunk and put things back as before.  The transcript below
demonstrates the exact-multiple failure case.

Script started on Thu Jul 16 23:03:11 2009
You have mail.
blo-rakane# diskinfo -v /dev/md2s1a /dev/md3s1a
/dev/md2s1a
	512         	# sectorsize
	517996544   	# mediasize in bytes (494M)
	1011712     	# mediasize in sectors
	1003        	# Cylinders according to firmware.
	16          	# Heads according to firmware.
	63          	# Sectors according to firmware.

/dev/md3s1a
	512         	# sectorsize
	517996544   	# mediasize in bytes (494M)
	1011712     	# mediasize in sectors
	1003        	# Cylinders according to firmware.
	16          	# Heads according to firmware.
	63          	# Sectors according to firmware.

blo-rakane# zpool create test mirror md2s1a md3s1a
blo-rakane# zpool status -v test
  pool: test
 state: ONLINE
 scrub: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	test        ONLINE       0     0     0
	  mirror    ONLINE       0     0     0
	    md2s1a  ONLINE       0     0     0
	    md3s1a  ONLINE       0     0     0

errors: No known data errors
blo-rakane# zpool detach test md3s1a
blo-rakane# glabel label -v testd3 /dev/md3s1a
Metadata value stored on /dev/md3s1a.
Done.
blo-rakane# zpool attach test md2s1a /dev/label/testd3
cannot attach /dev/label/testd3 to md2s1a: device is too small
blo-rakane# exit
exit

Script done on Thu Jul 16 23:07:13 2009
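
For the record, the failure above is the exact-multiple case predicted
earlier: the 494M test partitions are 517996544 bytes, which is
1976 * 262144 on the nose, so once glabel takes its one metadata sector
the usable size rounds down by a full 256K and zpool refuses the
now-smaller vdev.  A quick arithmetic check:

        echo $(( 517996544 % 262144 ))   # prints 0 -- an exact multiple of 256K
        echo $(( 517996544 / 262144 ))   # prints 1976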