ZFS pool corrupted on upgrade of -current (probably sata renaming)

From: Chris Hedley <freebsd-current_at_chrishedley.com>
Date: Tue, 14 Jul 2009 23:39:47 +0100 (BST)
[A short summary in advance of my rambling: it seems that my ZFS pool got 
upset with the SATA drive IDs changing and nearly broke.  I /assume/ this 
hasn't been discussed (I did look), but please accept my apologies in 
advance if it's already a known issue.]

I sent a rather panicked message about this yesterday; fortunately I sent 
it to the wrong address so I'll send a slightly more sober version of the 
same today. :)

I experienced a rather worrying problem when updating from my c. Feb 2009 
version of -current to a recent build: my ZFS pool was quite badly 
affected.  Fortunately it hasn't /actually/ lost any data (yet), but I 
think I've been lucky in that regard and I do feel like the Sword of 
Damocles is hanging over me until I've moved it somewhere safe(r).

In more detail, I had a raidz2 pool spread across eight of my ten SATA 
discs, using the same "h" partition of the BSD partition table on each, 
which I'd installed in "dangerously dedicated" mode.  This had been 
working fine since the outset, also surviving the ZFS update around the 
beginning of the year with no problems.

This time, however, things got extremely hairy: two of the component discs 
disappeared altogether, ad12 and ad22 in the new parlance, which would 
appear to be ad4 and ad6 in the old.  This is perhaps significant, as the 
two discs using the names ad4 and ad6 in the new nomenclature, formerly 
ad0 and ad1 respectively, were also reporting IO errors.  I thought I'd 
had it, as there's no way a raidz2 can survive four disc failures.  But 
ad4 and ad6 are the two drive names shared between the old and the new 
numbering schemes, and, as mentioned, the "missing" discs, ad12 and ad22, 
are the "old" ad4 and ad6.  I'm probably explaining this badly, so here's 
a table of the old and new names:

disc     old   new
----     ---   ----
disc 1:  ad0   ad4    - IO errors on "new" ad4
disc 2:  ad1   ad6    - IO errors on "new" ad6
disc 3:  ad2   ad8
disc 4:  ad3   ad10
disc 5:  ad4   ad12   - "old" ad4 (now ad12) removed from pool
disc 6:  ad5   ad20
disc 7:  ad6   ad22   - "old" ad6 (now ad22) removed from pool
disc 8:  ad7   ad24
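
For anyone who wants to check the overlap rather than squint at the table, 
here is a rough Python sketch of the bookkeeping.  It only works on the 
names listed above and doesn't talk to ZFS at all; the function names are 
made up for illustration, and the idea that the shared names are what 
confused the pool is my guess, not something I've confirmed.

```python
# Old and new device names for the eight pool members, copied from the
# table above. Nothing here queries ZFS or the kernel.
OLD_TO_NEW = {
    "ad0": "ad4",
    "ad1": "ad6",
    "ad2": "ad8",
    "ad3": "ad10",
    "ad4": "ad12",
    "ad5": "ad20",
    "ad6": "ad22",
    "ad7": "ad24",
}


def ambiguous_names(old_to_new):
    """Return names that exist in both the old and the new scheme.

    (Hypothetical helper.) Such a name refers to a different physical
    disc before and after the renumbering.
    """
    old_names = set(old_to_new)
    new_names = set(old_to_new.values())
    # Sort numerically on the unit number after the "ad" prefix.
    return sorted(old_names & new_names, key=lambda name: int(name[2:]))


def affected_discs(old_to_new):
    """Return (old, new) pairs where either name is ambiguous.

    (Hypothetical helper.) These are the discs whose old name now belongs
    to another disc, or whose new name used to belong to another disc.
    """
    shared = set(ambiguous_names(old_to_new))
    return [
        (old, new)
        for old, new in old_to_new.items()
        if old in shared or new in shared
    ]


if __name__ == "__main__":
    print("names used in both schemes:", ambiguous_names(OLD_TO_NEW))
    for old, new in affected_discs(OLD_TO_NEW):
        print(old, "->", new)
```

The four pairs it picks out are the same four discs flagged in the table: 
the two reporting IO errors and the two dropped from the pool.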

In writing this down I think I can see clearly what the problem was, 
though I've been unable to find any mention of how to get ZFS to adapt to 
the drive names changing (maybe it's more obvious to ZFS veterans, but I'm 
not one of them!)

At present I'm moving my data off the ZFS array before it totally confuses 
itself and eats my stuff, and enjoying the feeling of being rather cold 
and clammy while my data's on non-redundant drives for the first time in 
years.  I'll probably use a couple of big and simple gmirror arrays in the 
short term but I'd like to rebuild my ZFS pool without worrying about the 
same thing happening again; could anyone offer suggestions, or perhaps 
make ZFS a bit less dependent on FreeBSD's idea of what a disc is called, 
or at least point me at something I should've read that might have avoided 
all this stuff happening in the first place...?

Thanks,

Chris.
Received on Tue Jul 14 2009 - 20:39:55 UTC
