Re: CURRENT r250636: ZFS pool destroyed while scrubbing in action and shutdown

From: Xin Li <delphij_at_delphij.net> Date: Wed, 15 May 2013 14:01:02 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 5/15/13 12:17 PM, O. Hartmann wrote:
> On Wed, 2013-05-15 at 10:39 -0700, Xin Li wrote:
>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
>> 
>> On 05/15/13 10:20, O. Hartmann wrote:
>>> Several machines running FreeBSD 10.0-CURRENT #0 r250636: Tue
>>> May 14 21:13:19 CEST 2013 amd64 were scrubbing the pools over
>>> the past two days. Since that takes a while, I was sure I could
>>> shut down the boxes and scrubbing would resume automatically at
>>> the next boot.
>>> 
>>> Not this time! On ALL(!) systems (three) the pools remain
>>> destroyed/corrupted, showing this message (as a
>>> representative, I will present only one):
>> 
>> Have you tried to import the pool with '-f -F -X', i.e.:
>> 
>> zpool import -f -F -X ASGARD00 ?
>> 
>> Cheers, - -- Xin LI <delphij_at_delphij.net>
>> https://www.delphij.net/ FreeBSD - The Power to Serve!
>> Live free or die
> 
> 
> All right, I first had to export the pool before I could import it
> again. After import, the scrubbing goes on and it seems all right
> so far.

Ok.

> The one-disk pool, the other one that failed, seems to have
> different IDs, since import complains about multiple pools with
> the very same name. How can this happen?
> 
> root_at_b211:/root # zpool import
>    pool: BACKUP00
>      id: 257822624560506537
>   state: FAULTED
>  status: The pool metadata is corrupted.
>  action: The pool cannot be imported due to damaged devices or data.
>          The pool may be active on another system, but can be
>          imported using the '-f' flag.
>     see: http://illumos.org/msg/ZFS-8000-72
>  config:
> 
>          BACKUP00    FAULTED  corrupted data
>            ada3p1    ONLINE
> 
>    pool: BACKUP00
>      id: 9337833315545958689
>   state: FAULTED
>  status: One or more devices contains corrupted data.
>  action: The pool cannot be imported due to damaged devices or data.
>          The pool may be active on another system, but can be
>          imported using the '-f' flag.
>     see: http://illumos.org/msg/ZFS-8000-5E
>  config:
> 
>          BACKUP00               FAULTED  corrupted data
>            8544670861382329237  UNAVAIL  corrupted data
> 

Without further information I can't be sure, but here is a guess: you
may have created two pools using different partition schemes (e.g.
created a pool on the whole disk, didn't wipe the ends of the disk,
then created a pool inside a partition). You need to be very careful
at this point.
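
One way to check that guess is to compare the ZFS labels stored on the
whole disk with those stored on the partition. The sketch below only
prints the zdb commands and executes nothing, so it cannot touch the
disks. The device name ada3 is inferred from the ada3p1 entry in the
quoted listing, and the helper name label_check_cmds is made up for
illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the zdb commands that dump the ZFS labels
# of a whole disk and of its first partition, so the pool name and
# pool_guid recorded in each can be compared by hand.
# Nothing is executed here; run the printed commands manually.
label_check_cmds() {
    disk="$1"                       # e.g. ada3 (assumed device name)
    printf 'zdb -l /dev/%s\n' "$disk"
    printf 'zdb -l /dev/%sp1\n' "$disk"
}

label_check_cmds ada3
```

If both devices carry labels named BACKUP00 with different pool_guid
values, that would be consistent with a stale pool left over from an
earlier layout; the pool_guid should correspond to the id that
zpool import prints.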

If the first one looks sane (it appears to be the right pool), you can
import the pool with:

zpool import -f 25782262456050653* BACKUP00

(I've replaced the final 7 with * so you don't blindly follow the
operation; double-check that the pool looks right first.)

If that doesn't work, the next step I would try is import -f -F -X
with the same numerical id.

To prevent confusion, you could replace 'BACKUP00' with something like
'NEWBACKUP00', which will rename the pool when it's imported.
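
The escalation described above can be laid out as a short plan. This
is only a sketch: the helper name import_plan is made up, POOL_ID is a
placeholder for the numeric id you have verified yourself, and the
script prints the commands instead of running them. The middle step,
-F together with -n, is an addition not mentioned in the message; it
asks zpool whether recovery would succeed without performing it.

```shell
#!/bin/sh
# Hypothetical helper: print the import attempts from least to most
# aggressive. Nothing is executed.
#   $1 = numeric pool id as shown by "zpool import" (verify it first)
#   $2 = new pool name to assign at import time (optional)
import_plan() {
    id="$1"
    newname="$2"
    # Forced import by numeric id, optionally renaming the pool.
    echo "zpool import -f $id${newname:+ $newname}"
    # Dry run of recovery mode: reports whether the pool could be made
    # importable, without actually performing the recovery.
    echo "zpool import -f -F -n $id${newname:+ $newname}"
    # Last resort from the message: recovery with -X.
    echo "zpool import -f -F -X $id${newname:+ $newname}"
}

import_plan POOL_ID NEWBACKUP00
```

Importing by numeric id rather than by name sidesteps the ambiguity of
two pools that are both called BACKUP00.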

The reason the system thinks the pool may be active on a different
system might be that you are in single-user mode and didn't run
/etc/rc.d/hostid start, or that the system's hostid changed for some
reason.
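
A rough way to test the hostid theory is to compare the hostid
recorded in the pool label with the one the running system reports.
The sketch below assumes that zdb -l prints a line of the form
"hostid: N" for each label; the helper name label_hostid is made up,
and the device path in the comments comes from the quoted listing.

```shell
#!/bin/sh
# Hypothetical helper: extract the first "hostid:" value from
# "zdb -l" output read on stdin (label layout assumed, see above).
label_hostid() {
    awk '$1 == "hostid:" { print $2; exit }'
}

# Intended use on the affected machine (not run here):
#   zdb -l /dev/ada3p1 | label_hostid    # hostid stored in the label
#   sysctl -n kern.hostid                # hostid of the running system
# If the two values differ, zpool import is expected to ask for -f.
```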

Hope this helps.

Cheers,
-----BEGIN PGP SIGNATURE-----

iQEcBAEBCAAGBQJRk/eOAAoJEG80Jeu8UPuzx28H/jfuUmEBkkA4D60m3UjZwwNP
luQ2Hpm2dCYFLAG8vEl5Q/nArFFw+EUwJPde9QZdgddap8U7TE3ENyWauQ98/1az
m6nZhdBU4WAgw2HzkBkk7kfgNH7twDSTbT9LGYe5p8wImBZOlYphFp48DvZOTXoA
51A95JgCYrEfkXTUCUTpAk+YHVrbgJS2+uLUteHLmzDH8UUK8qojrpG0H54NHzxa
/2dbGcyRWs2eoulJyOCv7bTZWZ9RcZhMnWmReE8UFHyCjFcxwWBnXQeaO2rtEgX0
DYQpWOegp6XhDeqb3t8UfISzrttRVf1unD/dCNVhuz/uwo3/zuDffdIfLxtv0iE=
=/Hl0
-----END PGP SIGNATURE-----