Re: zpool: multiple IDs, CURRENT drops all pools after reboot

From: O. Hartmann <ohartman_at_zedat.fu-berlin.de>
Date: Wed, 17 Sep 2014 00:34:33 +0200
On Tue, 16 Sep 2014 22:06:36 +0100
"Steven Hartland" <killing_at_multiplay.co.uk> wrote:

> > One of my backup drives dedicated to a ZPOOL is faulting and showing up with multiple IDs.
> > The only working ID is id: 257822624560506537.
> > 
> > FreeBSD CURRENT with three ZFS disks and only 4GB of RAM is very "flaky" regarding
> > this issue: today, the whole set of pools vanished twice after a reboot. Giving the
> > box 8 GB total and rebooting doesn't show the problem; it gets more frequent when
> > reducing the RAM to 4GB (FreeBSD 11.0-CURRENT #2 r271684: Tue Sep 16 20:41:47 CEST
> > 2014). This is a bit spooky.
> > 
> > Below is the faulted hard drive. I guess the drive/pool shown below somehow triggers the
> > loss of all other pools (I have to import the other pools, which do not have any
> > defects, but they drop out after a reboot and vanish).
> > 
> > Is there a way of getting rid of the faulty IDs without destroying the pool?
> > 
> > Regards,
> > 
> > Oliver 
> > 
> >  root_at_thor: [/etc] zpool import
> >    pool: BACKUP00
> >      id: 9337833315545958689
> >   state: FAULTED
> >  status: One or more devices contains corrupted data.
> >  action: The pool cannot be imported due to damaged devices or data.
> >         The pool may be active on another system, but can be imported using
> >         the '-f' flag.
> >    see: http://illumos.org/msg/ZFS-8000-5E
> >  config:
> > 
> >         BACKUP00               FAULTED  corrupted data
> >           8544670861382329237  UNAVAIL  corrupted data
> > 
> >    pool: BACKUP00
> >      id: 257822624560506537
> >   state: ONLINE
> >  action: The pool can be imported using its name or numeric identifier.
> >  config:
> > 
> >         BACKUP00    ONLINE
> >           ada3p1    ONLINE
> > 
> 
> Might be a long shot but check out the patches on:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594
> 
> Specifically:
> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=147070
> 
> And if that doesn't work:
> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=147286
> 
> The second has all the changes from the first with the addition
> of some changes which dynamically size the max dirty data.
> 
> These changes are in discussion and it's likely the additions
> in the second patch aren't the right direction but they
> have been reported to show good improvements under high
> memory pressure for certain workloads, so would be interesting
> to see if they help with your problem.
> 
> All that said you shouldn't end up with corrupt data no matter
> what.
> 
> Are there any other symptoms? Has memory been checked for
> faults etc?
> 
>     Regards
>     Steve

The reason my desktop has only 4 GB left is that I discovered memory corruption when it
was equipped with 8 GB - a strange bit flip occurred. I cannot be sure that removing
4 GB (2 times 2GB, it is an old C2D/P45 based box) has made the problem go away. I suspect
a dying chipset - when overheated (at the moment BIOS shows 80 degrees Celsius), the
problem is more frequent.

But, besides data corruption, running with 4 GB left and 2 disks put together as a striped
JBOD with another disk as the backup device (the faulty one) is a pain in the ass since
the box starts swapping immediately when some action on the ZFS drives takes place. The
plan is to keep that graveyard alive for the next 2 months until I can afford a new
system ;-)

But anyway, I'll try the patches.

Thanks,
Oliver  
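
For reference, the two BACKUP00 entries in the quoted `zpool import` listing differ
only by their numeric id, and the listing itself says the ONLINE one can be imported
by that identifier. Below is a minimal sketch, in Python, of picking out the importable
id from such a listing. The helper names `parse_import_listing` and `importable_ids`
are made up for illustration, and the sample text is abbreviated from the output
quoted above. It only prints the command and does not touch any pool.

```python
import re

# Abbreviated copy of the `zpool import` listing quoted in this thread.
SAMPLE = """\
   pool: BACKUP00
     id: 9337833315545958689
  state: FAULTED
 status: One or more devices contains corrupted data.
   pool: BACKUP00
     id: 257822624560506537
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
"""


def parse_import_listing(text):
    """Return one dict per listed pool with its 'pool', 'id' and 'state' fields."""
    entries = []
    for line in text.splitlines():
        match = re.match(r"\s*(pool|id|state):\s*(\S+)", line)
        if not match:
            continue
        key, value = match.groups()
        if key == "pool":
            # A "pool:" line starts a new entry.
            entries.append({"pool": value})
        elif entries:
            entries[-1][key] = value
    return entries


def importable_ids(text, name):
    """Return the ids of entries called `name` whose state is ONLINE."""
    return [
        entry["id"]
        for entry in parse_import_listing(text)
        if entry.get("pool") == name and entry.get("state") == "ONLINE"
    ]


if __name__ == "__main__":
    for pool_id in importable_ids(SAMPLE, "BACKUP00"):
        # Importing by numeric id avoids the ambiguity of the shared pool name.
        print("zpool import", pool_id)
```

This only selects which entry to import. It does not remove the stale FAULTED entry,
which is the open question in this thread.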


Received on Tue Sep 16 2014 - 20:34:36 UTC
