Possible zpool online, resilvering issue

From: Ultima <ultima1252_at_gmail.com>
Date: Thu, 4 Aug 2016 01:22:51 -0400
Hello,

I recently had an issue with a PSU and ran several scrubs on a pool of
around 35T. Random drives would drop and require a zpool online, and the
scrubs then found checksum errors (as expected). However, after all the
scrubs I ran, I think I may have found a bug in the zpool online resilvering
process.

24 disks total, 4 vdevs raidz2 (6 drives each).

Before this next part: I had a backup PSU, but it was also going bad and
was waiting on an RMA. The current one seemed to be dying but ran fine with
fewer drives, so I decided to run the server 4 drives short.

I started by offlining (or, for some, physically disconnecting from the
PSU) 4 drives from different vdevs, then ran a scrub to verify everything.
Many checksum errors were present on some of the drives, but this was
expected given the faulty PSU. I then offlined 4 different drives, onlined
the original 4, and scrubbed once again. After the resilver there were,
again, many checksum errors on these drives, as expected.

After the scrub completed, I decided to offline 4 different drives, then
online the ones that had been out of the pool for a while. During the
resilver, checksum errors were once again found. I was surprised given the
recent scrub, so I ran another scrub, and it found even more checksum
errors on these recently onlined drives. I didn't think much of it, but
after the replacement PSU arrived I onlined all the drives that were out of
the pool and, again, the resilver reported checksum errors, and another
scrub found more.

Is this a known issue? Is it common for a scrub to be required after
onlining a disk that was out of the pool for some time?

The drives are ST4000NM0033s, and until recently they had never had a
single checksum error in their lifetime (at least under ZFS).

FreeBSD S1 12.0-CURRENT FreeBSD 12.0-CURRENT #19 r303224: Sat Jul 23 10:41:12 EDT 2016 root_at_S1:/usr/src/head/obj/usr/src/head/src/sys/MYKERNEL-NODEBUG amd64
Sorry for the wall of text, but I hope this helps in tracking down this
possible bug.

Ultima
Received on Thu Aug 04 2016 - 03:22:53 UTC