Re: Apparently spurious ZFS CRC errors (was Re: ZFS data error without reasons)

From: Alexey Shuvaev <shuvaev_at_physik.uni-wuerzburg.de>
Date: Wed, 25 Mar 2009 22:12:47 +0100
On Wed, Mar 25, 2009 at 09:46:27PM +0100, army.of.root wrote:
> Alexey Shuvaev wrote:
>> On Wed, Mar 25, 2009 at 07:38:32PM +0100, Bernd Walter wrote:
>>> On Wed, Mar 25, 2009 at 06:04:08PM +0000, Mark Powell wrote:
>>>> On Wed, 25 Mar 2009, Bernd Walter wrote:
>>>>
>>>>> On Wed, Mar 25, 2009 at 03:21:28PM +0100, Alexander Leidinger wrote:
>>>>> I wouldn't be surprised if the problem is in the drive firmware.
>>>>> Preread and wc both have the potential to put a lot of load on the drives
>>>>> and can trigger bugs that otherwise wouldn't matter.
>>>> I've emailed WD support for more info. Not expecting much though.
>>>> From reading other threads on these Green Power drives they seem
>>>> rather crap. This is my model and firmware:
>>>>
>>>> http://www.datacent.com/datarecovery/hdd/western_digital/WD10EADS-00L5B1
>>>>
>>>> There's some head park problem too, but with 5s ZFS sync I don't 
>>>> think it applies in this case:
>>>>
>>>> http://www.silentpcreview.com/forums/viewtopic.php?t=51401&postdays=0&postorder=asc&start=120&sid=a1caf68d80ef8fecc5d9e86defde4c19
>>>> http://kerneltrap.org/mailarchive/linux-kernel/2008/4/9/1386304
>>>>
>>>>> I also have a system running WD drives and ECC RAM which show CRC errors
>>>>> from time to time, while all other systems have no CRC problem at all.
>>>>
>>>> Interesting. Are those CRC problems with WC on or off?
>>> WC is on, prefetch is off, but only because it had bad performance with
>>> MySQL.
>>> Drives are <WDC WD3200AAKS-00SBA0/12.01B01> Serial ATA II
>>> I don't know if it is with the drives, but other reasons are less
>>> likely in my opinion.
>>> The system is located in a data center and since I only get a few errors
>>> I decided to live with it and not to debug it further.
>>>
>> Hello!
>>
>> Me too...
>>
>> I don't use zfs, just ufs2 + soft updates, but I sometimes see rather
>> heavy data corruption (most often on the / filesystem).
>> No kernel messages, I can shut down the system successfully just
>> to find the remnants of filesystems on the next boot.
>> It doesn't happen often, I think compiling ports in a jail + some
>> activity in the host increase the probability of a failure.
>>
>> The drive is:
>> ATA channel 3:
>>     Master:  ad6 <WDC WD5000AAKS-00C8A0/12.01C02> SATA revision 2.x
>>
>> hw.ata.wc=1 (default)
>>
>> FWIW,
>> Alexey.
>
> Hi :)
>
> Damn f**k ! - I just bought WD harddrives for my Workstation...
>
> is there any way to detect silent data corruption without ZFS ?
>
Not that I'm aware of... I vaguely remember some Lock Order Reversal
appearing before the system goes to hell (something with kn_list???),
but it may be unrelated.
Anyway, if you see it, it is too late...
The first symptom is that some applications crash with signal 11
and completely trashed backtraces. Once I tried to break into the
debugger in such a state and reboot immediately, but it didn't help;
the disk seems to have been synced already by then...
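
For what it's worth, one userland approximation of what ZFS checksums do
is to record a hash of every file and re-check it later. Below is a
minimal sketch of that idea in Python, not a tested tool. The function
names (file_digest, build_manifest, find_suspects) are made up for this
example, and the heuristic is an assumption: a file whose size and mtime
are unchanged but whose SHA-256 differs is treated as suspect.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each regular file under root to (size, mtime_ns, digest)."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only hash regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            st = os.stat(path)
            rel = os.path.relpath(path, root)
            manifest[rel] = (st.st_size, st.st_mtime_ns, file_digest(path))
    return manifest


def find_suspects(root, manifest):
    """List files whose content changed while size and mtime did not."""
    suspects = []
    for rel, (size, mtime_ns, digest) in sorted(manifest.items()):
        path = os.path.join(root, rel)
        try:
            st = os.stat(path)
        except FileNotFoundError:
            continue  # a deleted file is not evidence of corruption
        unchanged_meta = st.st_size == size and st.st_mtime_ns == mtime_ns
        if unchanged_meta and file_digest(path) != digest:
            suspects.append(rel)
    return suspects
```

Build the manifest while the data is known to be good, store it on a
different disk, and run find_suspects against it periodically. Note the
limits: it only notices damage when you run it, it cannot repair
anything, and a re-read may be served from the buffer cache rather than
from the platters.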

Sometimes the system survives for a few weeks, sometimes only
for a few days.

So, backup, backup, backup...

Just my $0.02,
Alexey.
Received on Wed Mar 25 2009 - 20:12:49 UTC
