Re: Repeatable kernel panic on -CURRENT using ZFS over SATA

From: Steven Schlansker <stevenschlansker_at_berkeley.edu>
Date: Tue, 02 Oct 2007 11:10:17 -0700
Pawel Jakub Dawidek wrote:
> On Tue, Oct 02, 2007 at 01:17:00AM -0700, Steven Schlansker wrote:
>> Hello everyone,
>> I recently set up a 6 drive SATA raidz2.  Whenever I try to use the 
>> array, the dmesg fills up with warnings that WRITE_DMA must be retried 
>> (repeatedly).
>>
>> As soon as I remove the load, everything runs fine.
>>
>> Dmesg with errors here:
>> http://soda.csua.berkeley.edu/~steven/dmesg.txt
>>
>> The eventual end result:
>> http://soda.csua.berkeley.edu/~steven/Image053.jpg
>>
>>
>> The only references I can find to similar problems were either not 
>> resolved, or seemed to be related to a chipset which I am not using.
>>
>> Is this a known issue?  How can I make this machine stable?  Is there 
>> any more information I can provide to aid debugging?  Thanks so very much,
> 
> This looks like a problem a couple of folks have already reported. To me it
> looks like an ATA bug since, if I recall correctly, controllers from
> various vendors are affected. Unfortunately Soren hasn't been very active
> lately. As a work-around you may try disabling the write cache on your
> disks (add hw.ata.wc=0 to /boot/loader.conf), but this may only help
> mitigate the problem.
> 

I tried disabling the write cache, but it didn't help much. I think 
the frequency of the WRITE_DMA timeouts decreased, but they are 
definitely still happening. Are there any other things I can try? I'd 
really like to get this working, as I just spent a thousand dollars on 
all this equipment, and finding out it can't stay online for more than a 
few minutes is quite saddening...
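
For anyone following along: the work-around above amounts to a single
loader tunable. A minimal sketch of the /boot/loader.conf entry is below.
It is only read at boot, so a reboot is needed before it takes effect;
assuming this kernel also exports the tunable as a read-only sysctl,
running "sysctl hw.ata.wc" afterwards should report 0.

    # /boot/loader.conf: disable the write cache on ata(4) disks
    hw.ata.wc="0"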

I can help debug the problem if someone will guide me along; the 
system is a production system, but nobody will notice if it crashes a few 
times, so I'm perfectly willing to try things and panic it or whatever. 
I'd like to help quash the bug, but I don't have the kernel knowledge 
to do it myself, only the hardware that triggers it :)

Any other suggestions are also welcome.
Thanks,
Steven
Received on Tue Oct 02 2007 - 16:10:18 UTC