Re: Accessing bad hard drive causes panic

From: Pascal Jürgens <pascal.juergens_at_googlemail.com>
Date: Tue, 2 Oct 2007 12:13:07 +0200

Dan,

I had similar problems on a system with a cursed VIA chipset (K400).
When I tried accessing a 5-disk RAIDZ on a PCI controller, it would
give me dozens of errors and then fail on large files. So the problem
you have is not necessarily related to the disk.

Before sending a healthy HD back (and maybe losing your data), please
check:

- memory (memtest / Ultimate Boot CD)
- cables
- BIOS updates
- IDE controllers (esp. PCI-based add-on controllers; my mainboard
tolerates none of them)
- the console output while the system was down: it might not be
drive errors crashing your machine, but too little memory (less
than 1G) causing your kernel to panic under ZFS's memory demands
(kmem_malloc: kmem_map too small; discussion here:
http://kerneltrap.org/mailarchive/freebsd-current/2007/9/21/271557)

On my machine with 512M RAM, the system also hangs under heavy load
after some time, despite tuning along the lines of
http://wiki.freebsd.org/ZFSTuningGuide, and I cannot under any
circumstances get it to scrub without dying.

Hope this helps for further investigation,

Pascal Juergens


On 02.10.2007, at 09:58, freebsd-current-request_at_freebsd.org wrote:

> A few months ago I installed 7.0-CURRENT in order to migrate to ZFS.
> At the time, when I was copying files from a GEOM concat volume to the
> ZFS pool, the server would freeze (unresponsive to pings).  I figured
> this was the nature of CURRENT and moved on.  Yesterday I recompiled
> the kernel from the latest source and the issue persists.
>
> The issue is that one of the old drives is experiencing a hardware
> failure.  Whenever it is accessed (from the GEOM concat volume, or
> after being added to the ZFS pool and scrubbed), the server freezes,
> requiring a power cycle.
>
> I know that CURRENT isn't for the average user, which I am.  But I
> figured I would report this and am willing to help diagnose the issue.
>
> The drive passes SMART self-tests and returns a healthy status, but has
> reported over 500 errors.  I am going to send the drive in for
> replacement soon.
>
> Let me know what I can do to help.
>
> -Dan
>
>
> -- 
> Dan Borello
> dborello_at_uiuc.edu
> Structural Engineering Graduate Student
> University of Illinois - Urbana Champaign
> P: 847-877-6287
Received on Tue Oct 02 2007 - 08:39:43 UTC
