Re: SATA controller testing update

From: 韓家標 Bill Hacker <askbill_at_conducive.net>
Date: Tue, 20 Nov 2007 13:11:36 +0000
Nathan Butcher wrote:
> Hi,
> 
> Just posting to say that Soren's upcoming patch to fix the Promise SATA
> controller issue seems to work fine for me. I imported and ran 4 drives
> of ZFS on the controller - and no more checksum issues. Tried running
> bonnie++ and had no problems whatsoever. So far so good. I'll keep my
> ZFS pool on the card for a while in case anything pops up.
> 
> What I have noticed though, is that my root mounted system drive
> (which is on my JMB363 controller running AHCI, as /dev/ad6)
> occasionally gets randomly dismounted with the latest BETA3. It has
> happened twice now.
> This issue hasn't happened on any other drive on my system (despite
> there being 11 other drives on other controllers).
> 
> I have no idea how to reproduce this issue, and since it takes out my
> main system drive, I can't get any debugging info. All I see first is
> that the drive gets dismounted before the screen fills up and scrolls
> over with messages about missing nodes.
> 

I may have a way to reproduce at least a vaguely similar fault that we've just 
started looking at, but on ICH9, not Promise:

GigaByte GA G33-DS3R Core-2 Quad, 2 GB DDR-800, 2 X Toshiba 160 GB 2.5" SATA on 
ICH9 as GMIRROR RAID1 'split' gm0 taking in the entire device (ad0 and ad2).

With gm0 in good shape, a cp of a Qemu .img file

- from /dev/mirror/gm0s3d  ufs /pub

- to /dev/mirror/gm0s3e ufs /bak/backups

is rock-solid.

But an inadvertent 'mv' (technically illegal, as it crosses a mount-point) 
doesn't throw an error message.

Instead, it unaccountably causes GEOM to shed /dev/ad2 'instantly' from gm0.

Several hoops must be jumped thru to get it back, as /dev/ad2 thereafter reports 
as 'not attached'.

In addition to the usual GMIRROR commands to clean house and set up for a 
rebuild, I've had to set sysctl kern.geom.debugflags=16, then do a newfs of 
the whole ad2 device, wiping out the disklabel et al, then do what gmirror 
needs, re-insert, and let it rebuild. Which it does just fine.
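
As a rough sketch only, the recovery sequence above might look like the 
following. The 'gmirror forget' and 'gmirror insert' steps are an assumption 
about what "the usual GMIRROR commands" means here, as is resetting 
kern.geom.debugflags to 0 afterwards. The run and recover_member helpers are 
hypothetical names, and by default the script only prints the commands rather 
than running them, since the newfs step destroys everything on the disk.

```shell
#!/bin/sh
# Sketch only: prints the commands by default.
# Set DRY_RUN=0 and run as root to actually execute them (destructive!).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: re-add a shed disk to a gmirror.
#   $1 = mirror name (e.g. gm0), $2 = disk (e.g. ad2)
recover_member() {
    mirror="$1"
    disk="$2"
    # Assumed cleanup step: drop the component that is no longer connected.
    run gmirror forget "$mirror"
    # Allow writes to the raw device, as described in the text.
    run sysctl kern.geom.debugflags=16
    # Wipe the old label by creating a filesystem on the whole device.
    run newfs "/dev/$disk"
    # Assumed: put the debug flags back once the wipe is done.
    run sysctl kern.geom.debugflags=0
    # Re-insert the disk and check the rebuild.
    run gmirror insert "$mirror" "$disk"
    run gmirror status "$mirror"
}
```

Calling 'recover_member gm0 ad2' with DRY_RUN left at its default just lists 
the six commands in order, which makes it easy to review them before running 
anything for real.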

I'm about to set up a 'better instrumented' test box to get more specifics, so 
nothing further here yet.

I report it only because there is SO little logging that the first impression is 
that the trigger incident is below the GEOM layer.

This is with 7-BETA1 i386 of 20 October, testing to be started with 7-BETA3 of 
last night, but I've got to buy a couple of similar drives first.

Both the initially reporting MB and the test board have ICH9 *and* JMB363, so 
will try to reproduce on each controller with 7-BETA3 and 8-<head> before 
looking at patches.

More info as I get it.


Bill
Received on Tue Nov 20 2007 - 12:11:44 UTC
