Re: zpool scrub errors on 3ware 9550SXU

From: Ian J Hart <ianjhart_at_ntlworld.com>
Date: Wed, 08 Jul 2009 12:24:17 +0100
Quoting Kip Macy <kmacy_at_freebsd.org>:

> Did you answer my question of whether or not this can be reproduced  
> on 7-STABLE?

Yes, I did, but the threading is a little broken; sorry, that's my fault.

To reiterate: with 7-STABLE from around June 25th, scrubs complete okay on  
exactly the same hardware and v6 zpool that fails under 8.0-BETA1.

I'm scrubbing under 7 every time a run under 8 fails.

A reminder of the setup.

3ware 9550SXU-16
16x 1.5TB Seagate. These drives throw bad sectors!

Two 8-disk raidz2 vdevs combined into one pool, 21.8TB.

Test file system with compression=on and copies=2.
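As a sanity check on these figures, here is a small capacity sketch. It rests on assumptions, not on ZFS output: each drive holds 1.5e12 bytes (marketed decimal TB), the pool size ZFS lists for raidz vdevs counts raw space including parity, and that figure is printed in binary units (TiB). The names are illustrative only. On those assumptions 16 drives come to about 21.8 TiB, which appears to match the pool size quoted, and the current layout keeps the same 12 data disks as the earlier single 14-disk raidz2.

```python
# Capacity arithmetic for the pool described above (a sketch, not ZFS output).
# Assumptions: 1.5e12 bytes per drive, raw pool size includes raidz parity,
# sizes shown in TiB. Metadata, labels and reserved space are ignored.

TIB = 2 ** 40
DISK_BYTES = 1.5e12
RAIDZ2_PARITY = 2


def data_disks(vdev_widths, parity=RAIDZ2_PARITY):
    """Number of non-parity disks across a list of raidz vdev widths."""
    return sum(width - parity for width in vdev_widths)


new_layout = [8, 8]   # two 8-disk raidz2 vdevs (current pool)
old_layout = [14]     # the earlier single 14-disk raidz2

raw_tib = DISK_BYTES * sum(new_layout) / TIB
data_tib = DISK_BYTES * data_disks(new_layout) / TIB

print(f"data disks: new {data_disks(new_layout)}, old {data_disks(old_layout)}")
print(f"raw pool size: {raw_tib:.1f} TiB")
print(f"after raidz2 parity: {data_tib:.1f} TiB")
# copies=2 writes each block twice, so, ignoring compression, the test
# file system can hold roughly half of the post-parity space.
print(f"with copies=2: {data_tib / 2:.1f} TiB")
```

Running it prints 12 data disks for both layouts, about 21.8 TiB raw, 16.4 TiB after parity and roughly 8.2 TiB for the copies=2 test file system before compression. The two-vdev layout spends two extra disks on parity in exchange for two raidz2 groups to stripe across.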

I don't think this is a ZFS error as such; it looks like the card  
gives up, which then spawns a whole series of bogus checksum errors  
(but what do I know).

It's odd that it seems to take 40+ minutes to fail. The offsets are always large.

How can I test for (or eliminate) an LBA-related error?
What else might cause the card to fail (after 40 minutes)?
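One way to frame the LBA question is to check where the failing offsets sit relative to the usual sector-addressing limits. This is a hypothetical sketch: the helper name `limits_crossed` and the limits table are my own, and it assumes 512-byte sectors and byte offsets measured from the start of the raw disk (an offset reported relative to a partition or to the ZFS data area would need that start added first). A nominal 1.5TB drive has about 2.93 billion sectors, which is past 2^31 but short of 2^32, so a signed 32-bit sector number somewhere in the I/O path would be expected to show up only on roughly the top quarter of each disk.

```python
# Hypothetical helper: which classic LBA width limits does a byte offset exceed?
# Assumes 512-byte sectors and offsets measured from the start of the raw disk.

SECTOR_BYTES = 512

# Limit name -> first sector number that no longer fits in that width.
LBA_LIMITS = {
    "LBA28 (128 GiB)": 2 ** 28,
    "signed 32-bit (1 TiB)": 2 ** 31,
    "unsigned 32-bit (2 TiB)": 2 ** 32,
}


def limits_crossed(byte_offset: int) -> list[str]:
    """Return the names of the limits the offset lies beyond."""
    lba = byte_offset // SECTOR_BYTES
    return [name for name, first_bad in LBA_LIMITS.items() if lba >= first_bad]


if __name__ == "__main__":
    disk_bytes = 1_500_000_000_000                    # nominal 1.5 TB drive
    print(disk_bytes // SECTOR_BYTES)                 # 2929687500 sectors
    print(limits_crossed(100 * 2 ** 30))              # 100 GiB in: []
    print(limits_crossed(disk_bytes - SECTOR_BYTES))  # last sector on the disk
```

Feeding the offsets from the failing scrubs through something like this would show whether they all land past the 1 TiB mark. That would be consistent with, though not proof of, a 31-bit addressing problem; reading those regions directly from the raw devices under both 7 and 8 would be a more direct test.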

BTW, I have to put this into production soon so I can start testing  
all the other stuff that might not work (i.e. Samba).

Thanks for your help.

>
>
> -Kip
>
>
>
> On Tue, Jul 7, 2009 at 1:03 PM, Ian J Hart<ianjhart_at_ntlworld.com> wrote:
>> Quoting ianjhart_at_ntlworld.com:
>>
>>> Quoting ianjhart_at_ntlworld.com:
>>>
>>>> Quoting Kip Macy <kmacy_at_freebsd.org>:
>>>>
>>>>>>
>>>>>> As usual, it scrubs cleanly on 7.2. Under 8 it started throwing errors
>>>>>> within a few minutes. Then it panicked, possibly due to zpool scrub -s.
>>>>>>
>>>>>> It's sitting at the DDB prompt if there's anything I can do. I'll need
>>>>>> idiot's-guide-level instructions. I have a screen dump if someone wants
>>>>>> to step up. Off list?
>>>>>>
>>>>>> Highlight seems to be...
>>>>>>
>>>>>> Memory modified after free 0xffffff0004da0c00(248) val=3000000 @ 0xffffff0004dc00
>>>>>> Panic: most recently used by none
>>>>>
>>>>> Can you test with a recent 7-STABLE? That would tell me whether you're
>>>>> hitting a general HEAD issue or a problem with the v13 import.
>>>>
>>>> It's doing a scrub under 7.2 following another failed test. I'll bring it
>>>> up to 7-STABLE after that.
>>>>
>>>> I have more data; I'll post it once I've done a couple of jobs.
>>>>
>>>>>
>>>>> Thanks,
>>>>> Kip
>>>
>>> Here's that extra data.
>>>
>>> Updated 3ware/AMCC card firmware.
>>>
>>> Enable onboard SATA and fit a 300GB SATA disk. Remove the floppy and fit a
>>> second 300GB SATA disk.
>>>
>>> Remove the two 500GB disks and replace them with 1.5TB units. I can now
>>> create two 8-disk raidz2 vdevs, giving the same 12 disks' worth of storage I
>>> had with one 14-disk raidz2.
>>>
>>> Reinstall the two OSes on the 300GB drives.
>>>
>>> <slight tangent>
>>> May be of use to someone, so bear with me.
>>>
>>> Reset to BIOS defaults. Some issues! Disabling sound helps.
>>>
>>> Now I suspect the motherboard BIOS may be part of the problem. Removed both
>>> cards and tested each BIOS version in turn.
>>>
>>> ref: http://www.tyan.com.tw/support_download_bios.aspx?model=S.S2895
>>>
>>> Started with 1.04, ended up with 1.04. Later versions detect the internal
>>> SATA disks as SATA 150 rather than SATA 300. Most versions lock the keyboard
>>> (KVM) when legacy USB is enabled. That's a PITA when you've just taken the
>>> floppy disk out. There are no internal SATA disk settings. It would be nice
>>> to check the geometry, as the 7 and 8 sysinstall seem to be behaving
>>> differently.
>>>
>>> With the cards back in.
>>>
>>> Add an ATA disk and CD-ROM while testing. Easyboot order is SATA0, ATA0,
>>> SATA1. Fdisk the so-far blank ATA disk :)
>>>
>>> Onboard audio clashes with something. BIOS 1.03 and later support 16
>>> SCSI boot devices. I disabled booting from the RAID card to allow the
>>> onboard SATA drives to boot.
>>>
>>> The "out of space for option ROM" error has gone.
>>>
>>> AFAIK the CPUs are recent enough to support DDR400. Checked anyway by
>>> clocking down to 333MHz. Still fails.
>>>
>>> </slight tangent>
>>>
>>> There's one last thing: this BIOS (1.04) does not supply the fix for AMD
>>> erratum 169, and later BIOSes incorrectly detect the onboard SATA disks.
>>>
>>> Northbridge System Request Queue may stall.
>>>
>>> ref:
>>> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25759.pdf
>>>
>>> We don't seem to have /dev/msr. Could I fix this using the (shiny new)
>>> cpucontrol?
>>>
>>> Thanks
>>>
>>> ----------------------------------------------------------------
>>> This message was sent using IMP, the Internet Messaging Program.
>>>
>>>
>>> _______________________________________________
>>> freebsd-current_at_freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>>> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
>>>
>>
>> FWIW this is still reproducible with 8.0-BETA1.
>>
>> --
>> ian j hart
>>
>>
>
>
>
> --
> When bad men combine, the good must associate; else they will fall one
> by one, an unpitied sacrifice in a contemptible struggle.
>
>     Edmund Burke
>



-- 
ian j hart

Received on Wed Jul 08 2009 - 09:24:39 UTC
