Re: Panic during install on Sparc64 - Only with large HDD

From: Chris Gilbert <Chris_at_LainOS.org>
Date: Sun, 14 Aug 2005 20:16:17 +0200
Also, it seems that setting hw.ata.ata_dma=0 (forcing it into PIO mode) fixes 
the issue.

# sysctl -a hw.ata.ata_dma
hw.ata.ata_dma: 0

# dd count=1 obs=1024 seek=93321656 if=/dev/urandom of=/dev/ad0g
1+0 records in
0+1 records out
512 bytes transferred in 0.001390 secs (368351 bytes/sec)

It also seems a bug report has already been submitted for this, along with a
posting to the freebsd-sparc64 mailing list:

http://lists.freebsd.org/pipermail/freebsd-sparc64/2005-June/003212.html

Will continue looking into the chipset docs and FreeBSD driver... but thought 
I should point this out.
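
A quick check of the numbers in the quoted message below, offered as a sketch
rather than a diagnosis: the first failing LBA, 268435456, is exactly 2^28,
which is one past the highest sector a 28-bit LBA can address (128 GiB with
512-byte sectors). The snippet reproduces the seek-to-LBA mapping from the dd
runs. The 512-byte sector size and the partition start offset are assumptions
inferred from the log, and names such as partition_start and seek_to_lba are
made up for illustration.

```python
# Sanity check of the dd seek -> LBA arithmetic from the quoted message.
# Assumptions (not stated in the original post): 512-byte sectors, and that
# ad0g starts at a fixed sector offset, inferred here from the first failure.

SECTOR_SIZE = 512                          # bytes per disk sector (assumed)
DD_OBS = 1024                              # obs= value used on the dd command line
SECTORS_PER_SEEK = DD_OBS // SECTOR_SIZE   # each seek step covers 2 sectors

# Observed pair from the log: seek=93321656 was reported as LBA=268435456.
observed_seek = 93321656
observed_lba = 268435456

# Inferred start of ad0g on the disk, in sectors (hypothetical name).
partition_start = observed_lba - observed_seek * SECTORS_PER_SEEK


def seek_to_lba(seek):
    """Map a dd seek= value (in obs-sized blocks) to an absolute disk LBA."""
    return partition_start + seek * SECTORS_PER_SEEK


LBA28_LIMIT = 2 ** 28   # a 28-bit LBA can address sectors 0 .. 2**28 - 1

for seek in (93321655, 93321656, 93321657):
    lba = seek_to_lba(seek)
    side = "below" if lba < LBA28_LIMIT else "at or above"
    print(f"seek={seek} -> LBA={lba} ({side} 2**28)")
```

Whether the M5229 or its driver actually mishandles addresses past the 28-bit
boundary is not established here; the arithmetic only shows that the failures
begin exactly at that boundary.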

-- 
Thanks,
Chris (Lance) Gilbert
Ph: +45 33 73 29 31 (UTC +0100)

On Saturday 13 August 2005 23:21, Chris Gilbert wrote:
> Well, I've continued looking into this problem as I really _really_ want to
> see it fixed for 6.0-RELEASE.
>
> I did some general device stress-testing to make sure that it was directly
> triggerable and reproducible, and was not just an intermittent failure.
>
> I have successfully created, and installed FreeBSD on (without any errors):
>
> /dev/ad0a
> /dev/ad0b
> /dev/ad0c
> /dev/ad0d
> /dev/ad0e
> /dev/ad0f
>
> Even though the newfs on it failed, creating the slice itself worked for my
> large partition (/dev/ad0g).
>
> Therefore, I can dd data to it, but I can't write a UFS filesystem to it in
> order to mount.
>
> I then went about writing data to this partition for long periods of time
> to try and hit the problem:
>
> # time dd if=/dev/urandom of=/dev/ad0g
> 143337401+0 records in
> 143337401+0 records out
> 73388749312 bytes transferred in 89392.318911 secs (820974 bytes/sec)
> 614.444u 41826.640s 24:49:52.35 47.4%   244+1708k 0+0io 0pf+0w
>
> After this ran without a single error for roughly 25 hours, I stopped it and
> started trying to hit the block that triggered the issue manually.
>
> After a few hours of doubling and halving the seek offset, I finally managed
> to find the block:
>
> # dd count=1 obs=1024 seek=93321655 if=/dev/urandom of=/dev/ad0g
> 1+0 records in
> 0+1 records out
> 512 bytes transferred in 0.001470 secs (348278 bytes/sec)
>
> This one was successful... but the very next one:
>
> # dd count=1 obs=1024 seek=93321656 if=/dev/urandom of=/dev/ad0g
> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435456
> ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435456
> ad0: FAILURE - WRITE_DMA timed out LBA=268435456
> dd: /dev/ad0g: Input/output error
> 1+0 records in
> 0+0 records out
> 0 bytes transferred in 16.453833 secs (0 bytes/sec)
>
> And incrementing this by one block shows:
>
> # dd count=1 obs=1024 seek=93321657 if=/dev/urandom of=/dev/ad0g
> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435458
> ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435458
> ad0: FAILURE - WRITE_DMA timed out LBA=268435458
> dd: /dev/ad0g: Input/output error
> 1+0 records in
> 0+0 records out
> 0 bytes transferred in 16.452722 secs (0 bytes/sec)
>
> This makes perfect sense because the output block size is set to 1024 on
> the dd command (obs=1024), and the disk's sector size is 512. Therefore,
> incrementing seek by a single 1024-byte block lands 2 sectors further along
> in the LBA.
>
> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435456
> (then...)
> ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435458
>
> Bingo! We've finally found the wall!
>
> I'm going to look further into the IDE chipset (atapci0: <AcerLabs M5229
> UDMA66 controller>) tonight: both its whitepapers (to see if it has some
> sort of quirk or limitation around this area) and its FreeBSD driver, to
> see if something funky is going on.
>
> As I said before, if anyone is interested in helping me resolve this I
> would appreciate it greatly. This is a bug which has haunted me and several
> others since FreeBSD 5.2-RC2 and it needs to be fixed.
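
For anyone wanting to reproduce the search for the first failing block, the
doubling-and-halving approach described above can be sketched roughly as
follows. This is only an illustration: write_ok is a hypothetical probe that
is simulated here so the example runs without touching a disk, and
find_first_bad is likewise an invented name. A real probe would issue a
one-block dd write at the given seek offset and check whether it succeeded.

```python
# Sketch of a "double, then halve" search for the first failing block.
# write_ok is a hypothetical predicate: True if a one-block write at that
# seek offset succeeds. It is simulated here, so no real disk is touched.

FIRST_BAD = 93321656   # first failing seek= value reported in the post


def write_ok(block):
    """Simulated probe; a real one would run dd and check its exit status."""
    return block < FIRST_BAD


def find_first_bad(probe, start=1):
    """Return the lowest block for which probe(block) is False.

    Assumes block 0 works, that some block eventually fails, and that every
    block at or above the first failure also fails.
    """
    # Doubling phase: grow the upper bound until a write fails.
    good, bad = 0, start
    while probe(bad):
        good, bad = bad, bad * 2
    # Halving phase: shrink the (good, bad) window down to adjacent blocks.
    while bad - good > 1:
        mid = (good + bad) // 2
        if probe(mid):
            good = mid
        else:
            bad = mid
    return bad


print(find_first_bad(write_ok))
```

On real hardware each failed probe costs a full timeout (about 16 seconds in
the log above), so keeping the number of probes logarithmic matters.
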
Received on Sun Aug 14 2005 - 16:26:21 UTC
