Well, I've continued looking into this problem, as I really _really_ want to see it fixed for 6.0-RELEASE. I did some general device stress-testing to make sure that it was directly triggerable and reproducible, and not just an intermittent failure.

I have successfully created, and installed FreeBSD on (without any errors):

/dev/ad0a
/dev/ad0b
/dev/ad0c
/dev/ad0d
/dev/ad0e
/dev/ad0f

Even though the newfs on it failed, creating my large partition (/dev/ad0g) itself worked. I can therefore dd data to it, but I can't write a UFS filesystem to it in order to mount it.

I then went about writing data to this partition for long periods of time to try and hit the problem:

# time dd if=/dev/urandom of=/dev/ad0g
143337401+0 records in
143337401+0 records out
73388749312 bytes transferred in 89392.318911 secs (820974 bytes/sec)
614.444u 41826.640s 24:49:52.35 47.4% 244+1708k 0+0io 0pf+0w

After this ran without a single error for nearly 25 hours, I stopped it and started trying to hit the block that triggered the issue manually. After a few hours of "doubling and halving" the seek offset, I finally managed to find the block:

# dd count=1 obs=1024 seek=93321655 if=/dev/urandom of=/dev/ad0g
1+0 records in
0+1 records out
512 bytes transferred in 0.001470 secs (348278 bytes/sec)

This one was successful...
but the very next one:

# dd count=1 obs=1024 seek=93321656 if=/dev/urandom of=/dev/ad0g
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435456
ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435456
ad0: FAILURE - WRITE_DMA timed out LBA=268435456
dd: /dev/ad0g: Input/output error
1+0 records in
0+0 records out
0 bytes transferred in 16.453833 secs (0 bytes/sec)

And incrementing this by one block shows:

# dd count=1 obs=1024 seek=93321657 if=/dev/urandom of=/dev/ad0g
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435458
ad0: TIMEOUT - WRITE_DMA retrying (0 retries left) LBA=268435458
ad0: FAILURE - WRITE_DMA timed out LBA=268435458
dd: /dev/ad0g: Input/output error
1+0 records in
0+0 records out
0 bytes transferred in 16.452722 secs (0 bytes/sec)

This makes perfect sense: I specified an output block size of 1024 bytes on the dd command line, while the default block size, which is also the unit the LBA counts in, is 512 bytes. Incrementing the seek by a single 1024-byte block therefore lands two blocks further along in the LBA:

ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435456
(then...)
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=268435458

Bingo! We've finally found the wall!

I'm going to look further into the IDE chipset (atapci0: <AcerLabs M5229 UDMA66 controller>) tonight: both its whitepapers (to see if it has some sort of quirk or limitation around this area) and its FreeBSD driver, to see if something funky is going on.

As I said before, if anyone is interested in helping me resolve this, I would appreciate it greatly. This is a bug which has haunted me and several others since FreeBSD 5.2-RC2, and it needs to be fixed.

--
Thanks,
Chris (Lance) Gilbert
Ph: +45 33 73 29 31 (UTC +0100)

Received on Sun Aug 14 2005 - 15:57:45 UTC
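
For anyone following along, the seek-to-LBA arithmetic in the message above can be reproduced with a small sh sketch. The function names and the SECTOR_SIZE variable are made up for illustration, and the starting sector of ad0g is not given in the post; it is inferred here by working backwards from the first failing LBA. As a side observation that the post itself does not make: that first failing LBA, 268435456, is exactly 2^28, the first sector number that no longer fits in a 28-bit LBA. Whether that is actually the cause is not established here.

```sh
#!/bin/sh
# Sketch only: reproduces the arithmetic from the dd runs above.
# SECTOR_SIZE=512 is the 512-byte unit the post describes; the
# partition start computed below is inferred, not stated in the post.

SECTOR_SIZE=512

# Partition-relative sector that dd's seek= lands on for a given obs=.
# Assumes obs is a whole multiple of the sector size.
seek_to_sector() {
    # $1 = seek (counted in obs-sized blocks), $2 = obs in bytes
    echo $(( $1 * ($2 / SECTOR_SIZE) ))
}

# Absolute LBA, given the partition's starting sector on the disk.
seek_to_lba() {
    # $1 = partition start sector, $2 = seek, $3 = obs in bytes
    echo $(( $1 + $(seek_to_sector "$2" "$3") ))
}

# Infer the start of ad0g from the first failing write:
# seek=93321656 with obs=1024 was reported as LBA=268435456.
PART_START=$(( 268435456 - $(seek_to_sector 93321656 1024) ))

echo "inferred partition start: $PART_START"
echo "seek=93321655 -> LBA $(seek_to_lba "$PART_START" 93321655 1024)"
echo "seek=93321656 -> LBA $(seek_to_lba "$PART_START" 93321656 1024)"
echo "seek=93321657 -> LBA $(seek_to_lba "$PART_START" 93321657 1024)"
echo "2^28 = $(( 1 << 28 ))"
```

With the inferred start, seek=93321657 maps to LBA 268435458, matching the second failure, and the last successful write (seek=93321655) sits two sectors below the 2^28 mark.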