FreeBSD corruption problems on Barcelona

From: Jeff Roberson <jroberson_at_chesapeake.net>
Date: Fri, 9 Nov 2007 19:21:05 -0800 (PST)
I have a report from a Linux developer who has kindly done some 
investigation into a problem he's having getting 7.0 beta2 installed due 
to what appears to be serious corruption on writes.  Please read on for 
details.

---------- Forwarded message ----------
Date: Sat, 10 Nov 2007 09:31:31 +1100
From: Nick Piggin <nickpiggin_at_yahoo.com.au>
To: Jeff Roberson <jroberson_at_chesapeake.net>
Subject: Re: Installing FreeBSD 7

On Saturday 10 November 2007 06:37, Jeff Roberson wrote:
> On Sat, 10 Nov 2007, Nick Piggin wrote:
>> Hi Jeff,
>>
>> Your recent blog posts and other tests with FreeBSD 7 performance
>> have me very interested in it :) Anyway I had intended to install it
>> when I got a new test box running, and now is that time.
>>
>> I have installed FreeBSD 4.something long ago, but haven't had the
>> time to look at it since then. You probably don't have the time to
>> answer newbie questions either but if I may just quickly run the
>> problem past you... (feel free to redirect me)
>>
>> I have 7.0-BETA2-amd64-disk1.iso and am installing from CD. This seems
>> to be the easy/preferred way to go? No special trick to it -- just
>> follow the standard install?
>
> Yes, it's pretty much the only way to go unless you want to do a scripted
> install.  Yes, we know it's terrible.

OK, thanks. I wasn't going to criticise it ;)... it's a little bit
clunky in places. OTOH I much prefer a nice text based installer to
most of the graphical ones.


>> The problem I have is that the install seems quite flaky and has
>> random problems. One install will complain it can't mount root, the
>> next reinstall will stop before the kernel console comes up, the
>> next will complain about some problem with various binaries not
>> being valid executables etc.
>>
>> BETA1.5 was slightly better (I actually got to the point I could log
>> in), but it seemed to have similar corruption in random text files as
>> well, so maybe I just got lucky.
>>
>> The install kernel running off the PATA CD seems completely solid. So
>> I think everything points to a SATA driver problem.
>
> That sounds pretty bad.  Not a good reflection on us is it?  I'll see
> if I can contact the ata driver author and see what he has to say.

Well it is a beta, and it's new hardware; I was prepared to see bugs.
Doesn't seem like many others are having problems, so it might be
something unusual in my configuration.


> For now you could try "set hw.ata.ata_dma=0" "set hw.ata.atapi_dma=0" in
> the boot loader before the install and then again before you boot.  Then
> you can put that without the 'set' in /boot/loader.conf.  Just interrupt
> the boot as it's counting down with a key press and then type 'boot' after
> you've entered the variables.  We used to see random corruption problems
> with pata devices when we tried to do too fast a dma mode.  You'd think the
> signaling would be better with sata but it's worth a shot.

That doesn't work either. After installing again, I now have
corruption in /lib/libm.so.5 (among other things).
hexdump /lib/libm.so.5 shows the first 0x10000 bytes are zeroes.
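
For anyone wanting to scan more files for the same symptom, the check
could be scripted roughly as below. This is only a sketch and was not part
of the original report; the function name leading_zero_bytes is made up
for illustration. It counts how many zero bytes a file starts with. An
intact ELF library starts with the bytes 0x7f 'E' 'L' 'F', so it should
report 0.

```python
def leading_zero_bytes(path, chunk_size=4096):
    """Return the number of consecutive zero bytes at the start of a file."""
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Reached end of file: everything read so far was zero.
                return count
            stripped = chunk.lstrip(b"\x00")
            count += len(chunk) - len(stripped)
            if stripped:
                # Found the first non-zero byte inside this chunk.
                return count


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, hex(leading_zero_bytes(name)))
```

Run against the damaged library described above, it would be expected to
report a run of 0x10000 zero bytes or more.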


> If you could also do a 'boot -v' from the loader once and dump the dmesg
> into a file that'd be very helpful to the ata guy I'm sure.

Here it is attached. Now there is a cdrom error there, however I
don't believe it is the cause of the problem (or at least, there
is a bigger problem with the sata disk). The install has run
perfectly every time I've run it, so it is pulling the data off
the CD OK.

Now that I have actually got as far as root login, I filled up a 1MB
file with /dev/urandom and took an md5. Then copied that to 50
files on the /tmp filesystem, unmounted and remounted it, and then
read back the md5 sums. Practically all of them are wrong, but they
seem to be wrong in the same ways (e.g. many share the same
incorrect md5 sum). Reading the files back from disk consistently
gives the same information, so it seems like reads are OK.
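
A rough sketch of how this test could be automated follows. It is not the
procedure actually used for the report, and the names write_copies,
verify_copies and the copyNN.bin file naming are invented for the example.
The idea is to call write_copies once, unmount and remount the filesystem
by hand so the data really comes back from disk rather than from cache,
and then call verify_copies with the digest that write_copies returned.

```python
import hashlib
import os
from collections import Counter


def md5_of(path):
    """Return the hex MD5 digest of a file, read in blocks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def write_copies(directory, copies=50, size=1024 * 1024):
    """Write the same random data to several files; return its MD5."""
    data = os.urandom(size)
    expected = hashlib.md5(data).hexdigest()
    for i in range(copies):
        path = os.path.join(directory, f"copy{i:02d}.bin")
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data towards the disk
    return expected


def verify_copies(directory, expected):
    """Return {wrong_digest: file_count} for copies that do not match."""
    sums = Counter()
    for name in sorted(os.listdir(directory)):
        if name.startswith("copy") and name.endswith(".bin"):
            sums[md5_of(os.path.join(directory, name))] += 1
    return {digest: n for digest, n in sums.items() if digest != expected}
```

An empty result means every copy matched. A wrong digest with a count
greater than one corresponds to the "many share the same incorrect md5
sum" observation above.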

Interestingly, a second test didn't show up corruption, so I don't
know how reproducible it is.

Hope this helps.

Thanks,
Nick
Received on Sat Nov 10 2007 - 02:19:37 UTC