Re: Root FS corruption

From: Yar Tikhiy <yar_at_comp.chem.msu.su>
Date: Sat, 27 May 2006 15:04:16 +0400

On Fri, May 26, 2006 at 11:24:58AM +0400, Yar Tikhiy wrote:
> 
> I can still damage a file on the root FS by running nextboot.  This
> seems very reproducible.  A subsequent reboot is needed for the
> damage to actually happen.  The pattern is the same:  A fragment
> is allocated to nextboot.conf in the block immediately preceding
> another file's block.  The nextboot.conf contents are written out
> later (when syncing disks before the reboot?) to the neighbour
> file's first fragment.  Nextboot.conf itself has correct contents,
> which means that the contents are written out twice for some reason.
> 
> Nextboot is a simple shell script just writing out nextboot.conf,
> which means that any file write following the same scenario (creat
> and write a small file, then reboot) should result in damage to
> another file on the same FS.  Of course, the FS fill pattern may
> affect this.  In my case, the FS is only half full, which apparently
> allows for allocating a new block to the small file, not a fragment
> in a partially occupied block.

Folks, I have good news for all of us:  This kind of corruption
isn't done by the kernel.  Thanks to Ian Dowse, I found out that
/boot/loader would rewrite nextboot.conf through libufs or whatever.
This is done in support.4th, the word is rewrite_nextboot_file.
Initially I missed a clear sign of the problem being caused by the
loader:  The corrupted data started with `nextboot_enable="NO" \n',
which is the string written from support.4th.  The actual bug must
be hiding in libufs, or whatever the loader uses to access UFS.
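
Since the damaged data starts with a known string, one rough way to
look for files that may already have been hit is to scan the root FS
for files beginning with it.  The following is only a sketch of that
idea, not a tested recovery tool:  the function name, the choice of
Python, and the decision to skip anything named nextboot.conf are my
assumptions, and it only matches the `nextboot_enable="NO"' prefix
mentioned above.  It stays on one file system and reads only the first
few bytes of each regular file.

```python
import os

# Prefix reported in this message: corrupted data began with this string.
MARKER = b'nextboot_enable="NO"'


def _same_device(path, dev):
    """True if path can be examined and lives on device dev."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def find_suspect_files(root, skip_names=("nextboot.conf",)):
    """Return sorted paths under root whose contents start with MARKER.

    Hypothetical helper: files named in skip_names are ignored because
    nextboot.conf itself legitimately starts with this string.
    """
    root_dev = os.stat(root).st_dev
    suspects = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories on other file systems (e.g. /dev, /usr).
        dirnames[:] = [d for d in dirnames
                       if _same_device(os.path.join(dirpath, d), root_dev)]
        for name in filenames:
            if name in skip_names:
                continue
            path = os.path.join(dirpath, name)
            # Only regular files; skip symlinks and special files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    head = f.read(len(MARKER))
            except OSError:
                continue  # unreadable file, move on
            if head == MARKER:
                suspects.append(path)
    return sorted(suspects)


if __name__ == "__main__":
    for p in find_suspect_files("/"):
        print(p)
```

A hit only means the file starts with that string; it still has to be
compared against a known good copy before drawing conclusions.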

The latest technical details of my investigation have been filed
in PR bin/98005:

	http://www.freebsd.org/cgi/query-pr.cgi?pr=98005

The conclusion is:  Avoid nextboot(8) for now.

-- 
Yar
Received on Sat May 27 2006 - 09:04:35 UTC
