Re: Root FS corruption

From: Gordon Tetlow <gordon_at_tetlows.org>
Date: Mon, 29 May 2006 11:28:25 -0700
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Julian Elischer wrote:
> Yar Tikhiy wrote:
>> On Fri, May 26, 2006 at 11:24:58AM +0400, Yar Tikhiy wrote:
>>
>>> I still can damage a file on the root FS by running nextboot.  This
>>> seems very reproducible.  A subsequent reboot is needed for the
>>> damage to happen actually.  The pattern is the same:  A fragment
>>> is allocated to nextboot.conf in the block immediately preceding
>>> another file's block.  The nextboot.conf contents are written out
>>> later (when syncing disks before the reboot?) to the neighbour
>>> file's first fragment.  Nextboot.conf itself has correct contents,
>>> which means that the contents are written out twice for some reason.
>>>
>>> Nextboot is a simple shell script just writing out nextboot.conf,
>>> which means that any file write following the same scenario (creat
>>> and write a small file, then reboot) should result in damage to
>>> another file on the same FS.  Of course, the FS fill pattern may
>>> affect this.  In my case, the FS is only half full, which apparently
>>> allows for allocating a new block to the small file, not a fragment
>>> in a partially occupied block.
>>
>>
>> Folks, I have good news for all of us:  This kind of corruption
>> isn't done by the kernel.  Thanks to Ian Dowse, I found out that
>> /boot/loader would rewrite nextboot.conf through libufs or whatever.
>> This is done in support.4th, the word is rewrite_nextboot_file.
>> Initially I missed a clear sign of the problem being caused by the
>> loader:  The corrupted data started with `nextboot_enable="NO" \n',
>> which is the string written from support.4th.  The actual bug must
>> be hiding in libufs, or whatever loader uses to access UFS.

As I was reading this thread, I started to worry about how my work (the
current nextboot implementation) had hosed thousands of developers'
machines.

>> Recent technical details of my investigation have been filed
>> in PR bin/98005:
>>
>>     http://www.freebsd.org/cgi/query-pr.cgi?pr=98005
>>
>> The conclusion is:  Avoid nextboot(8) for now.
>>
> 
> the current nextboot fails to provide all the designed functionality
> of the previous nextboot. (which is why we still use the old one at
> ironport)
> One day I'll get around to reimplementing the old one..

If you do plan on reimplementing it, please see if you can make it work
on more than i386 (which if I recall was one of the major limitations of
the original nextboot implementation).

> (the design criteria were:)
> 
> Store the nextboot info "not in a filesystem". (the filesystem may be
> corrupt or there may be several types of filesystem available).
> Change that info from boot0 without writing to a filesystem.
> (to note that it was used)
> Be able to store different stuff on different disks at the same time.
> Be able to ensure that you could specify how many times the
> information was used before falling back to something else.
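
To make the last of those criteria concrete, here is a minimal sketch of a
"use N times, then fall back" record. It is only an illustration: the
record layout, the names pack_record and consume, and the example boot
strings are all hypothetical and are not the on-disk format of the old
nextboot. The record is a fixed-size byte string so that it could sit in a
reserved area outside any filesystem.

```python
import struct

# Hypothetical layout: 2-byte little-endian remaining-use count followed
# by a 30-byte NUL-padded boot string. Not the real nextboot format.
RECORD = struct.Struct("<H30s")


def pack_record(boot_string, uses):
    """Build a fixed-size record that may be used `uses` more times."""
    return RECORD.pack(uses, boot_string.encode("ascii"))


def consume(record, fallback):
    """Return (boot string to use, updated record).

    While the count is non-zero the stored string is used and the count
    is decremented; once it reaches zero the fallback is returned and
    the record is left unchanged.
    """
    uses, raw = RECORD.unpack(record)
    if uses == 0:
        return fallback, record
    choice = raw.rstrip(b"\0").decode("ascii")
    return choice, RECORD.pack(uses - 1, raw)
```

Calling consume() once per boot on a record packed with uses=2 yields the
stored string twice and the fallback from the third boot onwards; the
caller is responsible for writing the updated record back.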

The nice thing about the current implementation is the ability to use it
without a complicated setup or having to do a special installation type.
That said, I really wouldn't call it a replacement of the original
nextboot (in fact, I didn't even know the old nextboot existed when I
wrote this one, it was just the most logical name). I envisioned it more
as a developer tool. I guess I don't ever think people should use my
code on production systems =)

- -gordon
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEez1JRu2t9DV9ZfsRAguUAJ0TyvKkZ9Iwtu48u00qw+y1P1LegwCg1d5z
FaN1kJran6Cu0EqZxjYKjhc=
=lkHL
-----END PGP SIGNATURE-----
Received on Mon May 29 2006 - 16:29:04 UTC
