On 2/28/06, Yarema <yds_at_coolrat.org> wrote:
>
>
> --On February 28, 2006 2:53:43 PM -0500 Kris Kennaway <kris_at_obsecurity.org>
> wrote:
>
> > On Tue, Feb 28, 2006 at 10:35:36AM -0500, Yarema wrote:
> >>
> >> > Number: 93942
> >> > Category: kern
> >> > Synopsis: panic: ufs_dirbad: bad dir
> >> > Confidential: no
> >> > Severity: critical
> >> > Priority: high
> >> > Responsible: freebsd-bugs
> >> > State: open
> >> > Quarter:
> >> > Keywords:
> >> > Date-Required:
> >> > Class: sw-bug
> >> > Submitter-Id: current-users
> >> > Arrival-Date: Tue Feb 28 15:40:06 GMT 2006
> >> > Closed-Date:
> >> > Last-Modified:
> >> > Originator: Yarema <yds_at_CoolRat.org>
> >> > Release: FreeBSD 6.1-PRERELEASE i386
> >> > Organization:
> >> > Environment:
> >> System: FreeBSD 6.1-PRERELEASE #0: Mon Feb 27 04:52:11 EST 2006 i386
> >>
> >> > Description:
> >>
> >> This is at least the third file system which got hosed for me by the
> >> ufs_dirbad bug on three different hard drives since 5.3 STABLE.
> >> I suspect this is related to the following PRs:
> >> http://www.FreeBSD.org/cgi/query-pr.cgi?pr=49079
> >> http://www.FreeBSD.org/cgi/query-pr.cgi?pr=51001
> >>
> >> In every case a process would lock up making the whole system
> >> unresponsive. A reboot, fsck -y in single user mode and another
> >> reboot would produce the following during the mount of the corrupt
> >> fs in rw mode:
> >>
> >> bad dir ino 2 at offset 16384: mangled entry
> >> panic: ufs_dirbad: bad dir
> >> cpuid = 0
> >>
> >> Another reboot, fsck -y in single user mode and reboot produces the
> >> same results repeatedly. Previously I had recovered by mounting the
> >> corrupt fs in ro mode, backup, newfs, restore.
> >>
> >> Recently I noticed Matthew Dillon commit the following to the
> >> DragonFly src repository:
> >>
> >> http://leaf.DragonFlyBSD.org/mailarchive/commits/2006-02/msg00057.html
> >>
> >> dillon 2006/02/21 10:46:56 PST
> >>
> >> DragonFly src repository
> >>
> >> Modified files:
> >> sys/kern vfs_cluster.c
> >> Log:
> >> bioops.io_start() was being called in a situation where the buffer
> >> could be brelse()'d afterwards instead of I/O being initiated. When
> >> this occurs, the buffer may contain softupdates-modified data which is
> >> never reverted, resulting in serious filesystem corruption. When
> >> io_start is called on a buffer, I/O MUST be initiated and terminated
> >> with a biodone() or the buffer's data may not be properly reverted.
> >>
> >> Solve the problem by moving the io_start() call a little further on in
> >> the code, after the potential brelse().
> >>
> >> There is a possibility that this bug is responsible for the 'dirbad'
> >> panics often reported in DragonFly and FreeBSD circles.
> >>
> >> Revision Changes Path
> >> 1.16 +7 -6 src/sys/kern/vfs_cluster.c
> >>
> >> http://www.DragonFlyBSD.org/cvsweb/src/sys/kern/vfs_cluster.c.diff?r1=1.15&r2=1.16&f=u
> >>
> >> Below is the equivalent patch to the FreeBSD RELENG_6 branch of
> >> src/sys/kern/vfs_cluster.c
> >>
> >> Hope this helps track down the problem.
> >
> > Does it work for you? :)
> >
> > Kris
>
> No way for me to know yet. From what I gathered, mostly from this thread:
> <http://docs.FreeBSD.org/cgi/getmsg.cgi?fetch=331058+0+archive/2006/freebsd-current/20060108.freebsd-current>
>
> As per Matt Dillon
> <http://docs.FreeBSD.org/cgi/getmsg.cgi?fetch=217892+0+/usr/local/www/db/text/2006/freebsd-current/20060226.freebsd-current>,
> the corruption occurs much earlier than any consequences can be felt.
> The patch may prevent the corruption from occurring in the first place.
> But the patch does nothing for me now that I have a huge /home slice
> which cannot even be mounted as read-only in single user mode without
> triggering a page fault kernel panic in the mount process no matter
> how many times I run fsck -f on it.
>
> FWIW the page fault in the mount process is a different sort of kernel
> panic than what is described in this kern/93942 PR above. The page fault
> occurs while attempting to mount read-only. Attempting to mount read-write
> causes the panic: ufs_dirbad: bad dir
>
> One more note, hitting the power button when the machine is locked up
> before the reboot and mount attempt which causes the panic produces the
> following output every time the button is pressed:
>
> kernel: acpi: suspend request ignored (not ready yet)
>
> Seems like there's two separate problems:
> 1) the root cause of the bad dir corruption.
> 2) fsck -f doesn't fix it no matter how many times you run it.
>
> Any pointers on how to recover my /home slice will be greatly appreciated.
>
> --
> Yarema

I have been working with the bad dir problem for several months and I have
not had corruption which fsck would not correct.

-DR

Received on Wed Mar 01 2006 - 19:10:43 UTC
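
For anyone trying to picture the ordering problem described in the DragonFly
commit log quoted above, here is a rough toy model. It is only a sketch under
stated assumptions, not the vfs_cluster.c code and not the patch: ToyBuf,
toy_io_start(), toy_biodone(), toy_brelse(), write_buggy() and write_fixed()
are all hypothetical names made up for illustration, and the "modification"
is just an integer being zeroed. It shows why calling io_start before a
possible brelse() can leave a buffer holding data that is never reverted.

```python
from dataclasses import dataclass


@dataclass
class ToyBuf:
    """Toy stand-in for a kernel buffer (hypothetical, not struct buf)."""
    data: int
    saved: int = 0
    modified: bool = False


def toy_io_start(bp: ToyBuf) -> None:
    # Stand-in for bioops.io_start(): temporarily alters the buffer contents.
    bp.saved = bp.data
    bp.data = 0
    bp.modified = True


def toy_biodone(bp: ToyBuf) -> None:
    # Stand-in for I/O completion: reverts what toy_io_start() changed.
    bp.data = bp.saved
    bp.modified = False


def toy_brelse(bp: ToyBuf) -> None:
    # Stand-in for brelse(): the buffer is released with no I/O,
    # so nothing here reverts an earlier toy_io_start().
    pass


def write_buggy(bp: ToyBuf, io_possible: bool) -> None:
    # Buggy order: io_start runs before we know whether I/O will happen.
    toy_io_start(bp)
    if not io_possible:
        toy_brelse(bp)  # buffer released while still modified
        return
    toy_biodone(bp)


def write_fixed(bp: ToyBuf, io_possible: bool) -> None:
    # Fixed order: io_start is moved after the potential brelse().
    if not io_possible:
        toy_brelse(bp)  # buffer released untouched
        return
    toy_io_start(bp)
    toy_biodone(bp)


def demo():
    # Run both orderings on the path where no I/O is initiated.
    buggy, fixed = ToyBuf(42), ToyBuf(42)
    write_buggy(buggy, io_possible=False)
    write_fixed(fixed, io_possible=False)
    return buggy, fixed


if __name__ == "__main__":
    buggy, fixed = demo()
    print("buggy order:", buggy)
    print("fixed order:", fixed)
```

In this model both orderings behave the same when I/O actually happens; they
differ only on the path where the buffer is released without I/O, which is
the case the commit log describes.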