Re: It still here... panic: ufs_dirbad: bad dir

From: David Rhodus <drhodus_at_machdep.com>
Date: Sat, 7 Jan 2006 13:10:20 -0500

On 1/7/06, Scott Long <scottl_at_samsco.org> wrote:
> David Rhodus wrote:
> > On 1/7/06, Scott Long <scottl_at_samsco.org> wrote:
> >
> >>Pawel Jakub Dawidek wrote:
> >>
> >>
> >>>On Tue, Jan 03, 2006 at 08:46:36AM -0700, Scott Long wrote:
> >>>+> David O'Brien wrote:
> >>>+>
> >>>+> >Just in case anyone thought the bug had been fixed...
> >>>+> >FreeBSD 7.0-CURRENT #531: Mon Jan  2 11:32:17 PST 2006 i386
> >>>+> >panic: ufs_dirbad: bad dir
> >>>+> >cpuid = 1
> >>>+> >KDB: stack backtrace:
> >>>+> >kdb_backtrace(c06c9ba1,1,c06c03c6,eae718c8,c8a91480) at 0xc053657e = kdb_backtrace+0x2e
> >>>+> >panic(c06c03c6,c85bf1f8,dade11,580,c06c0380) at 0xc0516618 = panic+0x128
> >>>+> >ufs_dirbad(c9171bdc,580,c06c0380,0,eae7193c) at 0xc0616e4d = ufs_dirbad+0x4d
> >>>+> >ufs_lookup(eae719e8,c916c528,eae71bc4,c916c528,eae71a24) at 0xc06165cd = ufs_lookup+0x3ad
> >>>+> >VOP_CACHEDLOOKUP_APV(c06f2a80,eae719e8,eae71bc4,c8a91480,cac28d80) at 0xc068cd4e = VOP_CACHEDLOOKUP_APV+0x9e
> >>>+> >vfs_cache_lookup(eae71a90,eae71a90,c916c528,c916c528,eae71bc4) at 0xc057275a = vfs_cache_lookup+0xca
> >>>+> >VOP_LOOKUP_APV(c06f2a80,eae71a90,c8a91480,c106fc88,0) at 0xc068cc66 = VOP_LOOKUP_APV+0xa6
> >>>+> >lookup(eae71b9c,0,c06b5c8e,b6,c057f7ed) at 0xc057760e = lookup+0x44e
> >>>+> >namei(eae71b9c,eae71b3c,60,0,c8a91480) at 0xc0576ecf = namei+0x44f
> >>>+> >kern_stat(c8a91480,8106f20,0,eae71c10,e0) at 0xc05863dd = kern_stat+0x3d
> >>>+> >stat(c8a91480,eae71d04,8,43c,c8a91480) at 0xc058636f = stat+0x2f
> >>>+> >syscall(3b,3b,3b,80dbe80,8106f20) at 0xc0682b43 = syscall+0x323
> >>>+> >Xint0x80_syscall() at 0xc066d33f = Xint0x80_syscall+0x1f
> >>>+>
> >>>+> Please include the console printf that is right above the panic message.
> >>>+> It will say either something about a mangled entry or an isize too
> >>>+> small.  Since this problem is happening consistently for you, but there
> >>>+> seem to be no other problem reports from others, I'd highly suspect that
> >>>+> you have filesystem damage that isn't getting detected by fsck.  I assume
> >>>+> that you are running fsck in the foreground and not in the background,
> >>>+> yes?  The easiest solution here might be to figure out which
> >>>+> directory is causing the problem, and just clri its inode and then clean
> >>>+> up the mess.
> >>>
> >>>I'm able to reproduce it with a newly newfs(8)ed file system:
> >>>
> >>>/mnt: bad dir ino 17382405 at offset 0: mangled entry
> >>>panic: ufs_dirbad: bad dir
> >>>KDB: enter: panic
> >>>[...]
> >>>db> tr
> >>>Tracing pid 427 tid 100057 td 0xc7ccaa80
> >>>kdb_enter(c060029a,c065c020,c0610849,f6b228c0,100) at kdb_enter+0x30
> >>>panic(c0610849,c7914210,1093c05,0,c0610803) at panic+0xce
> >>>ufs_dirbad(cb2b4b58,0,c0610803,0,f6b22934) at ufs_dirbad+0x4e
> >>>ufs_lookup(f6b229e4,c061b519,cb092c60,cb092c60,f6b22b64) at ufs_lookup+0x39f
> >>>VOP_CACHEDLOOKUP_APV(c063a7e0,f6b229e4,f6b22b64,c7ccaa80,c7d52b80) at VOP_CACHEDLOOKUP_APV+0xc4
> >>>vfs_cache_lookup(f6b22a8c,f6b22a8c,0,cb092c60,0) at vfs_cache_lookup+0xc8
> >>>VOP_LOOKUP_APV(c063a7e0,f6b22a8c,c7ccaa80,38,0) at VOP_LOOKUP_APV+0xa6
> >>>lookup(f6b22b3c,0,c060880c,b5,c0511d45) at lookup+0x454
> >>>namei(f6b22b3c,f6b22b8c,60,0,c7ccaa80) at namei+0x441
> >>>kern_lstat(c7ccaa80,8059800,0,f6b22c10,2) at kern_lstat+0x5b
> >>>lstat(c7ccaa80,f6b22d04,8,43c,c065c740) at lstat+0x2f
> >>>syscall(805003b,807003b,bfbf003b,805f19c,bfbfeba0) at syscall+0x325
> >>>Xint0x80_syscall() at Xint0x80_syscall+0x1f
> >>>--- syscall (190, FreeBSD ELF32, lstat), eip = 0x28176efb, esp = 0xbfbfe90c, ebp = 0xbfbfea48 ---
> >>>
> >>
> >>Since you can reproduce it, can you find out which test it is failing?
> >>At the very least we need to add the test to fsck.
> >>
> >>Scott
> >
> >
> > The main problem with dirbad panics is that the corruption occurred a
> > long time ago, so a backtrace usually doesn't provide enough
> > information to find out what went wrong.
> >
> > Doing a fsck _should_ fix the filesystem corruption, but only after
> > the problem has already occurred.  There are a few cases in which fsck
> > needs to restart its current scan level or it can leave corruption
> > inside the filesystem while marking the partition clean.
> >
> > -DR
>
> Yes, I'm well aware of all of this, that's why I'm asking Pawel to
> determine which test is failing so we can find out why fsck isn't
> catching it.
>
> Scott

I think the problem in Pawel's case is that the filesystem itself is
writing out corrupt data, and then later he's hitting an assertion when
the filesystem tries to read the corrupt entry.  This seems to be a
problem with UFS itself.
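
For readers following the "mangled entry" message, here is a minimal
sketch of the kind of sanity checks a directory lookup can apply to an
on-disk entry before trusting it.  It is an illustration only and not
the kernel's code: every sk_ name, the struct layout, and the 512-byte
block size are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values for this sketch; not taken from the kernel headers. */
#define SK_DIRBLKSIZ 512u
#define SK_MAXNAMLEN 255u

/* Hypothetical in-memory view of one directory entry. */
struct sk_direct {
    uint32_t d_ino;     /* inode number, 0 means the slot is unused */
    uint16_t d_reclen;  /* length of this record in bytes */
    uint8_t  d_type;    /* file type */
    uint8_t  d_namlen;  /* length of the name, without the NUL */
    char     d_name[SK_MAXNAMLEN + 1];
};

/* Smallest record that can hold a name of the given length:
 * 8 header bytes + name + NUL, rounded up to a multiple of 4. */
static unsigned sk_dirsiz(unsigned namlen)
{
    return (8u + namlen + 1u + 3u) & ~3u;
}

/* Returns NULL if the entry looks sane, otherwise a short reason.
 * offset is the byte offset of the entry within the directory. */
static const char *sk_check_entry(const struct sk_direct *ep, unsigned offset)
{
    /* Bytes left in the directory block that contains this entry. */
    unsigned space = SK_DIRBLKSIZ - (offset & (SK_DIRBLKSIZ - 1u));
    unsigned i;

    if ((ep->d_reclen & 3u) != 0)
        return "record length is not a multiple of 4";
    if (ep->d_reclen > space)
        return "record runs past the end of its directory block";
    if (ep->d_reclen < sk_dirsiz(ep->d_namlen))
        return "record is too short for its name";
    if (ep->d_ino == 0)
        return NULL;    /* unused slot, the name is not examined */
    for (i = 0; i < ep->d_namlen; i++)
        if (ep->d_name[i] == '\0')
            return "name contains an embedded NUL";
    if (ep->d_name[i] != '\0')
        return "name is not NUL-terminated at d_namlen";
    return NULL;
}
```

A caller would run sk_check_entry() on each entry while walking a
directory block and report the returned reason along with the inode and
offset, which is the sort of detail needed to tell which test failed.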

As for fsck, it should fix this problem, but only after it's already
happened, and it may take two fsck scans.

I'm not sure what the current state of fsck is in fbsd, but one
problem I've noticed in the past while working on fbsd is that if fsck
has to create a lost+found directory, it doesn't restart the current
scan level.  This can lead to the dirbad panic.
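
To make the restart idea concrete, here is a rough sketch of a pass
driver that reruns a scan level whenever that level reports it changed
the filesystem (for example by creating lost+found).  This is not how
fsck is actually structured: the sk_ names, the callback signature, and
the restart limit are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pass callback: returns true if the pass modified the
 * filesystem in a way that invalidates what it has already scanned. */
typedef bool (*sk_pass_fn)(void *fs);

/* Run each pass in order, rerunning a pass while it keeps reporting
 * changes.  Returns true only if every pass finished with nothing
 * pending, i.e. it would be safe to mark the filesystem clean. */
static bool sk_run_passes(void *fs, const sk_pass_fn *passes,
                          size_t npasses, unsigned max_restarts)
{
    for (size_t i = 0; i < npasses; i++) {
        unsigned restarts = 0;
        while (passes[i](fs)) {
            if (restarts++ == max_restarts)
                return false;   /* gave up, do not mark clean */
        }
    }
    return true;
}

/* Toy filesystem state used only to demonstrate the driver. */
struct sk_fs {
    bool has_lost_found;
    unsigned pass_runs;
};

/* Toy pass: the first run has to create lost+found, so it asks to be
 * restarted; the second run finds nothing left to change. */
static bool sk_reconnect_pass(void *arg)
{
    struct sk_fs *fs = arg;

    fs->pass_runs++;
    if (!fs->has_lost_found) {
        fs->has_lost_found = true;
        return true;
    }
    return false;
}
```

With the toy pass above, a restart limit of at least one lets the scan
level run a second time and finish clean; a limit of zero models the
behavior described here, where the level is never rerun and the
filesystem should not be marked clean.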

-DR