Re: UFS+J panics on HEAD

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Wed, 23 May 2012 16:10:46 +0300
On Wed, May 23, 2012 at 12:40:34AM +0000, Bjoern A. Zeeb wrote:
> Hi,
> 
> I have a machine that, since being updated to r235609, has started to
> panic constantly in the FS code while building universe, first with
> 
> ufs_dirbad: /scratch: bad dir ino 1137225 at offset 17920: mangled entry
> 
> which a clri and a fully forced fsck -y -f seem to have cleared (thanks to
> kib), and now it is giving me:
> 
> mode = 040700, inum = 14560, fs = /scratch
> panic: ffs_valloc: dup alloc
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> panic() at panic+0x1ce
> ffs_valloc() at ffs_valloc+0x70c
> ufs_makeinode() at ufs_makeinode+0x86
> VOP_CREATE_APV() at VOP_CREATE_APV+0x44
> vn_open_cred() at vn_open_cred+0x4c8
> kern_openat() at kern_openat+0x1f9
> amd64_syscall() at amd64_syscall+0x61e
> Xfast_syscall() at Xfast_syscall+0xf7
> --- syscall (5, FreeBSD ELF64, sys_open), rip = 0x4b94bc, rsp = 0x7fffffffc998, rbp = 0x10 ---
> 
> As another hint, /scratch has UFS+J enabled.  The machine is also reporting
> ECC memory corrections once in a while (a replacement is on its way), but it
> had done that in the months before the update to the latest HEAD as well,
> without the FS trouble.
> 
> Does anyone have an idea of what's going on there, or of what has changed
> since Feb/March that could cause this?  I am willing to try things if I
> manage to get a kernel compiled for testing ;-)   otherwise I might
> dump/dd/newfs/restore and see if I can still reproduce it afterwards, or
> whether it just got into a state that fsck fails to correct...
> 

This is another protective panic, triggered by inconsistent on-disk
structures: the inode allocation bitmap indicated that the inode was free,
but the contents of the inode itself showed it to be in use.
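
To show what the check guards against, here is a minimal standalone sketch,
not the actual ffs code: the structure and helper names (sketch_inode,
bitmap_isfree, check_alloc) are made up for illustration, but the
disagreement it detects, a bitmap that says "free" versus an inode whose
mode says "in use", is the same one ffs_valloc panics on:

    /*
     * Standalone illustration of the "ffs_valloc: dup alloc" consistency
     * check.  The allocator has picked an inode the bitmap claims is free;
     * if the inode itself still has a nonzero mode, it looks allocated,
     * so the on-disk metadata is self-contradictory.
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct sketch_inode {
            uint16_t i_mode;        /* 0 means the inode is free */
            uint32_t i_number;
    };

    /* Returns nonzero if the bitmap marks inode `ino' as free. */
    static int
    bitmap_isfree(const uint8_t *map, uint32_t ino)
    {
            return ((map[ino / 8] & (1 << (ino % 8))) == 0);
    }

    static void
    check_alloc(const uint8_t *map, const struct sketch_inode *ip,
        const char *fsname)
    {
            if (bitmap_isfree(map, ip->i_number) && ip->i_mode != 0) {
                    /* Bitmap and inode disagree: the kernel panics here. */
                    printf("mode = 0%o, inum = %u, fs = %s\n",
                        (unsigned)ip->i_mode, (unsigned)ip->i_number, fsname);
                    printf("panic: ffs_valloc: dup alloc\n");
                    exit(1);
            }
    }

    int
    main(void)
    {
            uint8_t map[1] = { 0 };                 /* all inodes "free" */
            struct sketch_inode ip = { 040700, 3 }; /* but mode is nonzero */

            check_alloc(map, &ip, "/scratch");
            return (0);
    }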

I would not worry much about the ffs code until the known hardware problems
on the machine are fixed. You could try another pass of a full fsck on the
volume, but my expectation is that the bad hardware is causing continuing
damage to both the data and the metadata.

Received on Wed May 23 2012 - 11:10:56 UTC
