Re: panic: ffs_valloc: dup alloc in 6.1-BETA4

From: Eric Anderson <anderson_at_centtech.com>
Date: Tue, 21 Mar 2006 15:03:47 -0600
Kris Kennaway wrote:
> On Mon, Mar 20, 2006 at 03:28:46PM -0500, John Baldwin wrote:
>   
>> On Friday 17 March 2006 15:47, Eric Anderson wrote:
>>     
>>> Eric Anderson wrote:
>>>       
>>>> [moved to -current due to lack of response]
>>>>
>>>> Eric Anderson wrote:
>>>>         
>>>>> Mike Tancsa wrote:
>>>>>           
>>>>>> At 04:48 PM 13/03/2006, Eric Anderson wrote:
>>>>>>             
>>>>>>> I get the above panic after nfs clients attach to this nfs server 
>>>>>>> and begin writing to it.
>>>>>>> I do have dumps from two crashes so far.
>>>>>>> This is FreeBSD-6.1-PRERELEASE from Friday-ish.
>>>>>>>               
>>>>>> Don't know if it was fixed or not, but there were a lot of VM changes 
>>>>>> committed last night that might help.
>>>>>>
>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2006-March/023526.html 
>>>>>>
>>>>>>             
>>>>> I just updated, and it still happens.  More information for those 
>>>>> interested:
>>>>>
>>>>> mode = 0100600, inum = 58456203, fs = /mnt
>>>>> panic: ffs_valloc: dup alloc
>>>>>
>>>>>
>>>>> #0  doadump () at pcpu.h:165
>>>>> 165             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
>>>>> (kgdb) backtrace
>>>>> #0  doadump () at pcpu.h:165
>>>>> #1  0xc064482f in boot (howto=260) at 
>>>>> /usr/src/sys/kern/kern_shutdown.c:399
>>>>> #2  0xc0644b55 in panic (fmt=0xc0890967 "ffs_valloc: dup alloc") at 
>>>>> /usr/src/sys/kern/kern_shutdown.c:555
>>>>> #3  0xc077ee3c in ffs_valloc (pvp=0xc8eab440, mode=33152, 
>>>>> cred=0xc8a91d80, vpp=0xe83a5824) at /usr/src/sys/ufs/ffs/ffs_alloc.c:945
>>>>> #4  0xc07a5933 in ufs_makeinode (mode=33152, dvp=0xc8eab440, 
>>>>> vpp=0xe83a5acc, cnp=0xe83a5ae0) at /usr/src/sys/ufs/ufs/ufs_vnops.c:2165
>>>>> #5  0xc07a2b0d in ufs_create (ap=0x0) at 
>>>>> /usr/src/sys/ufs/ufs/ufs_vnops.c:171
>>>>> #6  0xc082dc98 in VOP_CREATE_APV (vop=0x0, a=0xe83a5a18) at 
>>>>> vnode_if.c:204
>>>>> #7  0xc0737590 in nfsrv_create (nfsd=0xc8a91d00, slp=0xc8816700, 
>>>>> td=0xc7d99780, mrq=0xe83a5c98) at vnode_if.h:111
>>>>> #8  0xc0744e95 in nfssvc_nfsd (td=0x0) at 
>>>>> /usr/src/sys/nfsserver/nfs_syscalls.c:472
>>>>> #9  0xc0744688 in nfssvc (td=0xc7d99780, uap=0xe83a5d04) at 
>>>>> /usr/src/sys/nfsserver/nfs_syscalls.c:181
>>>>> #10 0xc081cd7f in syscall (frame=
>>>>>      {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 1, tf_esi = 0, 
>>>>> tf_ebp = -1077941448, tf_isp = -398828188, tf_ebx = 4, tf_edx = 
>>>>> 672385208, tf_ecx = 25, tf_eax = 155, tf_trapno = 12, tf_err = 2, 
>>>>> tf_eip = 671840155, tf_cs = 51, tf_eflags = 662, tf_esp = 
>>>>> -1077941476, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:981
>>>>> #11 0xc0809e8f in Xint0x80_syscall () at 
>>>>> /usr/src/sys/i386/i386/exception.s:200
>>>>> #12 0x00000033 in ?? ()
>>>>> Previous frame inner to this frame (corrupt stack?)
>>>>> (kgdb)
>>>>>
>>>>> Maybe that helps somebody?
>>>>>
>>>>> Should I send this to -current instead, since it appears this would 
>>>>> happen under -current also, and possibly there is a larger base of 
>>>>> people watching the list?
>>>>>           
>>>> Also, here's a screenshot of the crash, and I have a good dump if 
>>>> anyone wants me to get more debugging info.
>>>>
>>>> http://www.googlebit.com/freebsd/fbsd-6.1b4-nfscrash.png
>>>>
>>>>         
>>> Oh yea, and I can reproduce at will, on two separate machines.
>>>       
>> If you boot the machines in single user and run 'fsck -y' repeatedly
>> until fsck stops finding breakage, does it work ok after that?  It may be
>> that you have corrupted disks that bgfsck just can't handle.
>>     
>
> Basically it seems to me that bg fsck is always dangerous: there is an
> assumption that the only kinds of filesystem damage that exist are the
> "harmless" kinds (from power failure) it can later repair.  But this
> is clearly false, because the filesystem may be in an arbitrarily
> damaged state (e.g. after a panic), and the kernel does not handle the
> possibility that filesystem data may not be completely trustable at
> runtime (this was the point of foreground fsck).
>   

It turns out that this bug was caused by not having softupdates enabled on 
the filesystem.
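
(For what it's worth, a quick way to check whether softupdates are actually 
enabled - /dev/device is just a placeholder here:)

mount | grep soft-updates     # mounted filesystems list it among their options
tunefs -p /dev/device         # for an unmounted one, check the "soft updates" line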

So, here's how to reproduce the problem - at least, this has triggered it 
for me twice (a consolidated sketch follows the steps):

newfs /dev/device
(softupdates not enabled I guess)

mount /dev/device /mnt

export the filesystem
mount the filesystem on a client
begin lots of writes to the NFS-mounted filesystem
power cycle the server
fsck_ffs -y /dev/device
Once it's clean, mount and export it again, and within a few seconds it panics.
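
Roughly the same sequence as one shell session, in case it helps anyone 
trying this.  This is only a sketch: the export line, client name, and 
write workload below are placeholders, so adjust them for your setup.

# on the server:
newfs /dev/device                        # no -U, so softupdates are off
mount /dev/device /mnt
echo '/mnt -maproot=root' >> /etc/exports
kill -HUP `cat /var/run/mountd.pid`      # make mountd pick up the export

# on a client:
mount_nfs server:/mnt /mnt
dd if=/dev/zero of=/mnt/bigfile bs=1m count=10000 &   # lots of NFS writes

# power cycle the server while the writes are running, then after reboot:
fsck_ffs -y /dev/device
mount /dev/device /mnt                   # re-export it; the panic follows shortly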

fsck'ing and then enabling softupdates makes the problem disappear.
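
In case it's useful to anyone else hitting this, here's roughly what I mean 
by that - the filesystem has to be clean and unmounted before tunefs will 
touch it, and /dev/device is the same placeholder as above:

fsck_ffs -y /dev/device          # repeat until it reports clean
tunefs -n enable /dev/device     # turn softupdates on
mount /dev/device /mnt           # then re-export as before

# or, when creating the filesystem from scratch, enable softupdates up front:
newfs -U /dev/device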

Eric





-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------