Re: ffs_copyonwrite panics

From: Fabian Keil <fk_at_fabiankeil.de>
Date: Tue, 18 May 2010 22:48:19 +0200

Roman Bogorodskiy <bogorodskiy_at_gmail.com> wrote:

> I've been running a -CURRENT last updated in February for quite a long
> time and a few weeks ago finally decided to update it. The update was
> quite unfortunate, as the system became very unstable: it hangs a few
> times a day and sometimes panics.
> 
> Some things can be reproduced, some cannot. Reproducible ones:
> 
> 1. background fsck always makes the system hang
> 2. the system crashes on operations on nullfs mounts (disabled those
> for now)
> 
> The most annoying one is the ffs_copyonwrite panic, which I cannot
> reproduce. If I run 'startx' with some X apps, the system panics within
> a few minutes. When I leave the box nearly idle (just using it as an
> internet gateway for my laptop), it behaves a little better but still
> crashes after a few hours anyway.
> 
> Even more annoying is that I cannot save the dump, because when the
> system boots and runs 'savecore' it triggers an ffs_copyonwrite panic
> as well. The panic happens when savecore is about 90% complete (as seen
> via ctrl-t).
> 
> Any ideas how to debug and get rid of this issue?
> 
> System arch is amd64. I don't know what other details could be useful.

I'm not familiar with the background fsck issue, but if the nullfs
panic looks like this one, there's a fair chance it's already fixed:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0x10
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff82412f14
stack pointer	        = 0x28:0xffffff803e564620
frame pointer	        = 0x28:0xffffff803e564770
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 1825 (jail)
panic: from debugger
cpuid = 0
Uptime: 38s
Dumping 1992 MB (5 chunks)
  chunk 0: 1MB (155 pages) ... ok
  chunk 1: 1990MB (509345 pages) 1974 [...] 6 ... ok
  chunk 2: 2MB (273 pages) ... ok
  chunk 3: 1MB (184 pages)

#0  doadump () at pcpu.h:223
223	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) #0  doadump () at pcpu.h:223
#1  0xffffffff803c506f in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:416
#2  0xffffffff803c546c in panic (fmt=Variable "fmt" is not available.
)
    at /usr/src/sys/kern/kern_shutdown.c:590
#3  0xffffffff801f6e77 in db_panic (addr=Variable "addr" is not available.
)
    at /usr/src/sys/ddb/db_command.c:478
#4  0xffffffff801f7281 in db_command (last_cmdp=0xffffffff808bfd80, cmd_table=Variable "cmd_table" is not available.

) at /usr/src/sys/ddb/db_command.c:445
#5  0xffffffff801f74d0 in db_command_loop ()
    at /usr/src/sys/ddb/db_command.c:498
#6  0xffffffff801f9429 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#7  0xffffffff803f3c25 in kdb_trap (type=12, code=0, tf=0xffffff803e564570)
    at /usr/src/sys/kern/subr_kdb.c:535
#8  0xffffffff8062ad9d in trap_fatal (frame=0xffffff803e564570, eva=Variable "eva" is not available.
)
    at /usr/src/sys/amd64/amd64/trap.c:773
#9  0xffffffff8062b0fc in trap_pfault (frame=0xffffff803e564570, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:694
#10 0xffffffff8062b8ff in trap (frame=0xffffff803e564570)
    at /usr/src/sys/amd64/amd64/trap.c:451
#11 0xffffffff80611f33 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:223
#12 0xffffffff82412f14 in null_bypass (ap=0xffffff803e564780)
    at /usr/src/sys/modules/nullfs/../../fs/nullfs/null_vnops.c:269
#13 0xffffffff80448104 in vgonel (vp=0xffffff0005e05780) at vnode_if.h:1099
#14 0xffffffff8044835e in vrecycle (vp=0xffffff0005e05780, td=Variable "td" is not available.
)
    at /usr/src/sys/kern/vfs_subr.c:2505
#15 0xffffffff82412e6f in null_inactive (ap=Variable "ap" is not available.
)
    at /usr/src/sys/modules/nullfs/../../fs/nullfs/null_vnops.c:665
#16 0xffffffff80444ff8 in vinactive (vp=0xffffff0005e05780, 
    td=0xffffff00054743e0) at vnode_if.h:807
#17 0xffffffff804495dd in vputx (vp=0xffffff0005e05780, func=2)
    at /usr/src/sys/kern/vfs_subr.c:2226
#18 0xffffffff8043e1ae in lookup (ndp=0xffffff803e564a50)
    at /usr/src/sys/kern/vfs_lookup.c:905
#19 0xffffffff8043eef7 in namei (ndp=0xffffff803e564a50)
    at /usr/src/sys/kern/vfs_lookup.c:269
#20 0xffffffff8044ec86 in kern_accessat (td=0xffffff00054743e0, fd=-100, 
    path=0x800537000 <Address 0x800537000 out of bounds>, pathseg=Variable "pathseg" is not available.
)
    at /usr/src/sys/kern/vfs_syscalls.c:2140
#21 0xffffffff8062b21d in syscall (frame=0xffffff803e564c80)
    at /usr/src/sys/amd64/amd64/trap.c:946
#22 0xffffffff80612211 in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:374
#23 0x000000080050e5ec in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) 

I could reproduce it with:

FreeBSD 9.0-CURRENT #66 r+3fe665b: Fri May 14 17:45:10 CEST 2010
    fk_at_r500.local:/usr/obj/usr/src/sys/ZOEY amd64

but it had already been fixed in Subversion/CVS on Saturday, so I
didn't investigate which commit caused it and which one fixed it.

My previous kernel without the issue was:
FreeBSD 9.0-CURRENT #65 r+6f48909: Sat May  8 19:28:58 CEST 2010
I'm currently using:
FreeBSD 9.0-CURRENT #69 r+3a7afc7: Sun May 16 20:04:53 CEST 2010
without any issues either. I don't use background fsck, though.

Fabian

Received on Tue May 18 2010 - 18:59:13 UTC
