ZFS panic under extreme circumstances (2/3 disks corrupted)

From: Thomas Backman <serenity_at_exscape.org>
Date: Sun, 24 May 2009 21:02:18 +0200
So, I was playing around with RAID-Z and self-healing when I decided to
take it a step further: corrupt the data on *two* disks (well, files via
ggate) and see what happened. I obviously expected the pool to go
offline, but I didn't expect a kernel panic to follow!

What I did was something resembling the following (a consolidated command sketch follows the list):
1) create three 100MB files and run ggatel create on them to get GEOM providers
2) zpool create test raidz ggate{1..3}
3) create a 100MB file inside the pool, md5 the file
4) overwrite 10~20MB (IIRC) of disk2 with /dev/random, using roughly
dd if=/dev/random of=./disk2 bs=1000k count=20 skip=40
(I now know that I wanted *seek*, not *skip*, but it still shouldn't panic!)
5) Check the md5 of the file: everything OK; zpool status shows a
degraded pool.
6) Repeat step #4, but with disk 3.
7) zpool scrub test
8) Panic!
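
Spelled out as one rough shell sketch (the paths, mount point and exact
offsets/sizes are from memory, so treat them as approximate; I've written
seek= below since that's what I actually meant):

# create three 100MB backing files and attach them as GEOM gate providers
for i in 1 2 3; do
    dd if=/dev/zero of=./disk$i bs=1m count=100
    ggatel create -u $i ./disk$i        # gives /dev/ggate$i
done

# raidz pool on the three providers
zpool create test raidz ggate1 ggate2 ggate3

# 100MB test file in the pool (default mountpoint /test assumed), plus a checksum
dd if=/dev/random of=/test/bigfile bs=1m count=100
md5 /test/bigfile

# corrupt ~20MB in the middle of the second backing file
# (conv=notrunc keeps dd from truncating the backing file)
dd if=/dev/random of=./disk2 bs=1000k count=20 seek=40 conv=notrunc
md5 /test/bigfile        # still matches
zpool status test        # pool shows DEGRADED

# same treatment for the third backing file, then scrub
dd if=/dev/random of=./disk3 bs=1000k count=20 seek=40 conv=notrunc
zpool scrub test         # -> panic

(In the actual run the dd used skip= instead of seek=, so the writes
presumably landed at the front of the backing files, which would also
explain the vdev.bad_label messages below.)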

FreeBSD chaos.exscape.org 8.0-CURRENT FreeBSD 8.0-CURRENT #2: Thu May 21 22:42:42 CEST 2009     root_at_chaos.exscape.org:/usr/obj/usr/src/sys/DTRACE  amd64

May 24 09:13:12 chaos root: ZFS: vdev failure, zpool=test type=vdev.bad_label
May 24 09:13:15 chaos last message repeated 2 times
panic: solaris assert: 0 == zap_add(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
    DMU_POOL_SCRUB_FUNC, sizeof (uint32_t), 1, &dp->dp_scrub_func, tx),
    file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scrub.c, line: 122
cpuid = 0
KDB: enter: panic
panic: from debugger
cpuid = 0
Uptime: 22h47m41s
Physical memory: 2028 MB
Dumping 1754 MB: ...

#0  doadump () at pcpu.h:223
223	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) #0  doadump () at pcpu.h:223
#1  0xffffffff80576039 in boot (howto=260)
     at /usr/src/sys/kern/kern_shutdown.c:420
#2  0xffffffff8057648c in panic (fmt=Variable "fmt" is not available.
)
     at /usr/src/sys/kern/kern_shutdown.c:576
#3  0xffffffff801d5b07 in db_panic (addr=Variable "addr" is not available.
)
     at /usr/src/sys/ddb/db_command.c:478
#4  0xffffffff801d5f11 in db_command (last_cmdp=0xffffffff80bd8820, cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:445
#5  0xffffffff801d6160 in db_command_loop ()
     at /usr/src/sys/ddb/db_command.c:498
#6  0xffffffff801d80f9 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#7  0xffffffff805a6ad5 in kdb_trap (type=3, code=0, tf=0xffffff803ea9e700)
     at /usr/src/sys/kern/subr_kdb.c:534
#8  0xffffffff808610e8 in trap (frame=0xffffff803ea9e700)
     at /usr/src/sys/amd64/amd64/trap.c:613
#9  0xffffffff8083af97 in calltrap ()
     at /usr/src/sys/amd64/amd64/exception.S:223
#10 0xffffffff805a6cad in kdb_enter (why=0xffffffff8095e234 "panic",
     msg=0xa <Address 0xa out of bounds>) at cpufunc.h:63
#11 0xffffffff8057649b in panic (fmt=Variable "fmt" is not available.
)
     at /usr/src/sys/kern/kern_shutdown.c:559
#12 0xffffffff80eaa157 in dsl_pool_scrub_setup_sync ()
    from /boot/kernel/zfs.ko
#13 0xffffffff80ea562b in dsl_sync_task_group_sync () from /boot/kernel/zfs.ko
#14 0xffffff00560fb298 in ?? ()
#15 0xffffff803ea9e980 in ?? ()
#16 0x0000000000000000 in ?? ()
#17 0xffffff001ef49b48 in ?? ()
#18 0x0000000000000029 in ?? ()
#19 0xffffff00384c4b00 in ?? ()
#20 0xffffff803ea9ea00 in ?? ()
#21 0xffffff803ea9ea40 in ?? ()
#22 0xffffffff80ea5153 in dsl_pool_sync () from /boot/kernel/zfs.ko
Previous frame inner to this frame (corrupt stack?)
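
For what it's worth, frame #12 (dsl_pool_scrub_setup_sync) matches the
assertion in the panic message; judging purely from that message, the
check at dsl_scrub.c:122 is presumably a VERIFY() around the zap_add()
that records the scrub function in the pool's MOS directory, roughly:

	/* reconstructed from the panic string, not copied from the source tree */
	VERIFY(0 == zap_add(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
	    DMU_POOL_SCRUB_FUNC, sizeof (uint32_t), 1,
	    &dp->dp_scrub_func, tx));

So zap_add() is evidently returning nonzero here, and the VERIFY turns
that error into a panic instead of just failing the scrub request.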

Full core.txt: http://pastebin.com/f546fefdf

Regards,
Thomas

PS. Should I file PRs regarding 8-CURRENT or not? 
Received on Sun May 24 2009 - 17:02:34 UTC
