So, I was playing around with RAID-Z and self-healing, when I decided to take it another step and corrupt the data on *two* disks (well, files via ggate) and see what happened. I obviously expected the pool to go offline, but I didn't expect a kernel panic to follow!

What I did was something resembling:

1) Create three 100MB files, then ggatel create to create GEOM providers from them
2) zpool create test raidz ggate{1..3}
3) Create a 100MB file inside the pool, md5 the file
4) Overwrite 10~20MB (IIRC) of disk2 with /dev/random, with
   dd if=/dev/random of=./disk2 bs=1000k count=20 skip=40, or so
   (I now know that I wanted *seek*, not *skip*, but it still shouldn't panic!)
5) Check the md5 of the file: everything OK; zpool status shows a degraded pool
6) Repeat step #4, but with disk3
7) zpool scrub test
8) Panic!

FreeBSD chaos.exscape.org 8.0-CURRENT FreeBSD 8.0-CURRENT #2: Thu May 21 22:42:42 CEST 2009     root@chaos.exscape.org:/usr/obj/usr/src/sys/DTRACE  amd64

May 24 09:13:12 chaos root: ZFS: vdev failure, zpool=test type=vdev.bad_label
May 24 09:13:15 chaos last message repeated 2 times

panic: solaris assert: 0 == zap_add(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_SCRUB_FUNC, sizeof (uint32_t), 1, &dp->dp_scrub_func, tx), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_scrub.c, line: 122
cpuid = 0
KDB: enter: panic
panic: from debugger
cpuid = 0
Uptime: 22h47m41s
Physical memory: 2028 MB
Dumping 1754 MB: ...

#0  doadump () at pcpu.h:223
223     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump () at pcpu.h:223
#1  0xffffffff80576039 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:420
#2  0xffffffff8057648c in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:576
#3  0xffffffff801d5b07 in db_panic (addr=Variable "addr" is not available.
) at /usr/src/sys/ddb/db_command.c:478
#4  0xffffffff801d5f11 in db_command (last_cmdp=0xffffffff80bd8820, cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:445
#5  0xffffffff801d6160 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#6  0xffffffff801d80f9 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#7  0xffffffff805a6ad5 in kdb_trap (type=3, code=0, tf=0xffffff803ea9e700) at /usr/src/sys/kern/subr_kdb.c:534
#8  0xffffffff808610e8 in trap (frame=0xffffff803ea9e700) at /usr/src/sys/amd64/amd64/trap.c:613
#9  0xffffffff8083af97 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:223
#10 0xffffffff805a6cad in kdb_enter (why=0xffffffff8095e234 "panic", msg=0xa <Address 0xa out of bounds>) at cpufunc.h:63
#11 0xffffffff8057649b in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:559
#12 0xffffffff80eaa157 in dsl_pool_scrub_setup_sync () from /boot/kernel/zfs.ko
#13 0xffffffff80ea562b in dsl_sync_task_group_sync () from /boot/kernel/zfs.ko
#14 0xffffff00560fb298 in ?? ()
#15 0xffffff803ea9e980 in ?? ()
#16 0x0000000000000000 in ?? ()
#17 0xffffff001ef49b48 in ?? ()
#18 0x0000000000000029 in ?? ()
#19 0xffffff00384c4b00 in ?? ()
#20 0xffffff803ea9ea00 in ?? ()
#21 0xffffff803ea9ea40 in ?? ()
#22 0xffffffff80ea5153 in dsl_pool_sync () from /boot/kernel/zfs.ko
Previous frame inner to this frame (corrupt stack?)

Full core.txt: http://pastebin.com/f546fefdf

Regards,
Thomas

PS. Should I file PRs regarding 8-CURRENT or not?

Received on Sun May 24 2009 - 17:02:34 UTC
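As an aside, the seek/skip mix-up in step 4 is easy to demonstrate without ZFS. With dd, skip=N discards N blocks of the *input* before reading, while seek=N skips N blocks of the *output* before writing; the intended "corrupt the middle of the image" needs seek (plus conv=notrunc so the image keeps its size). A minimal sketch, with hypothetical file names:

```shell
# Illustrative only; disk.img is a made-up stand-in for the ggate backing file.
# Create a 100 KiB "disk image" of zeros.
dd if=/dev/zero of=disk.img bs=1k count=100 2>/dev/null

# Intended corruption: seek=40 writes random data at *output* offset 40 KiB,
# and conv=notrunc keeps the rest of the image (and its labels) intact.
dd if=/dev/urandom of=disk.img bs=1k count=20 seek=40 conv=notrunc 2>/dev/null

# The image is still 100 KiB; only the 40-60 KiB region was overwritten.
wc -c < disk.img
```

With skip=40 instead, dd would discard 40 blocks of random input, write from offset 0, and (without conv=notrunc) truncate the file, which presumably also explains the vdev.bad_label messages above, since the labels at the start of the file get clobbered.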
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:48 UTC