panic: solaris assert: vdev_config_sync(rvd, txg) == 0, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c, line: 3014

From: Fabian Keil <freebsd-listen_at_fabiankeil.de>
Date: Fri, 30 May 2008 20:22:05 +0200
A few days ago I used fdisk -p, modified two slice types
in the output and fed it back to fdisk as a "config file" with
the intention of merely changing the slice types on disk.

As an extra service, fdisk "adjusted" the size of the last
slice (ad0s3) for me, thus the last sectors of ad0s3f became
unreachable and geli could no longer read the meta information.
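
To illustrate why shrinking the slice is enough to lose the geli
metadata, here is a rough Python sketch. geli keeps its metadata in
the last sector of the provider, so any truncation at the end puts
that sector out of reach. The geometry, the slice numbers, and the
idea that the "adjustment" rounds the slice end down to a cylinder
boundary are assumptions made for illustration; they are not taken
from the actual disk or from the fdisk run described above.

```python
def round_down_to_cylinder(start, size, heads, sectors_per_track):
    """Shrink `size` so that start + size ends on a cylinder boundary.

    Assumed model of the "adjustment"; the report only says the size changed.
    """
    per_cylinder = heads * sectors_per_track
    end = start + size
    return (end // per_cylinder) * per_cylinder - start


def metadata_sector(start, size):
    """Absolute sector number of the provider's last sector."""
    return start + size - 1


# Hypothetical geometry and slice layout, for illustration only.
heads, sectors_per_track = 16, 63
start, size = 63, 1_000_000

new_size = round_down_to_cylinder(start, size, heads, sectors_per_track)
lost = size - new_size                      # sectors cut off at the end
meta = metadata_sector(start, size)         # where the metadata was written
reachable = meta < start + new_size         # still inside the shrunken slice?

print(f"sectors lost: {lost}, metadata sector reachable: {reachable}")
```

With these made-up numbers only a small tail of the slice is cut off,
but the metadata sector lies in exactly that tail, so the provider
can no longer be attached.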

ad0s3f.eli is part of the following ZFS pool:

fk_at_TP51 ~ $sudo zpool status tank
  pool: tank
 state: ONLINE
 scrub: scrub completed with 0 errors on Wed May 28 22:05:33 2008
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          ad0s3f.eli  ONLINE       0     0     0
          ad0s2.eli   ONLINE       0     0     0

errors: No known data errors

After fdisk's "adjustment" ad0s2.eli was still available,
while ad0s3f.eli wasn't. This reproducibly caused the following
panic a few seconds after loading the zfs module:

Unread portion of the kernel message buffer:
panic: solaris assert: vdev_config_sync(rvd, txg) == 0, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c, line: 3014
cpuid = 0
KDB: enter: panic
panic: from debugger
cpuid = 0
Uptime: 59s
Physical memory: 998 MB
Dumping 85 MB: 70 54 38 22 6
[...]
(kgdb) where
#0  doadump () at pcpu.h:196
#1  0xc05c2446 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xc05c2673 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:572
#3  0xc04ab827 in db_panic (addr=Could not find the frame base for "db_panic".
) at /usr/src/sys/ddb/db_command.c:446
#4  0xc04ac1dc in db_command (last_cmdp=0xc08d4190, cmd_table=0x0, dopager=1) at /usr/src/sys/ddb/db_command.c:413
#5  0xc04ac2ea in db_command_loop () at /usr/src/sys/ddb/db_command.c:466
#6  0xc04adadd in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:228
#7  0xc05e97e6 in kdb_trap (type=3, code=0, tf=0xf3b10b24) at /usr/src/sys/kern/subr_kdb.c:534
#8  0xc08192eb in trap (frame=0xf3b10b24) at /usr/src/sys/i386/i386/trap.c:683
#9  0xc07feddb in calltrap () at /usr/src/sys/i386/i386/exception.s:165
#10 0xc05e996a in kdb_enter (why=0xc085d1da "panic", msg=0xc085d1da "panic") at cpufunc.h:60
#11 0xc05c265c in panic (fmt=0xc552b214 "solaris assert: %s, file: %s, line: %d") at /usr/src/sys/kern/kern_shutdown.c:556
#12 0xc54edb91 in spa_sync (spa=Variable "spa" is not available.
) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/spa.c:3014
#13 0xc54f4aca in txg_sync_thread (arg=0xc4c56400) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/txg.c:331
#14 0xc05a63e4 in fork_exit (callout=0xc54f48e0 <txg_sync_thread>, arg=0xc4c56400, frame=0xf3b10d38) at /usr/src/sys/kern/kern_fork.c:812
#15 0xc07fee50 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:270

With both pool members unavailable the panic didn't occur:

fk_at_TP51 ~ $sudo zpool status
  pool: tank
 state: FAULTED
status: One or more devices could not be used because the label is missing 
        or invalid.  There are insufficient replicas for the pool to continue
        functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-5E
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        FAULTED      0     0     0  corrupted data
          ad0s3f    UNAVAIL      0     0     0  corrupted data
          ad0s2     UNAVAIL      0     0     0  corrupted data

After disabling the corrupted pool, I was also unable to import any other pool:

[My notes are incomplete, but I think I just used "zpool export tank" here.]

fk_at_TP51 ~ $sudo zpool status
no pools available 

fk_at_TP51 ~ $sudo zpool import sv120
Assertion failed: ((null)), function fd == 0, file /usr/src/cddl/lib/libzfs/../../../cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c, line 771.
Abort trap: 6 (core dumped)

I'm using FreeBSD 8.0-CURRENT #0: Tue May 27 21:38:01 CEST 2008
fk_at_TP51.local:/usr/obj/usr/src/sys/THINKPAD i386.

Should I file a PR about this (the ZFS part)?

Given that an fdisk hack with the offending "adjustment"
code removed was able to get the whole ad0s3f back,
I'm also wondering if it wouldn't make sense to provide
fdisk with a "no adjustments, please" option?

Fabian

Received on Fri May 30 2008 - 16:22:16 UTC
