Re: Repeatable kernel panic on -CURRENT using ZFS over SATA

From: Steven Schlansker <stevenschlansker_at_berkeley.edu>
Date: Fri, 05 Oct 2007 11:56:38 -0700

Dag-Erling Smørgrav wrote:
> Bill Hacker <askbill_at_conducive.net> writes:
>> Short answer - you are overstressing your very marginal hardware.
> 
> You're completely off the mark.  Steven is experiencing a well-known bug
> in the ata driver.
> 
> DES

If I can be helpful, I would still like to debug this problem.

Please tell me whether my constant whining at the list is constructive and
helpful in tracking this bug down :)
If it's not, I'd rather let you guys code than answer my emails, but if
I can be of any help, I am willing.

Here's a dump that I captured using -CURRENT as of two nights ago:

Dump header from device /dev/da0s1b
  Architecture: i386
  Architecture Version: 2
  Dump Length: 113577984B (108 MB)
  Blocksize: 512
  Dumptime: Fri Oct  5 00:37:08 2007
  Hostname: scotch.CSUA.Berkeley.EDU
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 7.0-CURRENT #1: Thu Oct  4 06:23:40 PDT 2007
    root@scotch.CSUA.Berkeley.EDU:/usr/obj/usr/src/sys/GENERIC
  Panic String: from debugger
  Dump Parity: 3604782152
  Bounds: 2
  Dump Status: good

Unread portion of the kernel message buffer:
ad12: FAILURE - device detached
subdisk12: detached
ad12: detached

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x2c
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc07422d6
stack pointer           = 0x28:0xd9e98c58
frame pointer           = 0x28:0xd9e98c78
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 3 (g_up)
panic: from debugger
cpuid = 0
Uptime: 16m4s
Physical memory: 499 MB
Dumping 108 MB: 93 77 61 45 29 13

#0  doadump () at pcpu.h:195
195             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0xc074d7ae in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc074da6b in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:563
#3  0xc048cab7 in db_panic (addr=Could not find the frame base for
"db_panic".
) at /usr/src/sys/ddb/db_command.c:433
#4  0xc048d4a5 in db_command_loop () at /usr/src/sys/ddb/db_command.c:401
#5  0xc048ec15 in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:222
#6  0xc07746f6 in kdb_trap (type=12, code=0, tf=0xd9e98c18) at /usr/src/sys/kern/subr_kdb.c:502
#7  0xc0a01aaf in trap_fatal (frame=0xd9e98c18, eva=44) at /usr/src/sys/i386/i386/trap.c:863
#8  0xc0a01ce3 in trap_pfault (frame=0xd9e98c18, usermode=0, eva=44) at /usr/src/sys/i386/i386/trap.c:785
#9  0xc0a02695 in trap (frame=0xd9e98c18) at /usr/src/sys/i386/i386/trap.c:463
#10 0xc09e81fb in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#11 0xc07422d6 in _mtx_lock_flags (m=0x1c, opts=0,
    file=0xc31edd67 "/usr/src/sys/modules/zfs/../../contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c", line=472)
    at /usr/src/sys/kern/kern_mutex.c:177
#12 0xc31e2fb4 in ?? ()
#13 0x0000001c in ?? ()
#14 0x00000000 in ?? ()
#15 0xc31edd67 in ?? ()
#16 0x000001d8 in ?? ()
#17 0xc788c5ac in ?? ()
#18 0xc31e2f70 in ?? ()
#19 0xc2d9c840 in ?? ()
#20 0xd9e98cbc in ?? ()
#21 0xc07b0d49 in biodone (bp=0x8) at /usr/src/sys/kern/vfs_bio.c:3009
Previous frame identical to this frame (corrupt stack?)
(kgdb) list *0xc07422d6
0xc07422d6 is in _mtx_lock_flags (/usr/src/sys/kern/kern_mutex.c:178).
173     void
174     _mtx_lock_flags(struct mtx *m, int opts, const char *file, int line)
175     {
176
177             MPASS(curthread != NULL);
178             KASSERT(m->mtx_lock != MTX_DESTROYED,
179                 ("mtx_lock() of destroyed mutex @ %s:%d", file, line));
180             KASSERT(LOCK_CLASS(&m->lock_object) == &lock_class_mtx_sleep,
181                 ("mtx_lock() of spin mutex %s @ %s:%d", m->lock_object.lo_name,
182                 file, line));
(kgdb) list *0xc31e2fb4
No source file for address 0xc31e2fb4.
(kgdb) list *0xc07b0d49
0xc07b0d49 is in biodone (/usr/src/sys/kern/vfs_bio.c:3010).
3005            if (done == NULL)
3006                    wakeup(bp);
3007            mtx_unlock(&bdonelock);
3008            if (done != NULL)
3009                    done(bp);
3010    }
3011
3012    /*
3013     * Wait for a BIO to finish.
3014     *

Interestingly enough, I can't seem to get a useful backtrace...  all of
those ??? frames!

Perhaps someone who knows more about kernel debugging than I do can walk
me through it from here.  I read the kernel debugging section of the
FreeBSD Handbook, but it doesn't cover what to do when the stack is
seemingly corrupt :)
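
One thing that can be read off the trace even with the ?? frames is the
relationship between the faulting address and the mutex pointer.  The
sketch below is only arithmetic on numbers from the dump above (0x2c from
the panic message, m=0x1c from frame #11); the helper names are made up
for illustration.  It assumes the fault came from the KASSERT at
kern_mutex.c:178 reading m->mtx_lock, and that a pointer value inside the
first page means "member of a NULL structure pointer".  Neither is proven
by the dump, and which structure was NULL in vdev_geom.c is not something
this shows.

```python
# Values copied from the panic message and backtrace above.
FAULT_VA = 0x2C   # "fault virtual address = 0x2c" (eva=44 in trap_fatal)
MTX_PTR = 0x1C    # "m=0x1c" passed to _mtx_lock_flags in frame #11


def field_offset(fault_va: int, base_ptr: int) -> int:
    """Offset of the touched field, ASSUMING the fault came from
    dereferencing base_ptr (the dump does not prove this)."""
    return fault_va - base_ptr


def looks_like_null_member(ptr: int, page_size: int = 4096) -> bool:
    """Heuristic (an assumption): a 'pointer' inside the first page is
    usually the address of a member reached through a NULL pointer."""
    return 0 <= ptr < page_size


offset = field_offset(FAULT_VA, MTX_PTR)
print(f"field offset inside the mutex: {offset:#x}")
print(f"m looks like NULL + {MTX_PTR:#x}: {looks_like_null_member(MTX_PTR)}")
```

If those assumptions hold, the mutex handed to mtx_lock() at
vdev_geom.c:472 would be embedded 0x1c bytes into a structure whose
pointer was NULL at the time, and the read that faulted was 0x10 bytes
into that mutex.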

I also have a dump from a time when I hotplugged a SATA drive and the
machine instantly panicked - hotplugging usually works, but that time
it just gave up.  I'm not sure how interesting this dump is, though; I
haven't been able to reproduce it (granted, I haven't tried very hard).

-Steven