panic: Bad link elm, nfsd related?

From: Matthew West <mwest_at_l.zeeb.org>
Date: Mon, 23 Mar 2009 14:08:20 +0000

FreeBSD 8-CURRENT, built from sources around 27/02/2009:

FreeBSD foo.internal 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Fri Feb 27 12:43:45 GMT 2009 mwest_at_foo.internal:/usr/obj/usr/src/sys/DEBUGLOCK amd64

The system is amd64 with 16GB of RAM, serving a few clients via NFS (v2
and v3) and Samba from an 800GB ZFS pool on hardware RAID (aac
controller), not RAID-Z.  It runs a GENERIC kernel with the standard
deadlock debugging options enabled.

After 1-2 weeks of uptime, the system panics with the following:

----------
panic: Bad link elm 0xffffff0011febc00 next->prev != elm
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
panic() at panic+0x182
xprt_unregister_locked() at xprt_unregister_locked+0xbe
xprt_unregister() at xprt_unregister+0x2c
svc_run_internal() at svc_run_internal+0x42f
svc_thread_start() at svc_thread_start+0xb
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0xc, rip = 0x800695c4c, rsp = 0x7fffffffe8e8, rbp = 0 ---
KDB: enter: panic
[thread pid 920 tid 100272 ]
Stopped at      kdb_enter+0x3d: movq    $0,0x65ba38(%rip)
db> bt
Tracing pid 920 tid 100272 td 0xffffff000649a000
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x17b
xprt_unregister_locked() at xprt_unregister_locked+0xbe
xprt_unregister() at xprt_unregister+0x2c
svc_run_internal() at svc_run_internal+0x42f
svc_thread_start() at svc_thread_start+0xb
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0xc, rip = 0x800695c4c, rsp = 0x7fffffffe8e8, rbp = 0 ---
db> ps
  pid  ppid  pgrp   uid   state   wmesg         wchan        cmd
[ ... ]
  920   919   919     0  R       (threaded)                  nfsd
[ ... ]
db> panic
< machine hangs hard and needs to be power cycled >
----------
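
For readers who have not seen this message before: as far as I can tell,
"Bad link elm ... next->prev != elm" comes from the debug consistency
checks in the sys/queue.h TAILQ macros, which verify an element's
neighbours before unlinking it.  The trace suggests the check fired while
xprt_unregister_locked() was removing a transport from a list.  Below is
a minimal user-space sketch of that kind of check.  It is not the kernel
code and not a diagnosis of the nfsd problem; the struct and function
names are made up for illustration, and the sketch returns -1 where the
kernel would panic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a TAILQ-style entry: `prev` holds the address
 * of the previous element's `next` field (or of the list's `first`). */
struct elem {
    struct elem  *next;
    struct elem **prev;
};

struct list {
    struct elem  *first;
    struct elem **last;   /* address of the final `next` pointer */
};

static void list_init(struct list *l)
{
    l->first = NULL;
    l->last = &l->first;
}

static void list_insert_tail(struct list *l, struct elem *e)
{
    e->next = NULL;
    e->prev = l->last;
    *l->last = e;
    l->last = &e->next;
}

/* Unlink `e`, checking its neighbours first.
 * Returns 0 on success, -1 where a debug kernel would panic. */
static int list_remove_checked(struct list *l, struct elem *e)
{
    assert(l != NULL && e != NULL);

    /* The successor must point back at this element. */
    if (e->next != NULL && e->next->prev != &e->next) {
        fprintf(stderr, "Bad link elm %p next->prev != elm\n", (void *)e);
        return -1;
    }
    /* The predecessor (or list head) must point forward at it. */
    if (*e->prev != e) {
        fprintf(stderr, "Bad link elm %p prev->next != elm\n", (void *)e);
        return -1;
    }

    if (e->next != NULL)
        e->next->prev = e->prev;
    else
        l->last = e->prev;
    *e->prev = e->next;
    /* e->next and e->prev are deliberately left stale here, so a second
     * removal of the same element trips the first check above. */
    return 0;
}
```

In this sketch, removing the same element twice from a three-element
list fails on the second call with the "next->prev != elm" message,
because the successor has already been relinked past it.  A double
unregister or an unlocked concurrent list update are the sort of thing
that could produce the same symptom, but that is only a guess on the
evidence of the backtrace.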

Unfortunately, whenever I try to get the system to write a kernel core
dump, it simply hangs.

Even if I panic the machine by sending a break, the dump fails:

----------
db> cont
Uptime: 10m22s
Physical memory: 3056 MB
Dumping 252 MB: 237 221 205 189 173 157 141Error dumping block 0x0

** DUMP FAILED (ERROR 5) **
aac0: shutting down controller...FAILED.
----------

I've done some searching through the archives, but can't find anything
useful.  Does anyone have any clues for me on:

1) How to get a kernel crash dump out of KDB in 8-CURRENT at the moment?

2) What the problem with nfsd is?

Thanks,

Matthew
Received on Mon Mar 23 2009 - 13:56:37 UTC
