[Fwd: Re: 6-STABLE filesystem related panics/locks (kgdb output)]

From: Eric Anderson <anderson_at_centtech.com>
Date: Mon, 18 Sep 2006 20:39:37 -0500
No response on hackers_at_, so I'm sending here too.

Also, this machine just recently entered this state again.  I can get 
into the debugger in the morning (6am central time), if anyone has any 
suggestions or info I should get from it.


Eric


On 09/18/06 10:02, Eric Anderson wrote:
> Hi all,
> 
> On one of our NFS servers, we've seen repeated filesystem issues with 
> two of the filesystems (it has 4 exported via NFS).  It usually 
> manifests itself by a hung 'df -lk' (wedged in 'ufs'), and mountd 
> becomes wedged also, not allowing new mounts, and unable to be killed. 
> From an NFS client, one can continue using the filesystem just fine, 
> without an issue.  From the server itself, you can cd to the 
> filesystem's root directory, but an ls will hang.  Running a background 
> fsck on that filesystem while in this state also blocks on ufs.  My nfsd 
> processes will also get stuck in the 'D' state (in 'ufs'), but they 
> still appear to be serving data. About a month ago, I brought the system 
> down, did a full fsck on all the filesystems, and brought it back up. 
> It survived for several weeks (2-3), but is now doing the same thing, so 
> I'm uncertain if the issue was affected by the fsck at all (doubtful).
> 
> This morning, prior to rebooting the system to get it out of this state, 
> I began unmounting filesystems in case of a panic, and after unmounting 
> (successfully) two of the filesystems (the ones I've never seen an issue 
> on), I tried unmounting the third (/scr02), and a panic ensued.  /scr01 
> is the other filesystem that is giving me issues.
> 
> Some information about the system/setup:
> 
> FreeBSD smd2.centtech.com 6.1-STABLE FreeBSD 6.1-STABLE #0: Sat Aug 12 
> 13:24:02 CDT 2006
> 
> # df -ilk
> Filesystem     1K-blocks      Used     Avail Capacity iused    ifree 
> %iused  Mounted on
> /dev/amrd0s1a   20308398   3098864  15584864    17%  259261  2378561 
> 10%   /
> devfs                  1         1         0   100%       0        0 
> 100%   /dev
> /dev/amrd0s1d   13065232   3960250   8059764    33%     870  1694872 
> 0%   /var
> /dev/ufs/rss   213268540  93886480 102320578    48%  399297 27180093 
> 1%   /rss
> /dev/ufs/scr02 213268540 116904962  79302096    60%  426573 27152817 
> 2%   /scr02
> /dev/ufs/scr04 167568544  93374026  60789036    61%   13008 21654830 
> 0%   /scr04
> /dev/ufs/scr01 232100360 161547746  51984586    76%  531834 29473412 
> 2%   /scr01
> 
> (rss and scr04 never give me any issues)
> 
> All four of the ufs/* partitions are on the same RAID array, and I don't 
> believe there is any underlying disk issue.
> 
> Here's some kgdb output from when the system was wedged on /scr01, but 
> the unmount of /scr02 caused a panic:
> 
> # kgdb -q -n 3
> [GDB will not be able to debug user-mode threads: 
> /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> 
> Unread portion of the kernel message buffer:
> Mount point /scr02 had 1 dangling refs
> panic: unmount: dangling vnode
> cpuid = 0
> KDB: enter: panic
> Dumping 1023 MB (2 chunks)
>    chunk 0: 1MB (159 pages) ... ok
>    chunk 1: 1023MB (261824 pages) 1007 991 975 959 943 927 911 895 879 
> 863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591 
> 575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303 
> 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15
> 
> #0  doadump () at pcpu.h:165
> 165     pcpu.h: No such file or directory.
>          in pcpu.h
> (kgdb) bt
> #0  doadump () at pcpu.h:165
> #1  0xc0473b9b in db_fncall (dummy1=-1063129632, dummy2=0, 
> dummy3=-1064859081, dummy4=0xe8de3ab8 "ä:Þè\234l\207ÀÐ:ÞèÔ:Þè\220\a")
>      at /usr/src/sys/ddb/db_command.c:492
> #2  0xc04739a0 in db_command (last_cmdp=0xc09d0144, cmd_table=0x0, 
> aux_cmd_tablep=0xc092fe4c, aux_cmd_tablep_end=0xc092fe68)
>      at /usr/src/sys/ddb/db_command.c:350
> #3  0xc0473a68 in db_command_loop () at /usr/src/sys/ddb/db_command.c:458
> #4  0xc0475679 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:221
> #5  0xc0697c0c in kdb_trap (type=3, code=0, tf=0xe8de3bfc) at 
> /usr/src/sys/kern/subr_kdb.c:473
> #6  0xc0896338 in trap (frame=
>        {tf_fs = -388104184, tf_es = -1066860504, tf_ds = -1064304600, 
> tf_edi = -1064235220, tf_esi = 1, tf_ebp = -388088772, tf_isp = 
> -388088792, tf_ebx = -388088728, tf_edx = 0, tf_ecx = -1056755712, 
> tf_eax = 18, tf_trapno = 3, tf_err = 0, tf_eip = -1066829453, tf_cs = 
> 32, tf_eflags = 646, tf_esp = -388088740, tf_ss = -1066934521}) at 
> /usr/src/sys/i386/i386/trap.c:593
> #7  0xc0881e5a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
> #8  0xc0697973 in kdb_enter (msg=0x12 <Address 0x12 out of bounds>) at 
> cpufunc.h:60
> #9  0xc067df07 in panic (fmt=0xc0910f2c "unmount: dangling vnode") at 
> /usr/src/sys/kern/kern_shutdown.c:549
> #10 0xc06d153e in vfs_mount_destroy (mp=0xc5964000, td=0xc620c600) at 
> /usr/src/sys/kern/vfs_mount.c:514
> #11 0xc06d2d26 in dounmount (mp=0xc5964000, flags=134217728, 
> td=0xc620c600) at /usr/src/sys/kern/vfs_mount.c:1162
> #12 0xc06d27de in unmount (td=0xc620c600, uap=0xe8de3d04) at 
> /usr/src/sys/kern/vfs_mount.c:1052
> #13 0xc0896c0b in syscall (frame=
>        {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 134521957, tf_esi = 
> 134535289, tf_ebp = -1077942776, tf_isp = -388088476, tf_ebx = -1077942864,
>   tf_edx = 26, tf_ecx = 0, tf_eax = 22, tf_trapno = 12, tf_err = 2, 
> tf_eip = 671864503, tf_cs = 51, tf_eflags = 518, tf_esp = -1077942948, 
> tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:981
> #14 0xc0881eaf in Xint0x80_syscall () at 
> /usr/src/sys/i386/i386/exception.s:200
> #15 0x00000033 in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb) frame 10
> #10 0xc06d153e in vfs_mount_destroy (mp=0xc5964000, td=0xc620c600) at 
> /usr/src/sys/kern/vfs_mount.c:514
> 514                     panic("unmount: dangling vnode");
> (kgdb) l
> 509                     printf("mount point secondary write ops completed\n");
> 510             }
> 511             MNT_IUNLOCK(mp);
> 512             mp->mnt_vfc->vfc_refcount--;
> 513             if (!TAILQ_EMPTY(&mp->mnt_nvnodelist))
> 514                     panic("unmount: dangling vnode");
> 515             lockdestroy(&mp->mnt_lock);
> 516             MNT_ILOCK(mp);
> 517             if (mp->mnt_kern_flag & MNTK_MWAIT)
> 518                     wakeup(mp);
> 
> (kgdb) p *mp
> $2 = {mnt_list = {tqe_next = 0xc5964400, tqe_prev = 0xc59bbc00}, mnt_op 
> = 0xc09b96e0, mnt_vfc = 0xc09b9720, mnt_vnodecovered = 0xc5ae0cc0,
>    mnt_syncer = 0x0, mnt_nvnodelist = {tqh_first = 0xc6d59440, tqh_last 
> = 0xc6d59454}, mnt_lock = {lk_interlock = 0xc09eac84, lk_flags = 1048576,
>      lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 0, lk_prio 
> = 80, lk_wmesg = 0xc0910dff "vfslock", lk_timo = 0,
>      lk_lockholder = 0xffffffff, lk_newlock = 0x0}, mnt_mtx = 
> {mtx_object = {lo_class = 0xc0980124, lo_name = 0xc0910dee "struct mount 
> mtx",
>        lo_type = 0xc0910dee "struct mount mtx", lo_flags = 196608, 
> lo_list = {tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0}, mtx_lock 
> = 4,
>      mtx_recurse = 0}, mnt_writeopcount = 0, mnt_flag = 2097920, mnt_opt 
> = 0xc5926a00, mnt_optnew = 0x0, mnt_kern_flag = 553648128,
>    mnt_maxsymlinklen = 120, mnt_stat = {f_version = 537068824, f_type = 
> 5, f_flags = 2102016, f_bsize = 2048, f_iosize = 16384,
>      f_blocks = 106634270, f_bfree = 48180134, f_bavail = 39649393, 
> f_files = 27579390, f_ffree = 27152822, f_syncwrites = 0, f_asyncwrites 
> = 0,
>      f_syncreads = 0, f_asyncreads = 0, f_spare = {0, 0, 0, 0, 0, 0, 0, 
> 0, 0, 0}, f_namemax = 255, f_owner = 0, f_fsid = {val = {1111508928,
>          -571478071}}, f_charspare = '\0' <repeats 79 times>, 
> f_fstypename = "ufs", '\0' <repeats 12 times>,
>      f_mntfromname = "/dev/ufs/scr02", '\0' <repeats 73 times>, 
> f_mntonname = "/scr02", '\0' <repeats 81 times>}, mnt_cred = 0xc59f2080,
>    mnt_data = 0x0, mnt_time = 0, mnt_iosize_max = 131072, mnt_export = 
> 0xc5d25c00, mnt_mntlabel = 0x0, mnt_fslabel = 0x0, mnt_nvnodelistsize = 1,
>    mnt_hashseed = 3369618744, mnt_markercnt = 0, mnt_holdcnt = 0, 
> mnt_holdcntwaiters = 0, mnt_secondary_writes = 0,
>    mnt_secondary_accwrites = 2126786, mnt_ref = 1}
> (kgdb) p mp->mnt_vfc->vfc_refcount
> $3 = 4
> 
> 
> Anything else I can provide to help find the issue?
> 
> 
> Eric
> 
> 
> 


Another batch of kgdb output from this same system, with the same issue:

# kgdb -q -n 1
[GDB will not be able to debug user-mode threads:
/usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]

Unread portion of the kernel message buffer:
Mount point /rss had 1 dangling refs
panic: unmount: dangling vnode
cpuid = 0
KDB: enter: panic
Dumping 1023 MB (2 chunks)
    chunk 0: 1MB (159 pages) ... ok
    chunk 1: 1023MB (261824 pages) 1007 991 975 959 943 927 911 895 879
863 847 831 815 799 783 767 751 735 719 703 687 671 655 639 623 607 591
575 559 543 527 511 495 479 463 447 431 415 399 383 367 351 335 319 303
287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165     pcpu.h: No such file or directory.
          in pcpu.h
(kgdb) where
#0  doadump () at pcpu.h:165
#1  0xc0473b9b in db_fncall (dummy1=-1063129632, dummy2=0,
dummy3=-1064859081, dummy4=0xe8e65ab8 "äZæè\234l\207ÀÐZæèÔZæè\220\a")
      at /usr/src/sys/ddb/db_command.c:492
#2  0xc04739a0 in db_command (last_cmdp=0xc09d0144, cmd_table=0x0,
aux_cmd_tablep=0xc092fe4c, aux_cmd_tablep_end=0xc092fe68)
      at /usr/src/sys/ddb/db_command.c:350
#3  0xc0473a68 in db_command_loop () at /usr/src/sys/ddb/db_command.c:458
#4  0xc0475679 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:221
#5  0xc0697c0c in kdb_trap (type=3, code=0, tf=0xe8e65bfc) at
/usr/src/sys/kern/subr_kdb.c:473
#6  0xc0896338 in trap (frame=
        {tf_fs = -387579896, tf_es = -1066860504, tf_ds = -1064304600,
tf_edi = -1064235220, tf_esi = 1, tf_ebp = -387556292, tf_isp =
-387556312, tf_ebx = -387556248, tf_edx = 0, tf_ecx = -1056755712,
tf_eax = 18, tf_trapno = 3, tf_err = 0, tf_eip = -1066829453, tf_cs =
32, tf_eflags = 646, tf_esp = -387556260, tf_ss = -1066934521}) at
/usr/src/sys/i386/i386/trap.c:593
#7  0xc0881e5a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#8  0xc0697973 in kdb_enter (msg=0x12 <Address 0x12 out of bounds>) at
cpufunc.h:60
#9  0xc067df07 in panic (fmt=0xc0910f2c "unmount: dangling vnode") at
/usr/src/sys/kern/kern_shutdown.c:549
#10 0xc06d153e in vfs_mount_destroy (mp=0xc59bbc00, td=0xc5c16000) at
/usr/src/sys/kern/vfs_mount.c:514
#11 0xc06d2d26 in dounmount (mp=0xc59bbc00, flags=134217728,
td=0xc5c16000) at /usr/src/sys/kern/vfs_mount.c:1162
#12 0xc06d27de in unmount (td=0xc5c16000, uap=0xe8e65d04) at
/usr/src/sys/kern/vfs_mount.c:1052
#13 0xc0896c0b in syscall (frame=
        {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 134521957, tf_esi =
134534817, tf_ebp = -1077942776, tf_isp = -387555996, tf_ebx =
-1077942864, tf_edx = 25, tf_ecx = 0, tf_eax = 22, tf_trapno = 12,
tf_err = 2, tf_eip = 671864503, tf_cs = 51, tf_eflags = 518, tf_esp =
-1077942948, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:981
#14 0xc0881eaf in Xint0x80_syscall () at
/usr/src/sys/i386/i386/exception.s:200
#15 0x00000033 in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) frame 10
#10 0xc06d153e in vfs_mount_destroy (mp=0xc59bbc00, td=0xc5c16000) at
/usr/src/sys/kern/vfs_mount.c:514
514                     panic("unmount: dangling vnode");
(kgdb) l
509                     printf("mount point secondary write ops completed\n");
510             }
511             MNT_IUNLOCK(mp);
512             mp->mnt_vfc->vfc_refcount--;
513             if (!TAILQ_EMPTY(&mp->mnt_nvnodelist))
514                     panic("unmount: dangling vnode");
515             lockdestroy(&mp->mnt_lock);
516             MNT_ILOCK(mp);
517             if (mp->mnt_kern_flag & MNTK_MWAIT)
518                     wakeup(mp);
(kgdb) p mp->mnt_vfc->vfc_refcount
$1 = 4
(kgdb) p *mp
$2 = {mnt_list = {tqe_next = 0xc595d000, tqe_prev = 0xc59bc000}, mnt_op
= 0xc09b96e0, mnt_vfc = 0xc09b9720, mnt_vnodecovered = 0xc5a81cc0,
    mnt_syncer = 0x0, mnt_nvnodelist = {tqh_first = 0xc8af4000, tqh_last
= 0xc8af4014}, mnt_lock = {lk_interlock = 0xc09eac18, lk_flags = 1048576,
      lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 0, lk_prio
= 80, lk_wmesg = 0xc0910dff "vfslock", lk_timo = 0,
      lk_lockholder = 0xffffffff, lk_newlock = 0x0}, mnt_mtx =
{mtx_object = {lo_class = 0xc0980124, lo_name = 0xc0910dee "struct mount
mtx",
        lo_type = 0xc0910dee "struct mount mtx", lo_flags = 196608,
lo_list = {tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0}, mtx_lock
= 4,
      mtx_recurse = 0}, mnt_writeopcount = 0, mnt_flag = 2097920, mnt_opt
= 0xc5728a40, mnt_optnew = 0x0, mnt_kern_flag = 553648128,
    mnt_maxsymlinklen = 120, mnt_stat = {f_version = 537068824, f_type =
5, f_flags = 2102016, f_bsize = 2048, f_iosize = 16384,
      f_blocks = 106634270, f_bfree = 65410962, f_bavail = 56880221,
f_files = 27579390, f_ffree = 27203064, f_syncwrites = 0, f_asyncwrites
= 0,
      f_syncreads = 0, f_asyncreads = 0, f_spare = {0, 0, 0, 0, 0, 0, 0,
0, 0, 0}, f_namemax = 255, f_owner = 0, f_fsid = {val = {1111508926,
          499625180}}, f_charspare = '\0' <repeats 79 times>,
f_fstypename = "ufs", '\0' <repeats 12 times>,
      f_mntfromname = "/dev/ufs/rss", '\0' <repeats 75 times>,
f_mntonname = "/rss", '\0' <repeats 83 times>}, mnt_cred = 0xc5a56c80,
    mnt_data = 0x0, mnt_time = 0, mnt_iosize_max = 131072, mnt_export =
0xc59e8000, mnt_mntlabel = 0x0, mnt_fslabel = 0x0, mnt_nvnodelistsize = 1,
    mnt_hashseed = 2115021039, mnt_markercnt = 0, mnt_holdcnt = 0,
mnt_holdcntwaiters = 0, mnt_secondary_writes = 0,
    mnt_secondary_accwrites = 12553194, mnt_ref = 1}
(kgdb) p &mp->mnt_nvnodelist
$3 = (struct vnodelst *) 0xc59bbc18
(kgdb) p mp->mnt_nvnodelist
$4 = {tqh_first = 0xc8af4000, tqh_last = 0xc8af4014}


Eric



-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------
Received on Mon Sep 18 2006 - 23:39:27 UTC