Re: panic: LK_RETRY set with incompatible flags

From: Sergey Kandaurov <pluknet_at_gmail.com>
Date: Mon, 4 Feb 2013 14:49:04 +0300
On 4 February 2013 05:07, Rick Macklem <rmacklem_at_uoguelph.ca> wrote:
> Andriy Gapon wrote:
>> on 03/02/2013 18:36 Rick Macklem said the following:
>> > I can think of two possibilities:
>> > 1 - ZFS isn't setting VV_ROOT on the root vnode under some condition.
>> > or
>> > 2 - The vnode was left locked from some previous operation that happened
>> >     to be done by this thread. Doesn't seem likely, but???
>> >
>> > Maybe Sergey could try the change to line#1451 and see if the panic
>> > still happens. If not, that would suggest possibility #1, I think.
>>
>> If the kernel is configured with witness, then it should be easy to check
>> where the exclusive lock was taken (file and line number).
>>
> Yep. If Sergey can reproduce this using a kernel with witness,
> doing "show witness" to see where the lock on the directory vnode
> was acquired, could be helpful.

Hi, Rick! Here is the requested witness info, and a bit more.
Note that a different KASSERT is triggered now.

The full witness output is at http://people.freebsd.org/~pluknet/witness-zfs-20130204.txt

shared lock of (lockmgr) zfs @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1452
while exclusively locked from /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1747
panic: share->excl
cpuid = 2
KDB: enter: panic
[ thread pid 812 tid 100884 ]
Stopped at      kdb_enter+0x3e: movq    $0,kdb_why

The first location (zfs_vnops.c:1452) is in zfs_lookup(), at the line marked ==>:
        if (error == 0 && (nm[0] != '.' || nm[1] != '\0')) {
                int ltype = 0;

                if (cnp->cn_flags & ISDOTDOT) {
                        ltype = VOP_ISLOCKED(dvp);
                        VOP_UNLOCK(dvp, 0);
                }
                ZFS_EXIT(zfsvfs);
                error = zfs_vnode_lock(*vpp, cnp->cn_lkflags);
                if (cnp->cn_flags & ISDOTDOT)
==>                     vn_lock(dvp, ltype | LK_RETRY);
                if (error != 0) {
                        VN_RELE(*vpp);
                        *vpp = NULL;
                        return (error);
                }
        } else {
                ZFS_EXIT(zfsvfs);
        }

The second location (zfs_vfsops.c:1747) is the vn_lock() call in zfs_vnode_lock():
int
zfs_vnode_lock(vnode_t *vp, int flags)
{
        int error;

        ASSERT(vp != NULL);

        error = vn_lock(vp, flags);
        return (error);
}

db> show locks
exclusive lockmgr zfs (zfs) r = 0 (0xfffffe00a1b44240) locked @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1747
db> show alllocks
Process 812 (nfsd) thread 0xfffffe00a1198000 (100884)
exclusive lockmgr zfs (zfs) r = 0 (0xfffffe00a1b44240) locked @ /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1747
Process 750 (syslogd) thread 0xfffffe0015a4c480 (100706)
exclusive lockmgr ufs (ufs) r = 0 (0xfffffe00a1962d50) locked @ /usr/src/sys/kern/vfs_syscalls.c:3433
Process 12 (intr) thread 0xfffffe0006813000 (100033)
exclusive sleep mutex AAC I/O lock (AAC I/O lock) r = 0 (0xffffff8001bfb210) locked @ /usr/src/sys/dev/aac/aac.c:827

db> show lock 0xfffffe00a1b44240
 class: lockmgr
 name: zfs
 state: XLOCK: 0xfffffe00a1198000 (tid 100884, pid 812, "nfsd")
 waiters: none
 spinners: none

Since the KASSERT is different this time, here is the new backtrace:

db> bt
Tracing pid 812 tid 100884 td 0xfffffe00a1198000
kdb_enter() at kdb_enter+0x3e/frame 0xffffff848e6bfd60
vpanic() at vpanic+0x147/frame 0xffffff848e6bfda0
kassert_panic() at kassert_panic+0x136/frame 0xffffff848e6bfe10
witness_checkorder() at witness_checkorder+0x289/frame 0xffffff848e6bfe90
__lockmgr_args() at __lockmgr_args+0x43e/frame 0xffffff848e6bffc0
vop_stdlock() at vop_stdlock+0x3c/frame 0xffffff848e6bffe0
VOP_LOCK1_APV() at VOP_LOCK1_APV+0xd0/frame 0xffffff848e6c0000
_vn_lock() at _vn_lock+0xab/frame 0xffffff848e6c0070
zfs_lookup() at zfs_lookup+0x392/frame 0xffffff848e6c0100
zfs_freebsd_lookup() at zfs_freebsd_lookup+0x6d/frame 0xffffff848e6c0240
VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0xc2/frame 0xffffff848e6c0260
vfs_cache_lookup() at vfs_cache_lookup+0xcf/frame 0xffffff848e6c02b0
VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0xc2/frame 0xffffff848e6c02d0
lookup() at lookup+0x548/frame 0xffffff848e6c0350
nfsvno_namei() at nfsvno_namei+0x1a5/frame 0xffffff848e6c0400
nfsrvd_lookup() at nfsrvd_lookup+0x13a/frame 0xffffff848e6c06b0
nfsrvd_dorpc() at nfsrvd_dorpc+0xca5/frame 0xffffff848e6c08a0
nfssvc_program() at nfssvc_program+0x482/frame 0xffffff848e6c0a00
svc_run_internal() at svc_run_internal+0x1e9/frame 0xffffff848e6c0ba0
svc_thread_start() at svc_thread_start+0xb/frame 0xffffff848e6c0bb0
fork_exit() at fork_exit+0x84/frame 0xffffff848e6c0bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xffffff848e6c0bf0
--- trap 0xc, rip = 0x800883b7a, rsp = 0x7fffffffd6c8, rbp = 0x7fffffffd970 ---

-- 
wbr,
pluknet
Received on Mon Feb 04 2013 - 10:49:10 UTC
