(quoting last post for convenience; more history at
http://www.usenetarticles.com/thread/952336.html)

> > vnode 0xffffff00037473e0: tag devfs, type VDIR
> > usecount 0, writecount 0, refcount 1 mountedhere 0xffffff0003745ca0
> > flags (VV_ROOT)
> > lock type devfs: EXCL (count 1) by thread 0xffffff00010e6680 (pid 1)
>
> Some additional facts:
>
> Looking at the printouts, there is always a sequence of three or more
> vrele():s of the same vnode (three at least twice; more than three at
> least once), in both the successful case and the panicking case. There
> are no vrele():s of any other vnodes in either case.
>
> Inserting enter/exit debug printouts in mountcheckdirs() confirms that
> all the vrele() calls occur within the bounds of a single call to
> mountcheckdirs(). Does this not imply there is some locking mismatch in
> the non-ZFS-specific code? I must admit I find the locking confusing,
> with several locking/unlocking functions/macros intermixed at different
> levels in the call stack. My (incorrect) reading was that this panic
> should always be happening, which is obviously not the case.
>
> Running with vfs.zfs.debug=1 confirms that vdev_geom open/attach/detach
> happens prior to any vrele() even in the panicking case (i.e., zfs pool
> discovery seems to complete).
>
> In the case of an expected provider not being found, vd->vdev_devid is
> NULL in vdev_geom_open(), based on the "provider not found" debug
> printout (perhaps normal?).

I *think* I just experienced the same problem on 7.0-BETA3, except the
kernel does not have WITNESS/INVARIANTS, so I just get a hang instead of
a panic. I wanted to post the information I have for completeness; I
realize what follows is a bunch of anecdotal mumbo-jumbo.

The boot-up process hangs right before the would-be "trying to mount
root from..." message, after all the glabel tasting has completed. This
was on a completely different system than the one in the original post,
but it also has root-on-zfs (this time on a 5-disk raidz2). It's a
dual-core amd64 machine with a low-end motherboard and low-end SATA
controllers (SiI and some built-in nVidia chipset).

It all started when I was booting back into FreeBSD after having had
Windows booted for a while. It wouldn't boot. I fiddled some with
vfs.zfs.debug=1 and removed a CD from the drive (in case it affected
timing), but it did not help. I did not try the boot-7-live CD trick
this time as I did originally on the other machine.

I looked carefully to make sure all drives were detected, including GEOM
tasting on all but one of the drives that are in the zfs pool. The I/O
indicator LEDs on the drives that are part of the zfs pool did not
indicate any I/O after the hang. I waited 5+ minutes at least once in
the hope that it was a drive timing out.

After several attempts I turned off the machine and let it do a cold
boot - at this point the system booted fine.

This is different from before, in that previously the behavior was
seemingly triggered by changes in system configuration (loss of a drive,
etc.). This time it was just a reboot. I *did* touch a bunch of cables
in between, and blew some air on components (for reasons unrelated to
this), which I originally figured could explain the problem.

Before this incident, the system had booted with root-on-zfs many times
(at least 25, probably more like 50+) without any kind of problem, ever.
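For reference, the enter/exit instrumentation mentioned in the quoted
text would have been of roughly the following form (a minimal sketch
against sys/kern/vfs_mount.c; the printout format and fields shown are
illustrative, not the actual patch):

    /* sys/kern/vfs_mount.c - illustrative enter/exit debug printouts */
    static void
    mountcheckdirs(struct vnode *olddp, struct vnode *newdp)
    {
            /*
             * Log entry. Reading v_usecount without the vnode
             * interlock is racy, but acceptable for a debug printout.
             */
            printf("mountcheckdirs: enter olddp=%p usecount=%d\n",
                olddp, olddp->v_usecount);

            /*
             * ... existing body: walk the process list, replacing
             * fd_cdir/fd_rdir references to olddp with newdp and
             * vrele():ing olddp once per replaced reference ...
             */

            printf("mountcheckdirs: exit olddp=%p usecount=%d\n",
                olddp, olddp->v_usecount);
    }

Any vrele() printouts that then appear between the enter and exit lines
demonstrably fall within a single mountcheckdirs() call. (As for
vfs.zfs.debug=1: assuming the 7.0 ZFS port exposes it as a tunable, it
can be set at the loader prompt with "set vfs.zfs.debug=1" or in
/boot/loader.conf.)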
--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller_at_infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey_at_scode.org
E-Mail: peter.schuller_at_infidyne.com
Web: http://www.scode.org