RE: lockmgr panic on shutdown

From: Bruce Evans <bde_at_zeta.org.au>
Date: Sun, 2 Nov 2003 17:31:03 +1100 (EST)

On Sun, 2 Nov 2003 peter.edwards_at_openet-telecom.com wrote:

> The obvious solution might be to change line 1161 of ffs_vfsops to
> pass vget() "curthread" rather than td. I assume there's a good
> reason why "thread0" is passed from boot(), but I can't see why
> that's of any use to the vnode locking.

Passing &thread0 in boot() is a quick (and not even wrong) fix for
the problem that there is no valid current process^Wthread in the
panic case.  Long ago in Net/2 (still in Lite2 for at least the
i386 version), sync() in boot() was passed the completely bogus
parameters ((struct sigcontext *)0) (instead of (p, uap, retval)).
This worked to the extent that sync()'s proc pointer was not passed
further or not dereferenced.  Now there are lots of locks, and since
thread0 is never the correct lock holder, things work at most to
the extent that sync()'s proc pointer is not passed further.
curthread is never null in -current, so upgrading to the version that
passes it (i386/i386/machdep.c 1.111 (actually passes curproc)) would
probably help in the non-panic case without increasing bugs for the
panic case.  However, passing curthread is still wrong for the panic
case due to the following complications:
- panics may occur during context switches or in other critical regions
  when curthread is not quite current.
- under SMP, curthread is per-CPU, so having it non-null doesn't really
  help.  Locks may be held by curprocs running on other CPUs, and in
  panic() it is difficult to handle the other CPUs correctly -- if you
  stop them then they won't be able to release their locks, and if you
  let them run they may run into you.  Hopefully in the case of a
  normal shutdown all the other CPUs release their locks and stop before
  the sync().
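
The holder mismatch described above can be sketched as a toy userland
model.  This is not the FreeBSD lockmgr code: toy_thread, toy_lock and
both functions are hypothetical names invented for illustration, and
where a kernel lock manager would typically panic on a release by a
non-holder, the sketch just returns an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel thread; only its identity matters. */
struct toy_thread {
	const char *name;
};

/* Hypothetical exclusive lock that records which thread holds it. */
struct toy_lock {
	const struct toy_thread *holder;	/* NULL when unlocked */
};

/* Acquire on behalf of td.  Returns 0 on success, -1 if already held. */
static int
toy_lock_acquire(struct toy_lock *lk, const struct toy_thread *td)
{
	assert(td != NULL);	/* a lock is always taken for some thread */
	if (lk->holder != NULL)
		return (-1);
	lk->holder = td;
	return (0);
}

/*
 * Release on behalf of td.  Returns 0 on success, or -1 if td is not
 * the recorded holder; the lock stays held in that case.
 */
static int
toy_lock_release(struct toy_lock *lk, const struct toy_thread *td)
{
	if (lk->holder != td)
		return (-1);
	lk->holder = NULL;
	return (0);
}
```

If the lock is acquired for one toy_thread (standing in for curthread)
and then released with a different one (standing in for &thread0),
toy_lock_release() fails and the lock stays held.  The sketch does not
model the SMP and context-switch complications listed above.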

Bruce