Re: kernel pointer polka, possibly by mount_nfs

From: Don Lewis <truckman_at_FreeBSD.org> Date: Wed, 10 Dec 2003 22:49:13 -0800 (PST)

On 10 Dec, Poul-Henning Kamp wrote:
> 
> I have a 100% reproducible case here where it looks like mount_nfs
> tramples on the softc of a led(4) device.
> 
> Stock -current kernel, HZ=1000, I've added a couple of sanity-checks
> in the timeout routine of led(4) and they trigger reliably on a
> byte which should not have been zero.
> 
> In all cases so far, the currently running program is mount_nfs run
> from /etc/rc.mumble somewhere.
> 
> The machine is a Soekris 4501 booting diskless.
> 
> I have also seen a reproducible page fault panic in in_pcbremlist()
> if I put "set -x" as the second line in /etc/rc on the same machine,
> it smells the same to me.
> 
> This problem likely affects 5.2-WHATEVER as well, and could be
> responsible for other Heisenbugs, and could be considered a
> showstopper.

That sounds somewhat like the Heisenbug I've been on the hunt for in
the last few weeks.  This one liked to munch some file system's struct
mount, or whatever structure that mnt_data was pointing to.  The system
in question typically blew up when attempting to lock mnt_lock in
vfs_busy().  The trigger appeared to be the use of read-only ext2fs. The
user who reported this problem said that the system would panic after a
few hours.  After getting the user to sprinkle KASSERT()s around, I've
pretty much come to the conclusion that the bug is not in the code for the
vfs top half.  Another bit of data is that the struct mount getting
nuked doesn't appear to belong to ext2fs.  It's hard to tell whose it is
though because it gets zeroed.
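
For illustration only, the kind of sanity check mentioned above can be
sketched in user-space C.  This is not the actual led(4) or vfs code:
struct guarded, GUARD_MAGIC and the helper names are all hypothetical,
and plain assert() stands in for the kernel's KASSERT().  The idea is
that a structure which gets zeroed by a stray write loses its non-zero
canaries, so a cheap check on every entry point can catch the damage
close to when it happens.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic value; any unlikely non-zero pattern would do. */
#define GUARD_MAGIC 0x5afec0deU

/* Hypothetical stand-in for a softc or a struct mount. */
struct guarded {
	uint32_t g_head;	/* canary before the payload */
	char	 g_data[32];	/* payload that should not be trampled */
	uint32_t g_tail;	/* canary after the payload */
};

/* Zero the structure and arm both canaries. */
static void
guarded_init(struct guarded *g)
{
	memset(g, 0, sizeof(*g));
	g->g_head = GUARD_MAGIC;
	g->g_tail = GUARD_MAGIC;
}

/* Returns 1 if both canaries are intact, 0 if either was overwritten. */
static int
guarded_ok(const struct guarded *g)
{
	return (g->g_head == GUARD_MAGIC && g->g_tail == GUARD_MAGIC);
}

/* Abort if the structure was trampled; assert() plays the KASSERT() role. */
static void
guarded_assert(const struct guarded *g)
{
	assert(guarded_ok(g));
}
```

Calling guarded_assert() at the top of a timeout routine or before
taking a lock would narrow the window in which the corruption happened,
though it says nothing about who did the writing.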

I use NFS on my two -CURRENT boxes and haven't run into any problems,
and I also haven't been able to reproduce any panics with ext2fs, though
I haven't exercised that nearly as much.