On 10 Dec, Poul-Henning Kamp wrote:
>
> I have a 100% reproducible case here where it looks like mount_nfs
> tramples on the softc of a led(4) device.
>
> Stock -current kernel, HZ=1000, I've added a couple of sanity-checks
> in the timeout routine of led(4) and they trigger reliably on a
> byte which should not have been zero.
>
> In all cases so far, the currently running program is mount_nfs run
> from /etc/rc.mumble somewhere.
>
> The machine is a Soekris 4501 booting diskless.
>
> I have also seen a reproducible page fault panic in in_pcbremlist()
> if I put "set -x" as the second line in /etc/rc on the same machine,
> it smells the same to me.
>
> This problem likely affects 5.2-WHATEVER as well, and could be
> responsible for other Heisenbugs, and could be considered a
> showstopper.

That sounds somewhat like the Heisenbug I've been hunting for the
last few weeks. This one liked to munch some file system's struct mount,
or whatever structure mnt_data was pointing to. The system in question
typically blew up when attempting to lock mnt_lock in vfs_busy().

The trigger appeared to be the use of read-only ext2fs. The user who
reported this problem said that the system would panic after a few hours.
After getting the user to sprinkle KASSERT()s around, I've pretty much
come to the conclusion that the bug is not in the code for the vfs top
half.

Another bit of data is that the struct mount getting nuked doesn't
appear to belong to ext2fs. It's hard to tell whose it is, though,
because it gets zeroed.

I use NFS on my two -CURRENT boxes and haven't run into any problems.
I also haven't been able to reproduce any panics with ext2fs, though I
haven't exercised that nearly as much.

Received on Wed Dec 10 2003 - 21:49:22 UTC