Re: kernel pointer polka, possibly by mount_nfs

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Fri, 12 Dec 2003 10:32:03 -0800 (PST)
On 12 Dec, Stefan Ehmann wrote:
> On Thu, 2003-12-11 at 07:49, Don Lewis wrote:

>> 
>> That sounds somewhat like the Heisenbug I've been on the hunt for in
>> the last few weeks.  This one liked to munch some file system's struct
>> mount, or whatever structure that mnt_data was pointing to.  The system
>> in question typically blew up when attempting to lock mnt_lock in
>> vfs_busy().  The trigger appeared to be the use of read-only ext2fs. The
>> user who reported this problem said that the system would panic after a
>> few hours.  After getting the user to sprinkle KASSERT()s around, I've
>> pretty much come to the conclusion that the bug is not in the code for the
>> vfs top half.  Another bit of data is that the struct mount getting
>> nuked doesn't appear to belong to ext2fs.  It's hard to tell whose it is
>> though because it gets zeroed.
>> 
>> I use NFS on my two -CURRENT boxes and haven't run into any problems,
>> and I also haven't been able to reproduce any panics with ext2fs, though
>> I haven't exercised that nearly as much.
> 
> I guess you are talking about my panics. Since we don't seem to make any
> progress - would it help to find out when the change that causes the
> problem was made?
> 
> I was running an end-of-September kernel for nearly two months without
> having panics 3 times a day. The kernel of Nov 23 had these problems. So
> the problem should be located somewhere in these two months.
> 
> Since this may take quite some time (and a lot of kernel and
> worldbuilds), I'll only take it into account if there is a good chance
> that this will reveal the source of the problem.

Unfortunately, that may be the fastest way to track down the culprit.
The only other way would be to write a more aggressive assertion checker
function that validates the integrity of all the mount structures and
sprinkle lots of calls to this function around the kernel.
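
To make the idea concrete, here is a rough userland sketch of what such
a checker could look like.  Everything in it is an assumption made for
illustration: the real struct mount has no sentinel fields, and
toy_mount, TOY_MOUNT_MAGIC, toy_mount_check_all() and
TOY_MOUNT_ASSERT() are hypothetical names, not kernel interfaces.  A
kernel version would walk the real mount list with the appropriate lock
held and use KASSERT() instead of assert().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel value; the real struct mount has no such field. */
#define TOY_MOUNT_MAGIC 0x4d4e5421u

/* Toy stand-in for struct mount, bracketed by two sentinel words. */
struct toy_mount {
	unsigned int      magic_head;	/* sentinel before the payload */
	struct toy_mount *next;		/* next entry in the mount list */
	void             *mnt_data;	/* fs-private data, never NULL here */
	unsigned int      magic_tail;	/* sentinel after the payload */
};

/*
 * Walk the whole list and validate every entry.
 * Returns the index of the first damaged entry, or -1 if all pass.
 * Each entry is checked before its next pointer is followed, so a
 * zeroed entry is reported rather than silently ending the walk.
 */
static int
toy_mount_check_all(const struct toy_mount *head)
{
	const struct toy_mount *mp;
	int i = 0;

	for (mp = head; mp != NULL; mp = mp->next, i++) {
		if (mp->magic_head != TOY_MOUNT_MAGIC ||
		    mp->magic_tail != TOY_MOUNT_MAGIC ||
		    mp->mnt_data == NULL)
			return (i);
	}
	return (-1);
}

/* Call this liberally; it aborts at the first call site that sees damage. */
#define TOY_MOUNT_ASSERT(head)	assert(toy_mount_check_all(head) == -1)
```

The point of checking every structure at every call site, rather than
only the one being used, is that the first failing call narrows down
when the damage happened, even if the victim is not touched again until
much later.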

I also diff'ed the 2003/09/23 and 2003/11/23 versions of the ext2fs code
and didn't see anything suspicious.  That means that either the culprit
change is something subtle in ext2fs, or it is elsewhere in the kernel.
Received on Fri Dec 12 2003 - 09:32:17 UTC