Re: Page fault in IFNET_WLOCK_ASSERT [if.c and pccbb.c]

From: Robert Watson <rwatson_at_FreeBSD.org>
Date: Sun, 11 Oct 2009 21:30:57 +0100 (BST)

On Sun, 11 Oct 2009, Harsha Srinath wrote:

> I'm running an updated HEAD kernel and got a page fault in 
> ifindex_alloc_locked() in if.c. I figured that the problem was caused by the 
> (pluggable) network card of my laptop and found that during the 
> initialization of the interface, cb_event_thread() takes the giant lock and 
> up the call chain in if_alloc(), we call IFNET_WLOCK() and assert on the RW 
> locks in ifindex_alloc_locked(). It is in the assert macro 
> IFNET_WLOCK_ASSERT() that I see the page fault.
>
> I looked up some recent related changes and noticed the following comment in 
> one of the check-ins to 
> http://svn.freebsd.org/viewvc/base/head/sys/net/if.c
>
> "Break out allocation of new ifindex values from if_alloc() and if_vmove(), 
> and centralize in a single function ifindex_alloc(). Assert the IFNET_WLOCK, 
> and add missing IFNET_WLOCK in if_alloc(). This does not close all known 
> races in this code."
>
> So I think I have hit one of those fault conditions.
>
> Apparently the giant lock code was removed and added back recently - 
> http://svn.freebsd.org/viewvc/base/head/sys/dev/pccbb/pccbb.c
>
> I believe that the root cause is that ifnet_rw is a non-sleepable exclusive 
> RW lock and we have taken the exclusive sleep mutex Giant before that.
>
> Any pointers and suggestions are welcome.

Hi Harsha--

Giant is a bit special in that the long-term sleep code in the kernel knows to 
drop it when sleeping, and to re-acquire it when waking up.  So, unlike all 
other mutexes, it should be OK to hold it in this case, as Giant will simply 
get dropped if the kernel has to sleep waiting on a sleepable lock.  This is 
because, historically in FreeBSD 3.x/4.x, the kernel was protected by a single 
spinlock, which would get released whenever the kernel stopped executing, such 
as during an I/O sleep.  On the whole, Giant has disappeared from the modern 
kernel, but where it is used, it retains those curious historic properties.
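
A rough userspace sketch of that drop-and-reacquire behaviour follows.  It is 
a model only, not kernel code: giant_model, giant_depth, model_giant_lock(), 
model_giant_unlock() and model_sleep() are hypothetical names invented for 
illustration, a pthread mutex stands in for Giant, and the recursion count is 
a plain global, so the sketch is only meaningful for a single thread.  In the 
real kernel, the DROP_GIANT() and PICKUP_GIANT() macros in sys/mutex.h appear 
to serve this purpose.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for Giant; a model, not the kernel's mutex. */
static pthread_mutex_t giant_model = PTHREAD_MUTEX_INITIALIZER;
static int giant_depth;			/* times the caller holds it */
static int depth_while_asleep = -1;	/* recorded at the blocking point */

/* Giant may be acquired recursively, so only the first acquire locks. */
static void
model_giant_lock(void)
{
	if (giant_depth++ == 0)
		pthread_mutex_lock(&giant_model);
}

/* Only the last release actually unlocks the underlying mutex. */
static void
model_giant_unlock(void)
{
	assert(giant_depth > 0);
	if (--giant_depth == 0)
		pthread_mutex_unlock(&giant_model);
}

/* Model of a sleep: fully release Giant, block, then restore it. */
static void
model_sleep(void)
{
	int saved = giant_depth;

	if (saved > 0) {
		giant_depth = 0;
		pthread_mutex_unlock(&giant_model);
	}
	depth_while_asleep = giant_depth;	/* the thread would block here */
	if (saved > 0) {
		pthread_mutex_lock(&giant_model);
		giant_depth = saved;
	}
}
```

Calling model_giant_lock() twice and then model_sleep() leaves the caller 
holding the lock at its original depth on return, while the depth recorded at 
the blocking point shows it was not held during the sleep.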

To break things down a bit further, IFNET_WLOCK is, itself, a bit special: 
notice that in FreeBSD 8, it's actually two locks, a sleep lock and a mutex, 
both of which must be acquired exclusively to ensure mutual exclusion. 
if_alloc() and associated calls are also sleepable because they perform 
potentially sleeping memory allocation (M_WAITOK), so it's an invariant of any 
code calling interface allocation that it must be able to tolerate a sleep.
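
The two-lock arrangement can be modelled in userspace roughly as follows. 
This is a sketch under assumptions, not the if.c source: sleepable_model and 
nosleep_model are hypothetical pthread mutexes standing in for the sleep lock 
and the mutex, every model_* name is invented for illustration, and 
wlock_held is a crude substitute for whatever IFNET_WLOCK_ASSERT() really 
checks.  Real readers may be able to share a lock; here a reader simply takes 
one of the two mutexes, which is enough to exclude a writer.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the two locks behind the write lock. */
static pthread_mutex_t sleepable_model = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t nosleep_model = PTHREAD_MUTEX_INITIALIZER;
static int wlock_held;		/* set only while both locks are held */
static int next_index = 1;	/* the shared state being protected */

/* A writer takes both locks, sleepable one first. */
static void
model_wlock(void)
{
	pthread_mutex_lock(&sleepable_model);
	pthread_mutex_lock(&nosleep_model);
	wlock_held = 1;
}

static void
model_wunlock(void)
{
	wlock_held = 0;
	pthread_mutex_unlock(&nosleep_model);
	pthread_mutex_unlock(&sleepable_model);
}

/* Allocation helper that insists the caller holds the write lock. */
static int
model_index_alloc_locked(void)
{
	assert(wlock_held);
	return (next_index++);
}

static int
model_if_alloc(void)
{
	int idx;

	/*
	 * Assumption of this model: any work that may sleep (such as a
	 * waiting memory allocation) is done before the locks are taken.
	 */
	model_wlock();
	idx = model_index_alloc_locked();
	model_wunlock();
	return (idx);
}

/* A reader that may sleep needs only the sleepable lock. */
static int
model_read_may_sleep(void)
{
	int v;

	pthread_mutex_lock(&sleepable_model);
	v = next_index;
	pthread_mutex_unlock(&sleepable_model);
	return (v);
}

/* A reader that must not sleep needs only the non-sleepable lock. */
static int
model_read_no_sleep(void)
{
	int v;

	pthread_mutex_lock(&nosleep_model);
	v = next_index;
	pthread_mutex_unlock(&nosleep_model);
	return (v);
}
```

Because a writer holds both locks, either kind of reader is excluded for the 
duration of the update, whichever single lock it chose.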

Do you have a copy of the stack trace and fault information handy?  In my 
experience, a NULL pointer deref or other page fault in the locking code for a 
global lock is almost always corrupted thread state, perhaps due to tripping 
over another thread having locked a corrupted/freed/uninitialized lock.  We 
might be able to track that down by tracing other threads that were in 
execution at the time of the panic.

Robert
Received on Sun Oct 11 2009 - 18:30:58 UTC
