On Sun, 11 Oct 2009, Harsha Srinath wrote: > I'm running an updated HEAD kernel and got a page fault in > ifindex_alloc_locked() in if.c. I figured that the problem was caused by the > (pluggable) network card of my laptop and found that during the > initialization of the interface, cb_event_thread() takes the giant lock and > up the call chain in if_alloc(), we call IFNET_WLOCK() and assert on the RW > locks in ifindex_alloc_locked(). It is in the asset macro > IFNET_WLOCK_ASSERT() I see the page fault. > > I looked up some recent related changes and noticed the following comment in > one of the check-ins in- > http://svn.freebsd.org/viewvc/base/head/sys/net/if.c > > "Break out allocation of new ifindex values from if_alloc() and if_vmove(), > and centralize in a single function ifindex_alloc(). Assert the IFNET_WLOCK, > and add missing IFNET_WLOCK in if_alloc(). This does not close all known > races in this code." > > So I think I have hit one of those fault conditions. > > Apparently the giant lock code was removed and added back recently - > http://svn.freebsd.org/viewvc/base/head/sys/dev/pccbb/pccbb.c > > I believe that the root cause is that ifnet_rw is a non sleepable exclusive > RW lock and we have taken the exclusive sleep mutex Giant before that. > > Any pointers and suggestions are welcome. Hi Harsha-- Giant is a bit special in that the long-term sleep code in the kernel knows to drop it when sleeping, and re-acquire when waking up. So, unlike all other mutexes, it should be OK to hold it in this case, as Giant will simply get dropped if the kernel has to sleep waiting on a sleepable lock. This is because, historically in FreeBSD 3.x/4.x, the kernel was protected by a single spinlock, which would get released whenever the kernel stopped executing, such as during an I/O sleep. On the whole, Giant has disappeared from the modern kernel, but where it is used, it retains those curious historic properties. To break things down a bit further, IFNET_WLOCK is, itself, a bit special: notice that in FreeBSD 8, it's actually two locks, a sleep lock, and a mutex, which must both be acquired exclusively to ensure mutual exclusion. if_alloc() and associated calls are also sleepable because they perform potentially sleeping memory allocation (M_WAITOK), so it's an invariant of any code calling interface allocation that it must be able to tolerate a sleep. Do you have a copy of the stack trace and fault information handy? In my experience, a NULL pointer deref or other page fault in the locking code for a global lock is almost always corrupted thread state, perhaps due to tripping over another thread having locked a corrupted/freed/uninitialized lock. We might be able to track that down by tracing other threads that were in execution at the time of the panic. RobertReceived on Sun Oct 11 2009 - 18:30:58 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:56 UTC