Re: Kqueue races causing crashes

From: Konstantin Belousov <kostikbel@gmail.com>
Date: Wed, 15 Jun 2016 11:11:43 +0300
On Tue, Jun 14, 2016 at 10:26:14PM -0500, Eric Badger wrote:
> I believe they all have more or less the same cause. The crashes occur 
> because we acquire a knlist lock via the KN_LIST_LOCK macro, but when we 
> call KN_LIST_UNLOCK, the knote's knlist reference (kn->kn_knlist) has 
> been cleared by another thread. Thus we are unable to unlock the 
> previously acquired lock and hold it until something causes us to crash 
> (such as the witness code noticing that we're returning to userland with 
> the lock still held).
...
> I believe there's also a small window where the KN_LIST_LOCK macro 
> checks kn->kn_knlist and finds it to be non-NULL, but by the time it 
> actually dereferences it, it has become NULL. This would produce the 
> "page fault while in kernel mode" crash.
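For reference, the macros in question look roughly like this
(paraphrased from sys/kern/kern_event.c; the real definitions may
differ in detail):

	#define KN_LIST_LOCK(kn) do {				\
		if (kn->kn_knlist != NULL)			\
			kn->kn_knlist->kl_lock(kn->kn_knlist->kl_lockarg); \
	} while (0)
	#define KN_LIST_UNLOCK(kn) do {				\
		if (kn->kn_knlist != NULL)			\
			kn->kn_knlist->kl_unlock(kn->kn_knlist->kl_lockarg); \
	} while (0)

One possible interleaving that ends with the leaked lock (a
simplified sketch; the list lock here would be e.g. the proc lock of
a watched process):

	/* Thread A (e.g. in kqueue_register()) */
	KN_LIST_LOCK(kn);	/* reads kn->kn_knlist, then blocks on
				   kl_lock, which thread B holds */

	/* Thread B (the event source), holding the list lock, fires
	   the event; the filter detaches the knote, kn->kn_knlist is
	   set to NULL, and the lock is dropped. */

	/* Thread A, now owning the list lock */
	KN_LIST_UNLOCK(kn);	/* kn->kn_knlist == NULL, so the unlock
				   is silently skipped: lock leaked */

The "page fault" variant is the same story inside KN_LIST_LOCK
itself, in the window between the NULL check and the dereference.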
> 
> If someone familiar with this code sees an obvious fix, I'll be happy to 
> test it. Otherwise, I'd appreciate any advice on fixing this. My first 
> thought is that a "struct knote" ought to have its own mutex for 
> controlling access to the flag fields and ideally the "kn_knlist" field. 
> I.e., you would first acquire a knote's lock and then the knlist lock, 
> thus ensuring that no one could clear the kn_knlist variable while you 
> hold the knlist lock. The knlist lock, however, usually comes from 
> whichever event producing entity the knote tracks, so getting lock 
> ordering right between the per-knote mutex and this other lock seems 
> potentially hard. (Sometimes we call into functions in kern_event.c with 
> the knlist lock already held, having been acquired in code outside of 
> kern_event.c. Consider, for example, calling KNOTE_LOCKED from 
> kern_exit.c; the PROC_LOCK macro has already been used to acquire the 
> process lock, also serving as the knlist lock).
This sounds like a good and correct analysis. I tried your test program
for around an hour on an 8-thread machine, but was not able to trigger
the issue. Peter might have better luck reproducing it. Still, I think
the problem is there.

IMO we should simply avoid clearing kn_knlist in knlist_remove().  The
member is only used to get the locking function pointers; otherwise the
code relies on the KN_DETACHED flag to detect the on-knlist condition.
See the patch below.
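
With the pointer left intact, a KN_LIST_LOCK/KN_LIST_UNLOCK pair
stays balanced even if the knote is detached in between.  A sketch of
the resulting invariant (not the literal code):

	KN_LIST_LOCK(kn);	/* takes knl->kl_lock through kn_knlist */
	/* Even if the knote got detached while we were blocked
	   acquiring the lock, knlist_remove() now only does the
	   SLIST_REMOVE() and sets KN_DETACHED; kn->kn_knlist still
	   points at knl. */
	KN_LIST_UNLOCK(kn);	/* the same lock is found and dropped */

	/* "Is the knote on a list?" is answered by the flag, not by
	   the pointer: */
	if ((kn->kn_status & KN_DETACHED) == 0) {
		/* still linked on kn->kn_knlist->kl_list */
	}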

> 
> Apropos of the knlist lock and its provenance: why is a lock from the 
> event producing entity used to control access to the knlist and knote? 
> Is it generally desirable to, for example, hold the process lock while 
> operating on a knlist attached to that process? It's not obvious to me 
> that this is required or even desirable. This might suggest that a 
> knlist should have its own lock rather than using a lock from the event 
> producing entity, which might make addressing this problem more 
> straightforward.

Consider the purpose of a knlist. It serves as a container for all
knotes registered on a given subsystem object (all knotes of a socket,
a process, etc.) that must be fired on an event; see the knote() code.
The consequence is that the subsystem which fires knote() typically
already holds a lock protecting its own state. As a result, it is
natural to protect the list of knotes to activate on a subsystem event
with that same subsystem lock.
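
In other words, the usual pattern is that the subsystem hands its own
lock to the knlist at initialization time and later fires the event
with that lock already held.  A minimal sketch, with a hypothetical
"foo" subsystem standing in for a real one:

	#include <sys/param.h>
	#include <sys/lock.h>
	#include <sys/mutex.h>
	#include <sys/event.h>

	struct foo_softc {
		struct mtx	foo_mtx;	/* protects subsystem state */
		struct knlist	foo_note;	/* knotes on this object */
	};

	static void
	foo_init(struct foo_softc *sc)
	{
		mtx_init(&sc->foo_mtx, "foo", NULL, MTX_DEF);
		/* The knlist borrows foo_mtx for kl_lock/kl_unlock. */
		knlist_init_mtx(&sc->foo_note, &sc->foo_mtx);
	}

	static void
	foo_event(struct foo_softc *sc)
	{
		mtx_lock(&sc->foo_mtx);
		/* ... update the subsystem's own state ... */
		KNOTE_LOCKED(&sc->foo_note, 0);	/* list lock already held */
		mtx_unlock(&sc->foo_mtx);
	}

(KNOTE_UNLOCKED() is the variant for call sites that do not already
hold the list lock.)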

diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
index 0614903..3f45dca 100644
--- a/sys/kern/kern_event.c
+++ b/sys/kern/kern_event.c
@@ -1341,11 +1341,12 @@ findkn:
 	}
 
 	/*
-	 * We can get here with kn->kn_knlist == NULL.  This can happen when
-	 * the initial attach event decides that the event is "completed" 
-	 * already.  i.e. filt_procattach is called on a zombie process.  It
-	 * will call filt_proc which will remove it from the list, and NULL
-	 * kn_knlist.
+	 * We can get here with kn->kn_knlist == NULL or the knote
+	 * detached.  This can happen when the initial attach event
+	 * decides that the event is "completed" already,
+	 * i.e. filt_procattach is called on a zombie process.  It
+	 * will call filt_proc which will not add it to the list, and
+	 * leave NULL kn_knlist.
 	 */
 done_ev_add:
 	if ((kev->flags & EV_ENABLE) != 0)
@@ -2073,7 +2075,6 @@ knlist_remove_kq(struct knlist *knl, struct knote *kn, int knlislocked, int kqis
 	if (!knlislocked)
 		knl->kl_lock(knl->kl_lockarg);
 	SLIST_REMOVE(&knl->kl_list, kn, knote, kn_selnext);
-	kn->kn_knlist = NULL;
 	if (!knlislocked)
 		knl->kl_unlock(knl->kl_lockarg);
 	if (!kqislocked)