Re: Kqueue races causing crashes

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Wed, 15 Jun 2016 20:45:24 +0300

On Wed, Jun 15, 2016 at 10:39:42AM -0700, Matthew Macy wrote:
> You can use dwarf4 if you use GDB from ports
How would it help?

The problem for kgdb is that %rip is zero, because a function pointer was
set to NULL in a destroyed knlist.  Neither version of kgdb would find
code or unwind annotations for a zero address.

But the issue is understood and we are working on a version of the fix.
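
To illustrate the %rip point, here is a minimal userland sketch, not the
kernel code.  All names (toy_knlist, toy_lock and so on) are hypothetical
stand-ins.  The only assumption is that the lock operations are stored as
function pointers and that destroying the list sets them to NULL.  The
sketch returns an error where the kernel would instead make an indirect
call through NULL and end up with %rip == 0.

```c
#include <stddef.h>

/* Hypothetical stand-in for a knlist: lock ops are function pointers. */
struct toy_knlist {
	void (*kl_lock)(void *);
	void (*kl_unlock)(void *);
	void *kl_lockarg;
};

/* Counts how many times the toy lock is currently held. */
static int toy_lock_depth;

static void
toy_lock(void *arg)
{
	(void)arg;
	toy_lock_depth++;
}

static void
toy_unlock(void *arg)
{
	(void)arg;
	toy_lock_depth--;
}

static void
toy_knlist_init(struct toy_knlist *knl)
{
	knl->kl_lock = toy_lock;
	knl->kl_unlock = toy_unlock;
	knl->kl_lockarg = NULL;
}

/* Assumption: destroying the list clears the callbacks to NULL. */
static void
toy_knlist_destroy(struct toy_knlist *knl)
{
	knl->kl_lock = NULL;
	knl->kl_unlock = NULL;
	knl->kl_lockarg = NULL;
}

/*
 * Returns 0 if the lock callback was invoked, -1 if it is NULL.
 * Without this check the call would jump to address 0, which is a
 * location no debugger has code or unwind information for.
 */
static int
toy_knlist_lock(struct toy_knlist *knl)
{
	if (knl->kl_lock == NULL)
		return (-1);
	knl->kl_lock(knl->kl_lockarg);
	return (0);
}

/* Same shape as toy_knlist_lock(), for the unlock callback. */
static int
toy_knlist_unlock(struct toy_knlist *knl)
{
	if (knl->kl_unlock == NULL)
		return (-1);
	knl->kl_unlock(knl->kl_lockarg);
	return (0);
}
```

The NULL check here only makes the failure observable in a test.  It is
not a fix for the race, since the list can still be destroyed between the
check and the call.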

 ---- On Wed, 15 Jun 2016 04:50:00 -0700 Peter Holm <peter_at_holm.cc> wrote ----
On Wed, Jun 15, 2016 at 11:11:43AM +0300, Konstantin Belousov wrote:
> On Tue, Jun 14, 2016 at 10:26:14PM -0500, Eric Badger wrote:
> > I believe they all have more or less the same cause. The crashes occur
> > because we acquire a knlist lock via the KN_LIST_LOCK macro, but when we
> > call KN_LIST_UNLOCK, the knote's knlist reference (kn->kn_knlist) has
> > been cleared by another thread. Thus we are unable to unlock the
> > previously acquired lock and hold it until something causes us to crash
> > (such as the witness code noticing that we're returning to userland with
> > the lock still held).
> ...
> > I believe there's also a small window where the KN_LIST_LOCK macro
> > checks kn->kn_knlist and finds it to be non-NULL, but by the time it
> > actually dereferences it, it has become NULL. This would produce the
> > "page fault while in kernel mode" crash.
> >
> > If someone familiar with this code sees an obvious fix, I'll be happy to
> > test it. Otherwise, I'd appreciate any advice on fixing this. My first
> > thought is that a "struct knote" ought to have its own mutex for
> > controlling access to the flag fields and ideally the "kn_knlist" field.
> > I.e., you would first acquire a knote's lock and then the knlist lock,
> > thus ensuring that no one could clear the kn_knlist variable while you
> > hold the knlist lock. The knlist lock, however, usually comes from
> > whichever event producing entity the knote tracks, so getting lock
> > ordering right between the per-knote mutex and this other lock seems
> > potentially hard. (Sometimes we call into functions in kern_event.c with
> > the knlist lock already held, having been acquired in code outside of
> > kern_event.c. Consider, for example, calling KNOTE_LOCKED from
> > kern_exit.c; the PROC_LOCK macro has already been used to acquire the
> > process lock, also serving
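
The lock leak described in the quoted report can be sketched in a
single-threaded toy, based only on that description and not on the actual
macros.  All names here are hypothetical.  The "other thread" is simulated
by a callback that runs between the lock and unlock steps.  The snapshot
variant is shown only for contrast and is not the fix under discussion; it
does nothing about the list itself being destroyed.

```c
#include <stddef.h>

/* Hypothetical stand-ins; "held" counts outstanding lock acquisitions. */
struct toy_knlist {
	int held;
};

struct toy_knote {
	struct toy_knlist *kn_knlist;
};

/* Simulates what another thread does between our lock and unlock. */
typedef void (*toy_interleave_fn)(struct toy_knote *);

/* The other thread detaches the knote from its list. */
static void
toy_clear_knlist(struct toy_knote *kn)
{
	kn->kn_knlist = NULL;
}

/* The other thread does nothing. */
static void
toy_no_interference(struct toy_knote *kn)
{
	(void)kn;
}

/*
 * Re-reads kn->kn_knlist at unlock time.  If the pointer was cleared
 * in between, the unlock is skipped and the lock stays held.
 */
static void
toy_racy_section(struct toy_knote *kn, toy_interleave_fn other_thread)
{
	if (kn->kn_knlist != NULL)
		kn->kn_knlist->held++;		/* lock step */
	other_thread(kn);
	if (kn->kn_knlist != NULL)
		kn->kn_knlist->held--;		/* unlock step */
}

/*
 * Reads kn->kn_knlist once and uses the same pointer for both steps,
 * so lock and unlock always pair up on the same list.
 */
static void
toy_snapshot_section(struct toy_knote *kn, toy_interleave_fn other_thread)
{
	struct toy_knlist *knl = kn->kn_knlist;

	if (knl != NULL)
		knl->held++;			/* lock step */
	other_thread(kn);
	if (knl != NULL)
		knl->held--;			/* unlock step */
}
```

With toy_clear_knlist as the interleaved step, toy_racy_section leaves
held at 1, which corresponds to returning with the lock still held.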