Re: thread suspension when dumping core

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Tue, 7 Jun 2016 05:46:10 +0300
On Mon, Jun 06, 2016 at 10:13:11AM -0700, Mark Johnston wrote:
> On Sat, Jun 04, 2016 at 12:32:36PM +0300, Konstantin Belousov wrote:
> > Does your fs both set TDF_SBDRY and call lf_advlock()/lf_advlockasync()?
> 
> It doesn't. This code belongs to a general framework for distributed FS
> locks; in this particular case, the application was using it to acquire
> a custom advisory lock.
Which statement was not true: that your code sets TDF_SBDRY, or that
lf_advlock() was called?

> 
> > This cannot work, regardless of the mode of single-threading.  TDF_SBDRY
> > makes such a sleep non-interruptible by any single-threading request, on
> > the promise that the thread owns some resources (typically vnode locks).
> > I.e., changing the mode would not help.
> 
> I'm a bit confused by this. How does TDF_SBDRY prevent thread_single()
> from waking up the thread? The sleepq_abort() call is only elided in the
> SINGLE_ALLPROC case, so in other cases, I think we will still interrupt
> the sleep. Thus, since thread_suspend_check() is only invoked prior to
> going to sleep, the application I referred to must have attempted to
> single-thread the process before the thread in question went to sleep.
It does not prevent the wakeup, sorry.

What I should have said, more precisely, is that the thread_suspend_check()
call made before the thread goes to sleep is a no-op when the TDF_SBDRY
flag is set.

> 
> > 
> > I see two reasons to use SINGLE_NO_EXIT for coredumping.  First, it allows
> > the coredump writer to record a more exact state of the process in the notes.
> > 
> > The second is that SINGLE_NO_EXIT is generally faster and more
> > reliable than SINGLE_BOUNDARY.  Some thread states already satisfy
> > SINGLE_NO_EXIT but require more work to reach SINGLE_BOUNDARY.  In
> > other words, writing the core dump starts earlier.
> > 
> > These might not be very significant reasons.
> > 
> > From what I see in the code, our NFS client has a similar issue of calling
> > lf_advlock() with TDF_SBDRY set.  Below is the patch to fix that.
> > A similar bug existed in our fifofs; see r277321.
> 
> Thanks. It may be that a similar fix is appropriate in our locking code,
> but I'll have to spend more time reading it.

Still, I am now confused as well.  If you can catch the process in the
state where a thread is sleeping while the single-threading request cannot
make progress, I would like to see the struct thread and struct proc
printouts.  The thread flags are especially interesting.

Thanks.
Received on Tue Jun 07 2016 - 00:46:21 UTC