Re: thread suspension when dumping core

From: Konstantin Belousov <kostikbel_at_gmail.com> Date: Wed, 8 Jun 2016 07:30:55 +0300 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:05 UTC

On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote:
> On Tue, Jun 07, 2016 at 07:01:55PM +0300, Konstantin Belousov wrote:
> > On Tue, Jun 07, 2016 at 04:24:53PM +0200, Jilles Tjoelker wrote:
> > > On Tue, Jun 07, 2016 at 07:29:56AM +0300, Konstantin Belousov wrote:
> > > > This looks as if we should not ignore suspension requests in
> > > > thread_suspend_check() completely in TDF_SBDRY case, but return either
> > > > EINTR or ERESTART (most likely ERESTART). Note that the goal of
> > > > TDF_SBDRY is to avoid suspending in the protected region, not to make an
> > > > impression that the suspension does not occur at all.
> 
> > > This looks like it would revert r246417 and re-introduce the bug fixed
> > > by it (unexpected [EINTR] and short reads/writes after stop signals).
> 
> > Well, the patch returns ERESTART and not EINTR, so the syscall should
> > be retried after all the unwinding.
> 
> That fixes the [EINTR] part of the problem but not short reads and
> writes.
Which are IMO fine, but I understand that there is too many programs which
are not aware.

> 
> > > After r246417, TDF_SBDRY is intended for sleeps that occur while holding
> > > resources such as vnode locks and are normally short but should be
> > > interruptible by fatal signals because they may occasionally be
> > > indefinitely long (such as a non-responsive NFS server).
> 
> > > It looks like yet another kind of sleep may be required, since advisory
> > > locks still hold some filesystem resources across the sleep (though not
> > > vnode locks).
> 
> > I do not think that adv locks enter sleep with any resource held which
> > would block other threads.  But I agree with the statement because the
> > lock might be granted and then the stopped thread would appear to own
> > the blocking resource.
> 
> It does not hold any resource used by normal operations, but it blocks a
> forced unmount (umount -f hangs in [purgelocks] with tmpfs in a recent
> stable/10).
What I mean, is that when the lock is granted during sleep, other threads
might (or might not) block on the lock edge coming from the suspended
thread.  Even if they do not block, it is too fine implementation detail.

> 
> If queuing is supposed to be fair, then granting the lock to the stopped
> thread is correct and aborting the sleep with [ERESTART] would break it.
> The kern_lockf.c code seems to go to some lengths to make queuing fair.
> This does not seem very important, though.
> 
> Also, restarting a locking call violates some text in POSIX XSH's fcntl
> page that the range of bytes to be locked shall be determined before the
> thread blocks (this may be affected by the current seek offset and the
> file size). I don't know whether violating this will break any
> applications. The text has the problem that there is no way to
> distinguish between a thread that is in fcntl() and has not yet blocked
> and a thread that has blocked, even though it seems intuitively clear.
Such race is self-inflicted IMO. The app assumes that it can call
blockable fcntl() and then change file offset, in other thread. How
would other thread determine that the blockable thread is blocked ?

> 
> > > We then have four kinds:
> 
> > > * uninterruptible by signals, ignores stops (default)
> > > * interruptible by signals, ignores stops (current TDF_SBDRY with
> > >   PCATCH)
> > > * interruptible by signals, freezes in place on stops (avoids
> > >   unexpected short I/O) (current PCATCH, otherwise)
> > > * interruptible by signals, fails with [ERESTART] on stops (avoids
> > >   holding resources across a stop) (new)
> 
> > > The new kind of sleep would fail with [ERESTART] only for stops, since
> > > [EINTR] should only be returned if a signal handler was called. There
> > > cannot be a signal handler since a SIGTSTP/SIGTTIN/SIGTTOU signal with a
> > > handler does not stop the process.
> 
> > And where would this new kind of sleep used ?  The advlock sleep is the one
> > place.  Does fifo sleep for reader or writer on open require this kind
> > of handling (IMO no) ?
> 
> > I think this can be relatively easily implemented with either a flag
> > for XXXsleep(9) (my older style of PBDRY) or using only the thread flag
> > (jhb' newer TDF_SBDRY approach). Probably the later should be used, for
> > consistency and easier marking of larger blocks of code.
> 
> In this case it is clear which sleep(9) calls should be affected so it
> may be better to avoid more hidden state.
In this case yes, but apparently some out-of-tree users exist. And,
the marking of the single sx_sleep() call depends on knowing our
implementation. I remember that as the arguments to change from PBDRY to
the current state setter, in NFS it is not too hard to try to enumerate
interruptible sleeps. 

> 
> I also wonder whether we may be overengineering things here. Perhaps
> the advlock sleep can simply turn off TDF_SBDRY.
Well, this was the very first patch suggested.  I would be fine with that,
but again, out-of-tree code seems to be not quite fine with that local
solution.