Re: thread suspension when dumping core

From: Jilles Tjoelker <jilles_at_stack.nl>
Date: Wed, 8 Jun 2016 23:17:44 +0200

On Wed, Jun 08, 2016 at 04:56:35PM +0300, Konstantin Belousov wrote:
> On Wed, Jun 08, 2016 at 06:35:08AM -0700, Mark Johnston wrote:
> > On Wed, Jun 08, 2016 at 07:30:55AM +0300, Konstantin Belousov wrote:
> > > On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote:
> > > > I also wonder whether we may be overengineering things here. Perhaps
> > > > the advlock sleep can simply turn off TDF_SBDRY.
> > > Well, this was the very first patch suggested.  I would be fine with that,
> > > but again, out-of-tree code seems to be not quite fine with that local
> > > solution.

> > In our particular case, we could possibly use a similar approach. In
> > general, it seems incorrect to clear TDF_SBDRY if the thread calling
> > sx_sleep() has any locks held. It is easy to verify that all callers of
> > lf_advlock() are safe in this respect, but this kind of auditing is
> > generally hard. In fact, I believe the sx_sleep that led to the problem
> > described in D2612 is the same as the one in my case. That is, the
> > sleeping thread may or may not hold a vnode lock depending on context.

> I do not think that in-tree code sleeps with a vnode lock held in
> lf_advlock().  Otherwise, the system would hang in a lock cascade on
> an attempt to obtain an advisory lock.  I think we can even assert
> this with witness.

> There is another sleep, which Jilles mentioned, in lf_purgelocks(),
> called from vgone(). This sleep indeed occurs under the vnode lock, and
> as such must be non-suspendable. The sleep waits until other threads
> leave the lf_advlock() for the reclaimed vnode, and they should leave in
> deterministic time due to issued wakeups.  So this sleep is exempt from
> the considerations, and TDF_SBDRY there is correct.

> I am fine with either the braces around sx_sleep() in lf_advlock() to
> clear TDF_SBDRY (sigdeferstop()), or with the latest patch I sent,
> which adds a temporary override for TDF_SBDRY with TDF_SRESTART. My
> understanding is that you prefer the latter. If I do not misrepresent
> your position, I understand why you prefer that.

The TDF_SRESTART change does fix some more problems such as umount -f
getting stuck in lf_purgelocks().

However, it introduces some subtle issues, though they may not
necessarily amount to a sufficient objection.

Firstly, adding this closes the door on fixing signal handling for
fcntl(F_SETLKW). Per POSIX, any caught signal interrupts
fcntl(F_SETLKW), even if SA_RESTART is set for the signal, and the Linux
man page documents the same. Our man page has documented that SA_RESTART
behaves normally with fcntl(F_SETLKW) since at least FreeBSD 2.0. This
could normally be fixed via  if (error == ERESTART) error = EINTR;  but
that is no longer possible if there are [ERESTART] errors that should
still restart.
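
To make the one-line fix concrete, here is a minimal userland sketch.
The names KERN_ERESTART, setlkw_fixup_error() and
setlkw_fixup_error_with_stop() are hypothetical and exist only for
illustration; the value -1 is assumed to mirror the kernel-internal
ERESTART code. The second function shows the complication: once some
ERESTART returns (the stop-induced ones) must still restart, the error
code alone no longer tells the two cases apart, so some extra state
(here an assumed boolean) would have to be carried along.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel-internal ERESTART code. */
#define KERN_ERESTART (-1)

/* POSIX-style behaviour: any caught signal ends the wait with EINTR. */
int
setlkw_fixup_error(int error)
{
	if (error == KERN_ERESTART)
		error = EINTR;
	return (error);
}

/*
 * With stop-induced restarts in the mix, the caller needs to know why
 * the sleep returned ERESTART.  "restart_for_stop" is an assumed flag,
 * not something the existing code provides.
 */
int
setlkw_fixup_error_with_stop(int error, bool restart_for_stop)
{
	if (error == KERN_ERESTART && !restart_for_stop)
		error = EINTR;
	return (error);
}
```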

Secondly, fcntl(F_SETLKW) restarting after a stop may actually be
observable, contrary to what I wrote before. This is due to the fair
queuing. Suppose thread A locked byte 1 a while ago and thread B is
trying to lock bytes 1 and 2 right now. Then thread C will be able to
lock byte 2 iff thread B has not blocked yet. If thread C is not
allowed to lock byte 2 and blocks on it, the TDF_SRESTART change
will cause it to be awakened if thread B is stopped. When thread B
resumes, the region to be locked will be recomputed. This scenario
unambiguously violates the POSIX requirement but I don't know how bad it
is.
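
The A/B/C scenario can be sketched with a toy model. This is not the
kernel's lock code; struct range, overlaps() and can_grant() are
made-up names, and the fair-queuing rule is reduced to an assumption:
a new request is granted only if it overlaps neither a held lock nor a
request that is already waiting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Inclusive byte range; a toy stand-in for a lock or lock request. */
struct range {
	long start;
	long end;
};

static bool
overlaps(struct range a, struct range b)
{
	return (a.start <= b.end && b.start <= a.end);
}

/*
 * Toy fair-queuing rule (an assumption of this sketch): grant "req"
 * only if it overlaps no held lock and no request already waiting.
 */
bool
can_grant(const struct range *held, size_t nheld,
    const struct range *waiting, size_t nwaiting, struct range req)
{
	size_t i;

	for (i = 0; i < nheld; i++)
		if (overlaps(held[i], req))
			return (false);
	for (i = 0; i < nwaiting; i++)
		if (overlaps(waiting[i], req))
			return (false);
	return (true);
}
```

With A holding byte 1 and B waiting for bytes 1-2, a request by C for
byte 2 is refused. Once B has been taken off the waiting list (as a
stop would do with the TDF_SRESTART change), the same request by C is
granted, which is the observable difference.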

Note that all these threads must be in separate processes because of
fcntl locks' strange semantics.
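
A small sketch of why separate processes are needed: fcntl locks are
owned by the process, so a second attempt from the same process (even
through a different descriptor) does not conflict, while an attempt
from a forked child does. The helper names and the temporary file path
are arbitrary choices for this example, and error handling is minimal.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Try a non-blocking write lock on one byte; returns 0 or an errno value. */
static int
try_lock_byte(int fd, off_t byte)
{
	struct flock fl = { 0 };

	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;
	fl.l_start = byte;
	fl.l_len = 1;
	return (fcntl(fd, F_SETLK, &fl) == -1 ? errno : 0);
}

/*
 * Returns 1 if a second lock attempt from the same process succeeds
 * while the attempt from a forked child is refused, 0 if not, and -1
 * on a setup error.
 */
int
same_process_locks_do_not_conflict(void)
{
	char path[] = "/tmp/lockdemoXXXXXX";
	int fd, fd2, same, status;
	pid_t pid;

	if ((fd = mkstemp(path)) == -1)
		return (-1);
	fd2 = open(path, O_RDWR);
	unlink(path);
	if (fd2 == -1 || try_lock_byte(fd, 1) != 0)
		return (-1);
	same = try_lock_byte(fd2, 1);	/* same owner: no conflict */
	if ((pid = fork()) == -1)
		return (-1);
	if (pid == 0) {
		/* The child does not inherit the parent's locks. */
		int e = try_lock_byte(fd2, 1);
		_exit(e == EAGAIN || e == EACCES ? 1 : 0);
	}
	if (waitpid(pid, &status, 0) == -1)
		return (-1);
	/* Closing any descriptor for the file drops this process's locks. */
	close(fd2);
	close(fd);
	return (same == 0 && WIFEXITED(status) && WEXITSTATUS(status) == 1);
}
```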

-- 
Jilles Tjoelker