Re: thread suspension when dumping core

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Tue, 7 Jun 2016 07:29:56 +0300

On Mon, Jun 06, 2016 at 09:17:41PM -0700, Mark Johnston wrote:
> Sure, see below. For reference:
> 
> td_flags = 0xa84c = INMEM | SINTR | CANSWAP | ASTPENDING | SBDRY | NEEDSUSPCHK
> td_pflags = 0
> td_inhibitors = 0x2 = SLEEPING
> td_locks = 0
> 
> stack:
> mi_switch+0x21e sleepq_catch_signals+0x377 sleepq_wait_sig+0xb _sleep+0x29d ...
> 
> p_flag = 0x10080080 = INMEM | STOPPED_SINGLE | HADTHREADS
> p_flag2 = 0
> 
> The thread is sleeping interruptibly. The NEEDSUSPCHK flag is set, yet the
> SLEEPABORT flag is not, so the thread cannot have been sleeping when
> thread_single() was called - else sleepq_abort() would have been
> invoked and set SLEEPABORT. We are at the second sleepq_switch() call in
> sleepq_catch_signals(), and no signal was pending, so we called
> thread_suspend_check(), which returned 0 because of SBDRY. So we went to
> sleep.
> 
> I note that this couldn't have happened prior to r283320. That change
> was apparently motivated by a similar hang, but in that case the thread
> was suspended (with a vnode lock held) rather than asleep. It looks like
> our internal fix also added a change to set TDF_SBDRY around
> filesystem-specific syscalls, which often sleep interruptibly while
> holding vnode locks. But I don't think that's the problem here, as you
> noted with lf_advlock().
> 
> With r283320 reverted, P_STOPPED_SIG would not have been set, so
> thread_suspend_check() would have suspended us, allowing the core dump
> to proceed. I had thought that using SINGLE_BOUNDARY before coredumping
> would fix both hangs, but I guess that wouldn't help SINGLE_ALLPROC, so
> this is probably the wrong place to be solving the problem.
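
The flag decoding quoted above can be double-checked mechanically. The
following is only a minimal sketch: the bit values are an assumption,
written down from memory of the TDF_* and P_* definitions in sys/proc.h
from around that time, and decode_flags() and the two tables are
hypothetical helpers, not anything in the tree. Verify the constants
against your own sources before relying on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One named bit. Values below are ASSUMED from sys/proc.h circa 2016. */
struct flag_name {
	unsigned int bit;
	const char *name;
};

/* Thread flags (TDF_*), listed in the order used in the mail. */
static const struct flag_name td_flag_names[] = {
	{ 0x00000004, "INMEM" },
	{ 0x00000008, "SINTR" },
	{ 0x00000040, "CANSWAP" },
	{ 0x00000800, "ASTPENDING" },
	{ 0x00002000, "SBDRY" },
	{ 0x00008000, "NEEDSUSPCHK" },
};

/* Process flags (P_*), listed in the order used in the mail. */
static const struct flag_name p_flag_names[] = {
	{ 0x10000000, "INMEM" },
	{ 0x00080000, "STOPPED_SINGLE" },
	{ 0x00000080, "HADTHREADS" },
};

/*
 * Write the names of all bits of 'value' found in 'tbl' into 'buf',
 * separated by " | ".  Returns the bits that were not decoded, either
 * because the table does not know them or because 'buf' was too small.
 */
static unsigned int
decode_flags(unsigned int value, const struct flag_name *tbl, size_t n,
    char *buf, size_t buflen)
{
	size_t i, used = 0;
	int w;

	assert(buf != NULL && buflen > 0);
	buf[0] = '\0';
	for (i = 0; i < n; i++) {
		if ((value & tbl[i].bit) == 0)
			continue;
		w = snprintf(buf + used, buflen - used, "%s%s",
		    used != 0 ? " | " : "", tbl[i].name);
		if (w < 0 || (size_t)w >= buflen - used)
			break;		/* out of space, stop decoding */
		used += (size_t)w;
		value &= ~tbl[i].bit;
	}
	return (value);
}
```

With these assumed values, decoding 0xa84c against td_flag_names and
0x10080080 against p_flag_names leaves no bits over, which agrees with
the names listed in the quoted text.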

This suggests that thread_suspend_check() should not completely ignore
suspension requests in the TDF_SBDRY case, but should instead return
either EINTR or ERESTART (most likely ERESTART). Note that the goal of
TDF_SBDRY is to avoid suspending inside the protected region, not to
give the impression that the suspension does not occur at all.
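
To make the difference concrete, here is a toy user-space model of the
two behaviours. It is a sketch only and not the kernel code: the real
thread_suspend_check() handles many more cases, and every MODEL_* name
and both functions are hypothetical. The value of MODEL_TDF_SBDRY and
the use of -1 for ERESTART are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	MODEL_TDF_SBDRY	0x2000	/* assumed value of TDF_SBDRY */
#define	MODEL_ERESTART	(-1)	/* stand-in for the kernel's ERESTART */

/*
 * Behaviour described in the quoted analysis: with SBDRY set, the
 * suspension request is ignored and 0 is returned, so the caller goes
 * on to sleep and the single-threading request never completes.
 */
static int
model_suspend_check_current(bool suspend_requested, unsigned int td_flags,
    bool *suspended)
{
	assert(suspended != NULL);
	*suspended = false;
	if (!suspend_requested)
		return (0);
	if ((td_flags & MODEL_TDF_SBDRY) != 0)
		return (0);		/* request silently dropped */
	*suspended = true;		/* thread would park here */
	return (0);
}

/*
 * Suggested behaviour: still do not suspend inside the protected
 * region, but report an error so that the sleep is abandoned and the
 * thread unwinds to a point where suspension is safe.
 */
static int
model_suspend_check_proposed(bool suspend_requested, unsigned int td_flags,
    bool *suspended)
{
	assert(suspended != NULL);
	*suspended = false;
	if (!suspend_requested)
		return (0);
	if ((td_flags & MODEL_TDF_SBDRY) != 0)
		return (MODEL_ERESTART);	/* do not sleep, unwind */
	*suspended = true;		/* thread would park here */
	return (0);
}
```

In this model only the SBDRY case differs: the current variant returns
0 without suspending, while the proposed variant returns the restart
error without suspending. A thread without SBDRY suspends in both.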
Received on Tue Jun 07 2016 - 02:30:02 UTC
