On Sat, Jun 04, 2016 at 12:32:36PM +0300, Konstantin Belousov wrote:
> On Fri, Jun 03, 2016 at 07:23:47PM -0700, Mark Johnston wrote:
> > Hi,
> >
> > I've recently observed a hang in a multi-threaded process that had hit
> > an assertion failure and was attempting to dump core. One thread was
> > sleeping interruptibly on an advisory lock with TDF_SBDRY set (our
> > filesystem sets VFCF_SBDRY). SIGABRT caused the recipient thread to
> > suspend other threads with thread_single(SINGLE_NO_EXIT), which fails
> > to interrupt the sleeping thread, resulting in the hang.
> >
> > My question is, why does the SA_CORE handler not force all threads to
> > the user boundary before attempting to dump core? It must do so later
> > anyway in order to exit. As I understand it, TDF_SBDRY is intended to
> > avoid deadlocks that can occur when stopping a process, but in this
> > case we don't stop the process with the intention of resuming it, so it
> > seems erroneous to apply this flag.
>
> Does your fs both set TDF_SBDRY and call lf_advlock()/lf_advlockasync() ?

It doesn't. This code belongs to a general framework for distributed FS
locks; in this particular case, the application was using it to acquire
a custom advisory lock.

> This cannot work, regardless of the mode of single-threading. TDF_SBDRY
> makes such sleep non-interruptible by any single-threading request, on
> the promise that the thread owns some resources (typically vnode locks).
> I.e. changing the mode would not help.

I'm a bit confused by this. How does TDF_SBDRY prevent thread_single()
from waking up the thread? The sleepq_abort() call is only elided in the
SINGLE_ALLPROC case, so in other cases, I think we will still interrupt
the sleep. Thus, since thread_suspend_check() is only invoked prior to
going to sleep, the application I referred to must have attempted to
single-thread the process before the thread in question went to sleep.

>
> I see two reasons to use SINGLE_NO_EXIT for coredumping. It allows
> coredump writer to record more exact state of the process into the notes.
>
> Another one is that SINGLE_NO_EXIT is generally faster and more
> reliable than SINGLE_BOUNDARY. Some states are already good enough for
> SINGLE_NO_EXIT, while require more work to get into SINGLE_BOUNDARY. In
> other words, core dump write starts earlier.
>
> It might be not very significant reasons.
>
> From what I see in the code, our NFS client has similar issue of calling
> lf_advlock() with TDF_SBDRY set. Below is the patch to fix that.
> Similar bug existed in our fifofs, see r277321.

Thanks. It may be that a similar fix is appropriate in our locking code,
but I'll have to spend more time reading it.

Received on Mon Jun 06 2016 - 15:09:23 UTC