On Thu, Oct 01, 2009 at 03:07:30PM +0300, Kostik Belousov wrote:
> On Wed, Sep 30, 2009 at 11:02:19AM -0700, Justin Teller wrote:
> > We're trying to control one process from another process through
> > signals (I know, revolutionary ;-), and we've found that a signal
> > occasionally gets lost. The process we're signaling is
> > multi-threaded. It looks like the signal is lost when the kernel
> > decides to post the signal to a thread that is in the process of dying
> > (calling pthread_exit, etc).

> > Is this expected behavior that we should just handle, or is it a race
> > in the kernel that should be/will be/already is fixed?

> > It may be that a fix is already in current, and I just haven't found
> > it in my searches through the source code (I'm working off of source
> > code for an older 8.0 image). If it is fixed, I'd appreciate a
> > pointer to the code that fixes it.

> When thread enters the kernel last time to be killed, it is very much
> bad idea to allow it to return to usermode to handle directed signal.
> And, there would always be window between entering the kernel and
> marking the thread as exiting.

> Moving the thread-directed signals back to the process queue is hard
> and there is no way to distinguish which signals were sent to process
> or to the thread.

> Possibly, we could clear the thread signal mask while still in user mode.
> I think it would still leave a very narrow window when a process
> signal could be directed to the dying thread and not be delivered to
> usermode; this happens when signal is generated while sigsetmask already
> entered the kernel, but did not changed the mask yet. This is worked
> around by rechecking the pending signals after setting the block mask
> and releasing it if needed.

SIGKILL cannot be masked. Is it possible that a kill(SIGKILL) is
assigned to a dying thread and lost? (SIGSTOP cannot be masked either,
but its processing is done by the sending thread, so it should be safe.)
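As an illustration of the quoted idea of adjusting the thread's signal
mask while still in user mode (read here as blocking all signals before
the thread exits), a minimal user-level sketch in C might look like the
following. The names exit_with_signals_blocked, worker and run_worker are
hypothetical and do not come from the FreeBSD source. This only narrows
the window; it does not close the race described above, and SIGKILL and
SIGSTOP cannot be blocked at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/*
 * Hypothetical helper: block every maskable signal in the calling
 * thread, then exit it. Process-directed signals generated afterwards
 * should be assigned to some other thread or stay pending for the
 * process. A signal generated while pthread_sigmask() is still
 * executing can still end up on this thread.
 */
static void exit_with_signals_blocked(void *retval)
{
    sigset_t all;
    int rc;

    sigfillset(&all);
    /* Requests to block SIGKILL and SIGSTOP are silently ignored. */
    rc = pthread_sigmask(SIG_SETMASK, &all, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_exit(retval);
}

/* Hypothetical worker: does its work, then exits through the helper. */
static void *worker(void *arg)
{
    /* ... real work would go here ... */
    exit_with_signals_blocked(arg);
    return NULL; /* not reached */
}

/* Start one worker, wait for it, and return the value it exited with. */
void *run_worker(void *arg)
{
    pthread_t tid;
    void *result = NULL;

    if (pthread_create(&tid, NULL, worker, arg) != 0)
        return NULL;
    if (pthread_join(tid, &result) != 0)
        return NULL;
    return result;
}
```

A caller would use run_worker(&some_object) and get the same pointer
back once the worker has exited with its signals blocked.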
I suppose that race can also occur in other uses of pthread_sigmask().
If a thread masks a signal for a while, and that signal is assigned to
that thread just as it is executing pthread_sigmask(), it will only be
processed when that thread unblocks or accepts it, even though other
threads may have the signal unmasked or be in a sigwait() for it.
Signals sent after pthread_sigmask() has changed the signal mask are
likely processed sooner because they will be assigned to a different
thread or left in the process queue.

POSIX seems to say that signals generated for the process should remain
queued for the process and should only be assigned to a thread at time
of delivery. This could be implemented by leaving signals in the process
queue or by tracking for each signal in the thread queue whether it was
directed at the thread and moving the process signals back at
sigmask/thr_exit. Either way I am not sure of all the consequences at
this time.

By the way, SA_PROC in kern_sig.c is bogus, because whether a signal is
directed at a specific thread depends on how it was generated and not on
the signal number. Fortunately, it is not used.

-- 
Jilles Tjoelker

Received on Fri Oct 02 2009 - 18:12:14 UTC