Re: tcsh backtick hang info

From: Mark Peek <mp_at_FreeBSD.org>
Date: Wed, 11 Jul 2007 20:56:26 -0700

On 7/11/07 7:28 PM, Doug White wrote:
> (note: freebsd-current_at_freebsd.org and tcsh-bugs_at_mx.gw.com are in the 
> To: on this message. Restrict replies accordingly.)
> 
> Hey folks,
> 
> I spent several hours today pawing through the tcsh source in an effort
> to figure out what's going on with tcsh hangs with backticked commands in
> tcsh 6.15.00.
> 
> The canonical example is something like:
> 
> kill `ps ax | grep foo | awk '{print $1}'`
> 
> where a builtin gets its arguments from a backticked expression composed 
> of non-builtins.
> 
> tcsh 6.15.00 introduced a new reference-counted signal management 
> facility where, instead of manipulating the signal mask directly, 
> functions increment a variable that is polled to see whether to perform 
> the action associated with SIGINT, SIGCHLD, SIGALRM, or SIGHUP.  The 
> signal handler function itself sets a pending flag for each named signal 
> and returns, so only a few instructions are executed in signal context.  
> At some future point the pending flags are polled by a call to 
> handle_pending_signals(), usually in a loop where the shell goes to 
> sleep waiting for an external action to occur. When the function no 
> longer needs the signal to be blocked it decrements the count via a 
> stack of cleanup handlers. When a count reaches zero then a poll is 
> immediately triggered.
> 
> If the disabled count is >0 for a signal when a handle_pending_signals()
> poll occurs, then the signal is not "handled".
> 
> In the case above, the disabled count for SIGCHLD is 1 when SIGCHLD 
> fires from the completion of the backticked commands. The sigsuspend() 
> in pjwait() is correctly woken up by the kernel but, because the 
> disabled count is 1, the shell goes back into sigsuspend() and appears 
> to hang.
> 
> In this case it appears to be an improperly placed bump to the SIGCHLD 
> disable count that is held over a call to pjwait(). I haven't yet 
> determined the call stack (and gdb cannot debug tcsh at the moment) so I 
> need to continue instrumenting the code to figure out what higher level 
> function is disabling SIGCHLD and then calling something that eventually 
> calls pjwait().
> 
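For anyone following along, the mechanism described above can be modelled
roughly as follows. This is only a minimal sketch for SIGCHLD, not the tcsh
source: every name except handle_pending_signals() (pending_chld,
disabled_chld, handled_chld, on_sigchld, disable_chld, enable_chld,
install_handler) is a hypothetical stand-in, and signal() is used for brevity
where real code would more likely use sigaction().

```c
#include <signal.h>

/* Hypothetical model of a reference-counted, deferred SIGCHLD handler. */
static volatile sig_atomic_t pending_chld; /* set in signal context */
static int disabled_chld;                  /* >0: do not act on SIGCHLD yet */
static int handled_chld;                   /* times the deferred action ran */

/* Signal handler: only record that the signal arrived, then return. */
static void on_sigchld(int sig)
{
    (void)sig;
    pending_chld = 1;
}

/* Poll: run the deferred action unless the signal is currently disabled.
 * Returns 1 if the action ran, 0 otherwise. */
static int handle_pending_signals(void)
{
    if (pending_chld && disabled_chld == 0) {
        pending_chld = 0;
        handled_chld++; /* stand-in for the real work, e.g. reaping a child */
        return 1;
    }
    return 0;
}

/* Take a reference: SIGCHLD stays pending until every holder releases. */
static void disable_chld(void)
{
    disabled_chld++;
}

/* Drop a reference; reaching zero triggers an immediate poll. */
static void enable_chld(void)
{
    if (--disabled_chld == 0)
        handle_pending_signals();
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int install_handler(void)
{
    return signal(SIGCHLD, on_sigchld) == SIG_ERR ? -1 : 0;
}
```

In this model, if a caller does disable_chld() and then waits for the child,
each poll finds pending_chld set but disabled_chld still at 1, so the action
never runs and the waiter goes back to sleep. Nothing happens until
enable_chld() drops the count to zero, which matches the hang described in
the quoted message.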

There appear to be two different issues. One is with the builtin kill and the
other is the gdb issue. I sent off a tentative patch to the reporter of the
builtin kill issue and am awaiting confirmation. The patch is here:

http://people.freebsd.org/~mp/tcsh_kill.patch

The gdb issue, much to my dismay, is still eluding my debugging skill given
the interaction with gdb and issues with actually debugging what is happening.

Mark
Received on Thu Jul 12 2007 - 02:23:30 UTC
