On 7 Jan, Robert Watson wrote:
>
> Got a bug report this evening that the strace package hangs on 5-CURRENT.
> I'm able to confirm this; for those that don't know, strace makes
> extensive use of procfs. On attempting to reproduce it, I first got:
>
> crash2# strace ls
> Sleeping on "stopevent" with the following non-sleepable locks held:
> exclusive sleep mutex sigacts r = 0 (0xc20e2aa8) locked @ kern/subr_trap.c:260
> lock order reversal
>  1st 0xc20e2aa8 sigacts (sigacts) @ kern/subr_trap.c:260
>  2nd 0xc20f1224 process lock (process lock) @ kern/kern_synch.c:309
> Stack backtrace:
> backtrace(c0864c4a,c20f1224,c0860e7b,c0860e7b,c0861ee5) at backtrace+0x17
> witness_lock(c20f1224,8,c0861ee5,135,c20f1224) at witness_lock+0x672
> _mtx_lock_flags(c20f1224,0,c0861edc,135,ffffffff) at _mtx_lock_flags+0xba
> msleep(c20f12e8,c20f1224,5c,c0865441,0) at msleep+0x794
> stopevent(c20f11b8,2,13,823,c0922200) at stopevent+0x85
> issignal(c1f31bd0,2,c08619f7,bd,1) at issignal+0x168
> cursig(c1f31bd0,0,c0864399,104,0) at cursig+0xe8
> ast(c9520d48) at ast+0x4b0
> doreti_ast() at doreti_ast+0x17
> load: 0.21 cmd: strace 583 [iowait] 0.00u 0.91s 0% 724k
> [sent a serial break]
>
> Cool, eh? Second try:
>
> crash2# strace ls
> execve(0xbfbfe890, [0xbfbfed54], [/* 0 vars */]PIOCWSTOP: Input/output error
>
> Even better.
>
> The first obvious observation is that holding mutexes other than the
> process mutex over calls to _STOPEVENT() is a bad idea. It seems like the
> p_sig mutex is used to cover a fair amount of flag handling, signal entry
> changes, etc, etc. I'm not familiar with the semantic requirements here,
> but presumably something needs to change. Is it possible to release the
> locks after grabbing the value of 'action' (or even do a lock-free read),
> and then grab the sigact lock only later during actual delivery, yet
> maintain the right semantics?

In both issignal() and postsig() I think it would be safe to drop the p_sig mutex before _STOPEVENT() and grab the mutex again afterwards. About the only thing that can happen during the interim would be the receipt of another signal and I don't think that would be a problem. Dropping the mutex is how issignal() handles ptracestop() a bit further down in the code.

Received on Thu Jan 08 2004 - 03:11:03 UTC
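
For illustration only, here is a rough userspace sketch of the drop-and-reacquire pattern suggested in the reply above. It is not the kernel code: a pthread mutex stands in for the p_sig (sigacts) mutex, and the names sigacts_mtx, sigacts_held, stop_event() and check_signal() are all hypothetical. The hand-maintained sigacts_held flag exists only so the sketch can check, in a single thread, that the stand-in for _STOPEVENT() is never entered with the mutex held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's sigacts (p_sig) mutex. */
static pthread_mutex_t sigacts_mtx = PTHREAD_MUTEX_INITIALIZER;
static bool sigacts_held;     /* hand-tracked; only meaningful single-threaded */
static int stop_events_seen;  /* counts calls to the _STOPEVENT() stand-in */

static void sigacts_lock(void)
{
    pthread_mutex_lock(&sigacts_mtx);
    sigacts_held = true;
}

static void sigacts_unlock(void)
{
    sigacts_held = false;
    pthread_mutex_unlock(&sigacts_mtx);
}

/*
 * Stand-in for _STOPEVENT(): the real one may sleep, so it must not be
 * entered while the sigacts mutex is held.
 */
static void stop_event(int sig)
{
    (void)sig;
    assert(!sigacts_held);
    stop_events_seen++;
}

/*
 * Sketch of the issignal()-style path: read the pending signal under the
 * mutex, drop the mutex around the stop event, then take it again.
 * State guarded by the mutex may have changed while it was dropped
 * (for example, another signal may have been posted).
 */
static int check_signal(const int *pending)
{
    int sig;

    sigacts_lock();
    sig = *pending;
    if (sig != 0) {
        sigacts_unlock();   /* drop before the call that may sleep */
        stop_event(sig);
        sigacts_lock();     /* reacquire for the rest of delivery */
    }
    sigacts_unlock();
    return sig;
}
```

Calling check_signal() with a nonzero pending value goes through stop_event() exactly once with the mutex released; with a pending value of 0 the stop event is skipped entirely.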