Re: Fwd: propagate_priority() crashes: recursive msleep() ??

From: Peter Edwards <peter.edwards_at_openet-telecom.com> Date: Fri, 14 Nov 2003 18:23:23 +0000

John Baldwin wrote:

>On 14-Nov-2003 Peter Edwards wrote:
>  
>
>>(Apologies if this message is a duplicate: the original has been AWOL for quite 
>>a while now)
>>
>>Hi,
>>I'm getting a crash in propagate_priority(), as mentioned by a few people recently. Bug reports and
>>comments about it seem to have dropped off, so given that I can reliably reproduce it, I was
>>trying to work out why it's happening.
>>
>>One thing I found quite odd was the following stack trace. It appears that msleep() is being
>>called recursively via cursig() calling stopevent. When msleep calls cursig(), it has temporarily
>>dropped Giant. Surely this is bogus? (This is from a kernel updated in the last few hours)
>>
>>#0  sched_switch (td=0xc4b30780) at /scratch/src/sys/kern/sched_4bsd.c:606
>>#1  0xc050d8db in mi_switch () at /scratch/src/sys/kern/kern_synch.c:514
>>#2  0xc050cf7f in msleep (ident=0xc4dc2bc8, mtx=0xc4dc2b04, priority=92, wmesg=0x0, 
>>    timo=0) at /scratch/src/sys/kern/kern_synch.c:255
>>#3  0xc0534255 in stopevent (p=0xc4dc2a98, event=2, val=2)
>>    at /scratch/src/sys/kern/sys_process.c:740
>>#4  0xc0509362 in issignal (td=0xc4b30780) at /scratch/src/sys/kern/kern_sig.c:2082
>>#5  0xc0504eb8 in cursig (td=0xc4b30780) at /scratch/src/sys/sys/signalvar.h:227
>>#6  0xc050d0f2 in msleep (ident=0xc4dc2a98, mtx=0xc4dc2b04, priority=348, wmesg=0x0, 
>>    timo=0) at /scratch/src/sys/kern/kern_synch.c:294
>>#7  0xc04eb82f in wait1 (td=0xc4b30780, uap=0xddcd6d10, compat=0)
>>    at /scratch/src/sys/kern/kern_exit.c:766
>>#8  0xc04eab90 in wait4 (td=0x0, uap=0x0) at /scratch/src/sys/kern/kern_exit.c:548
>>#9  0xc06241d0 in syscall (frame=
>>      {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 134899628, tf_esi = 134912305, tf_ebp =
>>-1077943784, tf_isp = -573739660, tf_ebx = 772, tf_edx = 135012352, tf_ecx = 13, tf_eax = 7,
>>tf_trapno = 12, tf_err = 2, tf_eip = 134525375, tf_cs = 31, tf_eflags = 646, tf_esp =
>>-1077943812, tf_ss = 47}) at /scratch/src/sys/i386/i386/trap.c:1010
>>    
>>
>
>Are you using gdb or something else that does ptrace?  Jeff has pointed
>out why pp panics here, because this thread owns the sigacts lock while
>asleep.  However, doing a double sleep like this is very bogus and bad.
>Grrrr.
>
>  
>
I was using "truss": the actual command I ran was

# truss mount unreachablehost:/mnt /mnt

(where "unreachablehost" was the IP address of a host I had no route to)

IIRC, the panicking thread was in softclock (possibly handling the 
terminal ^C, not sure), the mount command was waiting on the mount_nfs 
child to finish, and I assume the mount_nfs child was waiting in vain 
for a response it was never going to get.
But, I suppose any traced process arriving in msleep (or cursig) is 
problematic.

Silly question: Could the STOPEVENT stuff in issignal() just be delayed 
until userret()? I thought that was done for some other similar 
circumstances.