Re: Deadlocks with recent SMP current

From: Scott Long <scottl_at_samsco.org>
Date: Fri, 13 Aug 2004 14:13:19 -0600
Doug White wrote:
> On Fri, 13 Aug 2004, Martin Blapp wrote:
> 
> 
>>Since yesterday I'm getting complete deadlocks. This time it's unrelated to load;
>>the servers are not loaded at all, they just freeze after a while.
>>No break into DDB possible at all.
> 
> 
> Welcome to the club; I've been having them on my -current builder since Aug
> 4. I'm going to set up a duplicate box and start binary-searching for the
> offending commit(s).
> 
> Preemption is at its default setting, disabled.
> 
> My box is a dual-600MHz P3 with 1GB RAM and running kde. A make -j3
> buildworld will lock it up 75% of the time. It'll survive a nonparallel
> build, and it'll survive a kernel build.
> 
> Haven't tried WITNESS+INVARIANTS yet since it really dogs the machine. :)
> 

Can you try the patch below?  It's really only a band-aid, but it might
make things usable for now.  Also, are more lockups being seen under
ULE or under 4BSD?  There was a recent change to ULE (rev 1.120 of
sched_ule.c) that seems to have aggravated the scheduler problems on
my test systems.

Scott


Index: kern_switch.c
===================================================================
RCS file: /usr/ncvs/src/sys/kern/kern_switch.c,v
retrieving revision 1.78
diff -u -r1.78 kern_switch.c
--- kern_switch.c       10 Aug 2004 00:26:25 -0000      1.78
+++ kern_switch.c       13 Aug 2004 20:11:27 -0000
@@ -345,6 +345,8 @@
                 return;
         }

+       critical_enter();
+
         tda = kg->kg_last_assigned;
         if ((ke = td->td_kse) == NULL) {
                 if (kg->kg_idle_kses) {
@@ -441,6 +443,7 @@
                 CTR3(KTR_RUNQ, "setrunqueue: held: td%p kg%p pid%d",
                         td, td->td_ksegrp, td->td_proc->p_pid);
         }
+       critical_exit();
  }

  /*
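
For readers unfamiliar with the two calls the patch adds, here is a minimal
userspace sketch of the idea: a per-thread nesting counter that defers any
preemption request until the outermost critical section is left. Everything
named model_* is hypothetical and exists only for illustration; this is a
simplified model under that assumption, not the kernel's implementation of
critical_enter()/critical_exit() and not the real setrunqueue() body.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-thread state; not the kernel's struct thread. */
struct model_thread {
    int  critnest;        /* current critical-section nesting depth */
    bool preempt_pending; /* a preemption was requested while nested */
    int  switches;        /* number of simulated context switches */
};

/* Simulated context switch: only counts how often it happened. */
static void model_switch(struct model_thread *td)
{
    td->switches++;
}

/* Enter a critical section: bump the nesting depth. */
static void model_critical_enter(struct model_thread *td)
{
    td->critnest++;
}

/* Leave a critical section; run a deferred preemption at the outermost exit. */
static void model_critical_exit(struct model_thread *td)
{
    assert(td->critnest > 0);
    if (--td->critnest == 0 && td->preempt_pending) {
        td->preempt_pending = false;
        model_switch(td);
    }
}

/* Request a preemption: deferred inside a critical section, immediate outside. */
static void model_maybe_preempt(struct model_thread *td)
{
    if (td->critnest > 0)
        td->preempt_pending = true;
    else
        model_switch(td);
}

/* Shape of the patched function: the whole body sits in one critical section. */
static void model_setrunqueue(struct model_thread *td)
{
    model_critical_enter(td);
    /* ... run-queue bookkeeping would go here ... */
    model_maybe_preempt(td);     /* deferred, because critnest is 1 */
    assert(td->switches == 0 || td->preempt_pending);
    /* ... more bookkeeping, still running on the same thread ... */
    model_critical_exit(td);     /* the deferred switch happens here */
}
```

In this model, a preemption requested partway through model_setrunqueue()
cannot interrupt the bookkeeping; it is only carried out once
model_critical_exit() drops the nesting depth back to zero.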
Received on Fri Aug 13 2004 - 18:15:46 UTC
