Re: Deadlocks with recent SMP current

From: Jon Noack <noackjr_at_alumni.rice.edu>
Date: Sat, 14 Aug 2004 17:35:05 -0500
On 08/13/04 15:13, Scott Long wrote:
> Doug White wrote:
>> On Fri, 13 Aug 2004, Martin Blapp wrote:
>>> Since yesterday I'm getting complete deadlocks. This time it is
>>> unrelated to load: the servers are not loaded at all, they just
>>> freeze after a while. No break into DDB possible at all.
>> 
>> Welcome to the club; I've been having them on my -current builder
>> since Aug 4. I'm going to set up a duplicate box and start 
>> binary-searching for the offending commit(s).
>> 
>> Preemption is at the default, disabled.
>> 
>> My box is a dual-600MHz P3 with 1GB RAM and running kde. A make -j3
>> buildworld will lock it up 75% of the time. It'll survive a 
>> nonparallel build, and it'll survive a kernel build.
>> 
>> Haven't tried WITNESS+INVARIANTS yet since it really dogs the
>> machine. :)
> 
> Can you try the patch below? It's really only a band-aid, but might 
> make things usable for now. Also, are more lockups being seen under
> ULE or under 4BSD? There was a recent change to ULE (rev 1.120 of
> sched_ule.c) that seems to have aggravated the scheduler problems on
> my test systems.
> 
> Scott
> 
> Index: kern_switch.c
> ===================================================================
> RCS file: /usr/ncvs/src/sys/kern/kern_switch.c,v
> retrieving revision 1.78
> diff -u -r1.78 kern_switch.c
> --- kern_switch.c       10 Aug 2004 00:26:25 -0000      1.78
> +++ kern_switch.c       13 Aug 2004 20:11:27 -0000
> @@ -345,6 +345,8 @@
>                 return;
>         }
> 
> +       critical_enter();
> +
>         tda = kg->kg_last_assigned;
>         if ((ke = td->td_kse) == NULL) {
>                 if (kg->kg_idle_kses) {
> @@ -441,6 +443,7 @@
>                 CTR3(KTR_RUNQ, "setrunqueue: held: td%p kg%p pid%d",
>                         td, td->td_ksegrp, td->td_proc->p_pid);
>         }
> +       critical_exit();
>  }
> 
>  /*

Here's a data point:
My dual Pentium3 system has been up for 20+ hours with this patch. 
Previously, it wouldn't survive for more than an hour or so (regardless 
of load).
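
For anyone following along, here is a rough userland sketch of why
bracketing the body of setrunqueue() with critical_enter() and
critical_exit() might help: a preemption requested while the nesting
count is non-zero gets deferred until the outermost exit. This is only
a toy model under my own assumptions, not the kernel code. The struct
and every toy_* name are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread state; field names are illustrative only. */
struct toy_thread {
	int	critnest;	/* nesting depth of critical sections */
	bool	owe_preempt;	/* preemption requested while nested */
	int	switches;	/* times we actually switched away */
};

static void
toy_critical_enter(struct toy_thread *td)
{
	td->critnest++;
}

/* Called wherever the scheduler would like to preempt this thread. */
static void
toy_request_preempt(struct toy_thread *td)
{
	if (td->critnest > 0)
		td->owe_preempt = true;	/* defer until critical_exit */
	else
		td->switches++;		/* switch right away */
}

static void
toy_critical_exit(struct toy_thread *td)
{
	assert(td->critnest > 0);
	if (--td->critnest == 0 && td->owe_preempt) {
		td->owe_preempt = false;
		td->switches++;		/* deferred switch happens now */
	}
}

/*
 * Same shape as the patched function: the queue update runs with
 * preemption held off. Returns the number of switches that happened
 * while still inside the critical section.
 */
static int
toy_setrunqueue(struct toy_thread *td)
{
	int switches_inside;

	toy_critical_enter(td);
	toy_request_preempt(td);	/* arrives mid-update, is deferred */
	switches_inside = td->switches;
	toy_critical_exit(td);
	return (switches_inside);
}
```

In this model no switch happens in the middle of the update; the one
that was requested is taken only after the section is left.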

Jon
Received on Sat Aug 14 2004 - 20:35:17 UTC
