Re: SCHED_ULE should not be the default

From: Ivan Klymenko <fidaj_at_ukr.net> Date: Wed, 14 Dec 2011 01:39:06 +0200

On Wed, 14 Dec 2011 00:04:42 +0100
Jilles Tjoelker <jilles_at_stack.nl> wrote:

> On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote:
> > If the ULE algorithm itself has no problems, then the problem lies
> > with the Core2Duo or in some piece of code that uses the ULE
> > scheduler. I already wrote on the mailing list that, specifically
> > in my case (Core2Duo), the following patch partially helps:
> > --- sched_ule.c.orig	2011-11-24 18:11:48.000000000 +0200
> > +++ sched_ule.c	2011-12-10 22:47:08.000000000 +0200
> > @@ -794,7 +794,8 @@
> >  	 * 1.5 * balance_interval.
> >  	 */
> >  	balance_ticks = max(balance_interval / 2, 1);
> > -	balance_ticks += random() % balance_interval;
> > +//	balance_ticks += random() % balance_interval;
> > +	balance_ticks += ((int)random()) % balance_interval;
> >  	if (smp_started == 0 || rebalance == 0)
> >  		return;
> >  	tdq = TDQ_SELF();
> 
> This avoids a 64-bit division on 64-bit platforms but seems to have no
> effect otherwise. Because this function is not called very often, the
> change seems unlikely to help.

Yes, this hunk is unrelated to the problem :)
I simply posted the latest patch that I am using now...
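
To illustrate the point about the division width, here is a minimal userland sketch. It assumes that random() returns a value in the range 0 to 2^31-1 in a type as wide as long (as the userland random(3) does), and the function names ticks_wide(), ticks_narrow() and check_equivalent() are hypothetical helpers, not kernel code. Under that assumption the cast changes only the width of the modulo operation, not its result:

```c
#include <assert.h>
#include <limits.h>

/* Modulo performed in the width of long (64-bit on LP64 platforms). */
static long
ticks_wide(long r, int interval)
{
	return (r % interval);
}

/* Modulo performed in the width of int after the narrowing cast. */
static int
ticks_narrow(long r, int interval)
{
	return ((int)r % interval);
}

/*
 * Walk a sample of the assumed random() range [0, INT_MAX] and confirm
 * that both forms agree. Returns the number of values checked.
 */
static int
check_equivalent(int interval)
{
	int checked = 0;

	for (long r = 0; r <= INT_MAX - 65537; r += 65537) {
		assert(ticks_wide(r, interval) == ticks_narrow(r, interval));
		checked++;
	}
	return (checked);
}
```

Calling check_equivalent() with any positive interval should complete without tripping the assertion, which is consistent with the remark that this hunk has no effect beyond the cheaper division.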

> 
> > @@ -2118,13 +2119,21 @@
> >  	struct td_sched *ts;
> >  
> >  	THREAD_LOCK_ASSERT(td, MA_OWNED);
> > +	if (td->td_pri_class & PRI_FIFO_BIT)
> > +		return;
> > +	ts = td->td_sched;
> > +	/*
> > +	 * We used up one time slice.
> > +	 */
> > +	if (--ts->ts_slice > 0)
> > +		return;
> 
> This skips most of the periodic functionality (long term load
> balancer, saving switch count (?), insert index (?), interactivity
> score update for long running thread) if the thread is not going to
> be rescheduled right now.
> 
> It looks wrong but it is a data point if it helps your workload.

Yes, I did that to delay execution of the following section of code for as long as possible:
...
#ifdef SMP
        /*
         * We run the long term load balancer infrequently on the first cpu.
         */
        if (balance_tdq == tdq) {
                if (balance_ticks && --balance_ticks == 0)
                        sched_balance();
        }
#endif
...
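
As a rough illustration of how much the early return delays that section, here is a simplified model. It is a sketch only: periodic_runs() and its parameters are hypothetical, the PRI_FIFO_BIT check is left out, and one loop iteration stands for one clock tick of a single running thread. It counts how many ticks actually reach the periodic part of the function:

```c
#include <assert.h>

/*
 * Count how many of nticks clock ticks reach the periodic section.
 * early_return != 0 models the patched order (slice check first),
 * early_return == 0 models the original order (slice check last).
 */
static int
periodic_runs(int early_return, int slice_len, int nticks)
{
	int slice = slice_len;
	int runs = 0;

	for (int t = 0; t < nticks; t++) {
		if (early_return) {
			/* Patched: return before any periodic work. */
			if (--slice > 0)
				continue;
			runs++;
			slice = slice_len;
		} else {
			/* Original: periodic work happens on every tick. */
			runs++;
			if (--slice > 0)
				continue;
			slice = slice_len;
		}
	}
	return (runs);
}
```

In this model the patched order reaches the periodic section once per time slice instead of once per tick, so the load balancer countdown advances correspondingly more slowly.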

> 
> >  	tdq = TDQ_SELF();
> >  #ifdef SMP
> >  	/*
> >  	 * We run the long term load balancer infrequently on the first cpu.
> >  	 */
> > -	if (balance_tdq == tdq) {
> > -		if (balance_ticks && --balance_ticks == 0)
> > +	if (balance_ticks && --balance_ticks == 0) {
> > +		if (balance_tdq == tdq)
> >  			sched_balance();
> >  	}
> >  #endif
> 
> The main effect of this appears to be to disable the long term load
> balancer completely after some time. At some point, a CPU other than
> the first CPU (which uses balance_tdq) will set balance_ticks = 0, and
> sched_balance() will never be called again.
> 

I did that for the same reason as described above...
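
The stall described in the quoted paragraph can be reproduced with a small simulation. This is a sketch under stated assumptions: struct sim, sim_balance(), tick_original(), tick_patched() and run() are hypothetical names, CPU 0 plays the role of balance_tdq, the CPUs tick in a fixed round-robin order, and the balancer resets the counter to a fixed interval instead of a randomized one:

```c
#include <assert.h>
#include <stdbool.h>

struct sim {
	int balance_ticks;	/* shared countdown */
	int balance_calls;	/* how often the balancer ran */
};

/* Stand-in for sched_balance(): count the call and rearm the countdown. */
static void
sim_balance(struct sim *s, int interval)
{
	s->balance_calls++;
	s->balance_ticks = interval;
}

/* Original order: only the first CPU touches the countdown. */
static void
tick_original(struct sim *s, int cpu, int interval)
{
	if (cpu == 0) {
		if (s->balance_ticks && --s->balance_ticks == 0)
			sim_balance(s, interval);
	}
}

/* Patched order: every CPU decrements, only the first CPU rearms. */
static void
tick_patched(struct sim *s, int cpu, int interval)
{
	if (s->balance_ticks && --s->balance_ticks == 0) {
		if (cpu == 0)
			sim_balance(s, interval);
	}
}

/* Tick all CPUs in order for the given number of rounds. */
static int
run(bool patched, int ncpu, int interval, int rounds)
{
	struct sim s = { interval, 0 };

	for (int r = 0; r < rounds; r++) {
		for (int cpu = 0; cpu < ncpu; cpu++) {
			if (patched)
				tick_patched(&s, cpu, interval);
			else
				tick_original(&s, cpu, interval);
		}
	}
	return (s.balance_calls);
}
```

With two CPUs the original order keeps calling the balancer for as long as the simulation runs, while the patched order stops as soon as the second CPU is the one that brings the countdown to zero, because nothing rearms it afterwards.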

> It also introduces a hypothetical race condition because the access to
> balance_ticks is no longer restricted to one CPU under a spinlock.
> 
> If the long term load balancer may be causing trouble, try setting
> kern.sched.balance_interval to a higher value with unpatched code.

I tried that first, but it did not fix the situation...

My impression is that the rebalancing is malfunctioning...
It seems that a thread gets handed to the same core that is already loaded, and so on...
Perhaps this is a consequence of the CPU topology being detected incorrectly?

> 
> > @@ -2144,9 +2153,6 @@
> >  		if (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx]))
> >  			tdq->tdq_ridx = tdq->tdq_idx;
> >  	}
> > -	ts = td->td_sched;
> > -	if (td->td_pri_class & PRI_FIFO_BIT)
> > -		return;
> >  	if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) {
> >  		/*
> >  		 * We used a tick; charge it to the thread so
> > @@ -2157,11 +2163,6 @@
> >  		sched_priority(td);
> >  	}
> >  	/*
> > -	 * We used up one time slice.
> > -	 */
> > -	if (--ts->ts_slice > 0)
> > -		return;
> > -	/*
> >  	 * We're out of time, force a requeue at userret().
> >  	 */
> >  	ts->ts_slice = sched_slice;
> 
> > and not using options FULL_PREEMPTION.
> > But no one has replied to my message to say whether my patch helps
> > in the Core2Duo case...
> > I suspect that the problems stem from the sections of code
> > associated with SMP...
> > Maybe I am wrong about something, but I want to help solve this
> > problem...
>