Re: SCHED_ULE should not be the default

From: Jilles Tjoelker <jilles_at_stack.nl>
Date: Wed, 14 Dec 2011 00:04:42 +0100

On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote:
> If the algorithm ULE does not contain problems - it means the problem
> has Core2Duo, or in a piece of code that uses the ULE scheduler.
> I already wrote in a mailing list that specifically in my case (Core2Duo)
> partially helps the following patch:
> --- sched_ule.c.orig	2011-11-24 18:11:48.000000000 +0200
> +++ sched_ule.c	2011-12-10 22:47:08.000000000 +0200
> @@ -794,7 +794,8 @@
>  	 * 1.5 * balance_interval.
>  	 */
>  	balance_ticks = max(balance_interval / 2, 1);
> -	balance_ticks += random() % balance_interval;
> +//	balance_ticks += random() % balance_interval;
> +	balance_ticks += ((int)random()) % balance_interval;
>  	if (smp_started == 0 || rebalance == 0)
>  		return;
>  	tdq = TDQ_SELF();

This avoids a 64-bit division on 64-bit platforms but seems to have no
effect otherwise. Because this function is not called very often, the
change seems unlikely to help.
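As a rough userland sketch of why the cast changes only the width of the
division and not the result: the helpers below (jitter_wide and
jitter_narrow, both hypothetical names) model the two expressions. It
assumes random() returns a type wider than int on 64-bit platforms and
that its value always fits in a non-negative int; neither assumption is
checked against the kernel source here.

```c
#include <assert.h>
#include <limits.h>

/*
 * Model of "random() % balance_interval" when random() has a wide
 * return type: the int operand is converted and the remainder is
 * computed at the width of unsigned long (64 bits on LP64).
 */
static unsigned long
jitter_wide(unsigned long r, int balance_interval)
{
	assert(balance_interval > 0);
	return (r % balance_interval);
}

/*
 * Model of "((int)random()) % balance_interval": the value is narrowed
 * first, so the remainder is computed at the width of int.
 * Assumption: r <= INT_MAX, so the narrowing does not change the value.
 */
static int
jitter_narrow(unsigned long r, int balance_interval)
{
	assert(balance_interval > 0);
	assert(r <= (unsigned long)INT_MAX);
	return ((int)r % balance_interval);
}
```

For every r in that range the two helpers return the same number, which
is why the change should not alter the computed balance_ticks.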

> @@ -2118,13 +2119,21 @@
>  	struct td_sched *ts;
>  
>  	THREAD_LOCK_ASSERT(td, MA_OWNED);
> +	if (td->td_pri_class & PRI_FIFO_BIT)
> +		return;
> +	ts = td->td_sched;
> +	/*
> +	 * We used up one time slice.
> +	 */
> +	if (--ts->ts_slice > 0)
> +		return;

This skips most of the periodic functionality (long term load balancer,
saving switch count (?), insert index (?), interactivity score update
for long running thread) if the thread is not going to be rescheduled
right now.

It looks wrong but it is a data point if it helps your workload.
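To illustrate the reordering, here is a much simplified model, not the
real sched_clock(). The struct, the function names and MODEL_SLICE are
all made up for the example, and ticks_charged merely stands in for
whatever per-tick work sits after the slice check in the patched code.

```c
#include <assert.h>

#define MODEL_SLICE 10	/* hypothetical slice length in ticks */

struct model_thread {
	int slice;		/* ticks left in the current slice */
	int ticks_charged;	/* stand-in for per-tick bookkeeping */
	int requeues;		/* times the slice ran out */
};

/* Unpatched order: do the per-tick work, then check the slice. */
static void
clock_unpatched(struct model_thread *t)
{
	t->ticks_charged++;
	if (--t->slice > 0)
		return;
	t->slice = MODEL_SLICE;
	t->requeues++;
}

/* Patched order: check the slice first and return early. */
static void
clock_patched(struct model_thread *t)
{
	if (--t->slice > 0)
		return;
	t->ticks_charged++;	/* reached only when the slice expires */
	t->slice = MODEL_SLICE;
	t->requeues++;
}

/* Deliver nticks clock ticks to a fresh model thread. */
static struct model_thread
run_ticks(void (*clock)(struct model_thread *), int nticks)
{
	struct model_thread t = { MODEL_SLICE, 0, 0 };

	assert(nticks >= 0);
	for (int i = 0; i < nticks; i++)
		clock(&t);
	return (t);
}
```

In this model both variants requeue equally often, but the patched one
does the per-tick bookkeeping only once per slice instead of once per
tick.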

>  	tdq = TDQ_SELF();
>  #ifdef SMP
>  	/*
>  	 * We run the long term load balancer infrequently on the first cpu.
>  	 */
> -	if (balance_tdq == tdq) {
> -		if (balance_ticks && --balance_ticks == 0)
> +	if (balance_ticks && --balance_ticks == 0) {
> +		if (balance_tdq == tdq)
>  			sched_balance();
>  	}
>  #endif

The main effect of this appears to be to disable the long term load
balancer completely after some time. At some point, a CPU other than the
first CPU (which uses balance_tdq) will set balance_ticks = 0, and
sched_balance() will never be called again.
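A small simulation of the two variants of the condition may make this
clearer. Everything in it is a simplification: sched_balance_model()
only counts calls and rearms the counter with a fixed MODEL_INTERVAL
(the real code adds a random component), CPU 0 plays the role of
balance_tdq, and the CPUs tick strictly in turn.

```c
#include <assert.h>

#define MODEL_NCPU	2
#define MODEL_INTERVAL	5	/* hypothetical fixed rearm value */

static int balance_ticks;
static int balance_calls;

/* Stand-in for sched_balance(): count the call and rearm the counter. */
static void
sched_balance_model(void)
{
	balance_calls++;
	balance_ticks = MODEL_INTERVAL;
}

/* Unpatched: only the balancing CPU touches balance_ticks. */
static void
tick_unpatched(int cpu)
{
	if (cpu == 0) {
		if (balance_ticks && --balance_ticks == 0)
			sched_balance_model();
	}
}

/* Patched: every CPU decrements, only the balancing CPU rearms. */
static void
tick_patched(int cpu)
{
	if (balance_ticks && --balance_ticks == 0) {
		if (cpu == 0)
			sched_balance_model();
	}
}

/* Run the given number of rounds; each round ticks every CPU once. */
static int
run_rounds(void (*tick)(int), int rounds)
{
	assert(rounds >= 0);
	balance_ticks = MODEL_INTERVAL;
	balance_calls = 0;
	for (int r = 0; r < rounds; r++)
		for (int cpu = 0; cpu < MODEL_NCPU; cpu++)
			tick(cpu);
	return (balance_calls);
}
```

In the patched variant, once CPU 1 is the one that takes the counter to
zero, nothing rearms it and the "balance_ticks &&" test stays false from
then on. With these particular numbers that happens after the first
call to the balancer.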

It also introduces a hypothetical race condition because the access to
balance_ticks is no longer restricted to one CPU under a spinlock.

If the long term load balancer may be causing trouble, try setting
kern.sched.balance_interval to a higher value with unpatched code.

> @@ -2144,9 +2153,6 @@
>  		if (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx]))
>  			tdq->tdq_ridx = tdq->tdq_idx;
>  	}
> -	ts = td->td_sched;
> -	if (td->td_pri_class & PRI_FIFO_BIT)
> -		return;
>  	if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) {
>  		/*
>  		 * We used a tick; charge it to the thread so
> @@ -2157,11 +2163,6 @@
>  		sched_priority(td);
>  	}
>  	/*
> -	 * We used up one time slice.
> -	 */
> -	if (--ts->ts_slice > 0)
> -		return;
> -	/*
>  	 * We're out of time, force a requeue at userret().
>  	 */
>  	ts->ts_slice = sched_slice;

> and refusal to use options FULL_PREEMPTION
> But no one has unsubscribed to my letter, my patch helps or not in the
> case of Core2Duo...
> There is a suspicion that the problems stem from the sections of code
> associated with the SMP...
> Maybe I'm in something wrong, but I want to help in solving this
> problem ...

-- 
Jilles Tjoelker