Re: SCHED_ULE should not be the default

From: Bruce Evans <brde_at_optusnet.com.au>
Date: Wed, 14 Dec 2011 12:25:14 +1100 (EST)

On Wed, 14 Dec 2011, Ivan Klymenko wrote:

> On Wed, 14 Dec 2011 00:04:42 +0100
> Jilles Tjoelker <jilles_at_stack.nl> wrote:
>
>> On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote:
>>> If the algorithm ULE does not contain problems - it means the
>>> problem has Core2Duo, or in a piece of code that uses the ULE
>>> scheduler. I already wrote in a mailing list that specifically in
>>> my case (Core2Duo) partially helps the following patch:
>>> --- sched_ule.c.orig	2011-11-24 18:11:48.000000000 +0200
>>> +++ sched_ule.c	2011-12-10 22:47:08.000000000 +0200
>>> ...
>>> @@ -2118,13 +2119,21 @@
>>>  	struct td_sched *ts;
>>>
>>>  	THREAD_LOCK_ASSERT(td, MA_OWNED);
>>> +	if (td->td_pri_class & PRI_FIFO_BIT)
>>> +		return;
>>> +	ts = td->td_sched;
>>> +	/*
>>> +	 * We used up one time slice.
>>> +	 */
>>> +	if (--ts->ts_slice > 0)
>>> +		return;
>>
>> This skips most of the periodic functionality (long term load
>> balancer, saving switch count (?), insert index (?), interactivity
>> score update for long running thread) if the thread is not going to
>> be rescheduled right now.
>>
>> It looks wrong but it is a data point if it helps your workload.
>
> Yes, I did it to delay the execution of the code in this section for as long as possible:

I don't understand what you are doing here, but recently noticed that
the timeslicing in SCHED_4BSD is completely broken.  This bug may be a
feature.  SCHED_4BSD doesn't have its own timeslice counter like ts_slice
above.  It uses `switchticks' instead.  But switchticks hasn't been usable
for this purpose since long before SCHED_4BSD started using it for this
purpose.  switchticks is reset on every context switch, so it is useless
for almost all purposes -- any interrupt activity on a non-fast interrupt
clobbers it.

Removing the check of ts_slice in the above and always returning might
give a similar bug to the SCHED_4BSD one.

I noticed this while looking for bugs in realtime scheduling.  In the
above, returning early for PRI_FIFO_BIT also skips most of the periodic
functionality.  In SCHED_4BSD, returning early is the usual case, so
the PRI_FIFO_BIT might as well not be checked, and it is the unusual
fifo scheduling case (which is supposed to only apply to realtime
priority threads) which has a chance of working as intended, while the
usual roundrobin case degenerates to an impure form of fifo scheduling
(it is impure since priority decay still works so it is only fifo
among threads of the same priority).
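
To make the switchticks problem concrete, here is a toy user-space model.
It is only a sketch and not kernel code: the function names, the fixed
interrupt period and the "one tick per loop iteration" accounting are all
assumptions made for illustration. It compares a slice test based on a
switchticks value that every context switch resets with a per-thread
countdown like ts_slice.

```c
#include <assert.h>

/* Toy model, not kernel code.  All names here are hypothetical. */

/*
 * Count how often a CPU-bound thread reaches the end of its slice over
 * total_ticks clock ticks when the test is "now - switchticks >= slice"
 * and switchticks is reset on every context switch, including the
 * switches caused by an interrupt thread that runs every intr_period
 * ticks (0 means no interrupts).
 */
int
expirations_with_switchticks(int total_ticks, int slice, int intr_period)
{
	int switchticks = 0;
	int expirations = 0;

	assert(slice > 0 && intr_period >= 0);
	for (int now = 1; now <= total_ticks; now++) {
		if (intr_period > 0 && now % intr_period == 0) {
			/* Switch to the interrupt thread and back. */
			switchticks = now;
			continue;
		}
		if (now - switchticks >= slice) {
			expirations++;
			switchticks = now;	/* the requeue is a switch too */
		}
	}
	return (expirations);
}

/*
 * Same workload, but the thread carries its own countdown that is only
 * decremented on ticks charged to the thread, so unrelated context
 * switches do not disturb it.
 */
int
expirations_with_slice_counter(int total_ticks, int slice, int intr_period)
{
	int ts_slice = slice;
	int expirations = 0;

	assert(slice > 0 && intr_period >= 0);
	for (int now = 1; now <= total_ticks; now++) {
		if (intr_period > 0 && now % intr_period == 0)
			continue;	/* tick charged to the interrupt thread */
		if (--ts_slice > 0)
			continue;
		expirations++;
		ts_slice = slice;
	}
	return (expirations);
}
```

In this model, with a 10-tick slice and an interrupt every 5 ticks, the
switchticks version never sees a slice expire in 1000 ticks, while the
counter version sees 80 expirations.  Without interrupts both see 100.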

>>...
>>> @@ -2144,9 +2153,6 @@
>>>  		if (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx]))
>>>  			tdq->tdq_ridx = tdq->tdq_idx;
>>>  	}
>>> -	ts = td->td_sched;
>>> -	if (td->td_pri_class & PRI_FIFO_BIT)
>>> -		return;
>>>  	if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) {
>>>  		/*
>>>  		 * We used a tick; charge it to the thread so
>>> @@ -2157,11 +2163,6 @@
>>>  		sched_priority(td);
>>>  	}
>>>  	/*
>>> -	 * We used up one time slice.
>>> -	 */
>>> -	if (--ts->ts_slice > 0)
>>> -		return;
>>> -	/*
>>>  	 * We're out of time, force a requeue at userret().
>>>  	 */
>>>  	ts->ts_slice = sched_slice;

With the ts_slice check here before you moved it, removing it might
give buggy behaviour closer to SCHED_4BSD.

>>> and refusal to use options FULL_PREEMPTION

4-5 years ago, I found that any form of PREEMPTION was a pessimization
for at least makeworld (since it caused too many context switches).
PREEMPTION was needed for the !SMP case, at least partly because of
the broken switchticks (switchticks, when it works, gives voluntary
yielding by some CPU hogs in the kernel.  PREEMPTION, if it works,
should do this better).  So I used PREEMPTION in the !SMP case and
not for the SMP case.  I didn't worry about the CPU hogs in the SMP
case since it is rare to have more than 1 of them and 1 will use at
most 1/2 of a multi-CPU system.

>>> But no one has unsubscribed to my letter, my patch helps or not in
>>> the case of Core2Duo...
>>> There is a suspicion that the problems stem from the sections of
>>> code associated with the SMP...
>>> Maybe I'm in something wrong, but I want to help in solving this
>>> problem ...

The main point of SCHED_ULE is to give better affinity for multi-CPU
systems.  But the `multi' apparently needs to be strictly more than
2 for it to break even.

Bruce