Re: Is kern.sched.preempt_thresh=0 a sensible default?

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Sat, 9 Jun 2018 18:07:15 -0700 (PDT)
On  9 Jun, Stefan Esser wrote:

> 3) Programs that evenly split the load on all available cores have been
>    suffering from sub-optimal assignment of threads to cores. E.g. on a
>    CPU with 8 (virtual) cores, this resulted in 6 cores running the load
>    in nominal time, 1 core taking twice as long because 2 threads were
>    scheduled to run on it, while 1 core was mostly idle. Even if the
>    load was initially evenly distributed, a woken up process that ran on
>    one core destroyed the symmetry and it was not recovered. (This was a
>    problem e.g. for parallel programs using MPI or the like.)

When a core is about to go idle or first enters the idle state, it searches
for the most heavily loaded core and steals a thread from it.  The core only
goes to sleep if it cannot find a non-running thread to steal.
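
Roughly, that steal path amounts to the toy user-space sketch below.  This is
not the actual tdq_idled() code in sys/kern/sched_ule.c; the struct cpu
fields and the steal_on_idle() name are invented purely for illustration:

/*
 * Toy illustration of idle-time thread stealing (not kernel code).
 */
#include <stdbool.h>
#include <stdio.h>

#define NCPU 8

struct cpu {
    int nrunnable;    /* threads on this CPU's run queue */
    int nstealable;   /* runnable threads that are neither running nor pinned */
};

/*
 * Called when CPU 'self' is about to idle: find the most loaded CPU that
 * has a thread we are allowed to take, and move one thread over.  Return
 * false if nothing could be stolen, in which case the caller would let
 * the CPU go to sleep.
 */
static bool
steal_on_idle(struct cpu cpus[NCPU], int self)
{
    int victim = -1, maxload = 0;

    for (int i = 0; i < NCPU; i++) {
        if (i == self)
            continue;
        if (cpus[i].nstealable > 0 && cpus[i].nrunnable > maxload) {
            maxload = cpus[i].nrunnable;
            victim = i;
        }
    }
    if (victim < 0)
        return (false);    /* nothing to steal; the CPU may sleep */

    cpus[victim].nrunnable--;
    cpus[victim].nstealable--;
    cpus[self].nrunnable++;
    return (true);
}

int
main(void)
{
    /* cpu0 is doubled up; all other cores start empty. */
    struct cpu cpus[NCPU] = {
        [0] = { .nrunnable = 2, .nstealable = 1 },
    };

    if (steal_on_idle(cpus, 7))
        printf("cpu7 stole a thread; loads now %d and %d\n",
            cpus[0].nrunnable, cpus[7].nrunnable);
    return (0);
}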

If there are N cores and N+1 runnable threads, there is a long-term load
balancer that runs periodically.  It searches for the most and least loaded
cores and moves a thread from the former to the latter.  That prevents the
same pair of threads from having to share the same core indefinitely.
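
The balancing step itself boils down to something like the toy code below.
Again, this is only a sketch of the idea, not the real sched_balance(); the
per-CPU load counters are an invention of the example:

/*
 * Toy version of the periodic long-term balancer: pick the most and
 * least loaded CPUs and move one thread between them.
 */
#include <stdio.h>

#define NCPU 8

static void
balance(int load[NCPU])
{
    int max = 0, min = 0;

    for (int i = 1; i < NCPU; i++) {
        if (load[i] > load[max])
            max = i;
        if (load[i] < load[min])
            min = i;
    }
    /*
     * In the N+1-threads-on-N-cores case this just moves the extra
     * thread to a different core each time, so the doubled-up pair
     * keeps changing instead of being stuck together forever.
     */
    if (load[max] > load[min]) {
        load[max]--;
        load[min]++;
    }
}

int
main(void)
{
    /* N+1 runnable threads on N cores: one core is doubled up. */
    int load[NCPU] = { 2, 1, 1, 1, 1, 1, 1, 1 };

    balance(load);    /* run periodically by the scheduler */
    for (int i = 0; i < NCPU; i++)
        printf("cpu%d: %d\n", i, load[i]);
    return (0);
}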

There is an observed bug where a low-priority thread can get pinned to a
core that is already occupied by a high-priority CPU-bound thread that never
releases the CPU.  The low-priority thread can't migrate to another core that
subsequently becomes available because it is pinned.  It is not known how the
thread originally got into this state.  I don't see any reason for 4BSD to be
immune to this problem.
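
For what it's worth, the migration and steal paths all reject pinned threads
with a check along these lines.  Toy code with invented field names; the real
scheduler keeps a pin count in the thread structure and also honors CPU
bindings:

/*
 * Toy illustration of why a pinned thread cannot escape a busy core:
 * every migration/steal path rejects it.
 */
#include <stdbool.h>
#include <stdio.h>

struct toy_thread {
    bool running;      /* currently on a CPU */
    int  pin_count;    /* > 0 means pinned to its current CPU */
};

static bool
can_migrate(const struct toy_thread *td)
{
    return (!td->running && td->pin_count == 0);
}

int
main(void)
{
    /* Low-priority thread stuck behind a CPU hog, but pinned. */
    struct toy_thread stuck = { .running = false, .pin_count = 1 };

    printf("can migrate: %d\n", can_migrate(&stuck));    /* prints 0 */
    return (0);
}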

> 4) The real time behavior of SCHED_ULE is weak due to interactive
>    processes (e.g. the X server) being put into the "time-share" class
>    and then suffering from the problems described as 1) or 2) above.
>    (You distinguish time-share and batch processes, which both are
>     allowed to consume their full quanta even if a higher priority
>     process in their class becomes runnable. I think this will not
>     give the required responsiveness e.g. for an X server.)
>    They should be considered I/O intensive, if they often don't use
>    their full quantum, without taking the significant amount of CPU
>    time they may use at times into account. (I.e. the criterion for
>    time-sharing should not be the CPU time consumed, but rather some
>    fraction of the quanta not being fully used due to voluntarily giving
>    up the CPU.) With many real-time threads it may be hard to identify
>    interactive threads, since they are non-voluntarily disrupted too
>    often - this must be considered in the sampling of voluntary vs.
>    non-voluntary context switches.

It can actually be worse than this.  There is a bug that can cause the
wnck-applet component of the MATE desktop to consume a large amount of CPU
time, and it apparently does so while communicating with the Xorg server,
which it drives to 100% CPU.  That makes the Xorg server's PRI value increase
greatly, so it ends up with a lower scheduling priority.  Even without
competing CPU load, interactive performance suffers.  With competing CPU load
it gets much worse.
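
As an aside on the criterion Stefan proposes above (classify a thread by the
fraction of quanta it gives up voluntarily, not by the CPU time it consumes),
a toy version might look like the code below.  This is purely illustrative:
it is not how the existing interactivity scoring in sched_ule.c works, and
the 75% threshold and all names here are invented:

/*
 * Toy interactivity test: call a thread interactive if it voluntarily
 * gives up the CPU before its quantum expires often enough, regardless
 * of how much CPU it burns while it is running.
 */
#include <stdbool.h>
#include <stdio.h>

struct sched_stats {
    unsigned voluntary;      /* switches where the thread slept or yielded early */
    unsigned involuntary;    /* switches from quantum expiry or preemption */
};

static bool
is_interactive(const struct sched_stats *st)
{
    unsigned total = st->voluntary + st->involuntary;

    if (total == 0)
        return (false);
    /*
     * Note: this naive version counts every involuntary switch against
     * the thread.  With many real-time threads constantly preempting it,
     * an interactive thread would be misclassified, which is exactly the
     * sampling problem mentioned in the quoted text; a real heuristic
     * would have to separate quantum expiry from preemption.
     */
    return (st->voluntary * 4 >= total * 3);    /* >= 75% voluntary */
}

int
main(void)
{
    struct sched_stats xserver  = { .voluntary = 90, .involuntary = 10 };
    struct sched_stats buildjob = { .voluntary =  2, .involuntary = 98 };

    printf("X-server-like thread interactive: %d\n", is_interactive(&xserver));
    printf("batch build thread interactive:   %d\n", is_interactive(&buildjob));
    return (0);
}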