Re: freebsd 13 ryzen micro stutter

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Thu, 25 Mar 2021 17:31:25 +0200

On 23/03/2021 16:54, Nils B. wrote:
> Hi,
> 
> On 23.03.21 10:34, myfreeweb wrote:
>> None of these should be an issue, but:
>>
>> sysctl kern.sched.steal_thresh=1
>>
>> For some reason with the default value of 2, I'm seeing weird stuttering in
>> youtube videos, games, etc. on a 5950X system. 1 (or 0, IIRC) works fine.
> 
> yes, finally... Using a Ryzen 1700, Asrock AB350 Pro4 and Radeon RX460 and got
> that awful micro stuttering all the time; not only under FreeBSD 13.0-ALPHA3
> now, but also under FreeBSD 12-STABLE in the past.
> 
> Occurrences were while listening to music using MPV (one-second-*krk*-loops);
> while watching YouTube videos (video hangs for a second but audio continues);
> and often simply during mouse movements, where even MouseKeyPress- and
> MouseKeyRelease-events just didn't reach the system at all.
> 
> Setting
> 
>     kern.sched.steal_thresh=0
> 
> eliminates these micro stutterings in the whole system.
> 
> 
> I also would really, really like to know the reason why this parameter has
> such an impact...

It's been a long time since I looked at that corner of the code.
I think that in theory there should not be any difference between steal_thresh
values of zero, one and two.  For a thread to be stolen there should be at least
one thread that's runnable, but not running.  That also should imply that there
is a thread that's currently running, so the load of that processor should be at
least two.  So, values less than or equal to two should mean the same thing.

The only practical difference I can think of is a situation where a processor
has a runnable thread but does not "realize" it, so the processor stays idle
when it actually has work to do.
If such a thread is not stolen then it may take some time for the processor to
actually start running it.  If it's stolen then the thread may start executing
sooner on a different processor that was about to become idle.

That's just a hypothesis though.
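
A toy model may make the two cases above easier to compare.  This is only a
sketch and not the actual sched_ule.c logic: it assumes that a processor's load
counts its running thread plus its queued (runnable, but not running) threads,
and the names can_steal and decisions are made up for illustration.

```python
# Simplified model of a steal-eligibility check; NOT the real scheduler code.
# Assumption: "load" = running thread (0 or 1) + queued runnable threads.

def can_steal(running: bool, queued: int, steal_thresh: int) -> bool:
    """Return True if an idle CPU may take a thread from this victim CPU."""
    load = (1 if running else 0) + queued
    # There must be something to take, and the load must reach the threshold.
    return queued >= 1 and load >= steal_thresh


def decisions(running: bool, queued: int) -> dict:
    """Steal decision for each threshold value discussed in this thread."""
    return {thresh: can_steal(running, queued, thresh) for thresh in (0, 1, 2)}


# Expected case: the victim runs one thread and has one more queued.
# Load is 2, so thresholds 0, 1 and 2 all allow the steal.
print(decisions(running=True, queued=1))   # {0: True, 1: True, 2: True}

# Hypothesized case: the victim has a queued thread but is still idle.
# Load is 1, so only thresholds 0 and 1 allow the steal.
print(decisions(running=False, queued=1))  # {0: True, 1: True, 2: False}
```

Under these assumptions the threshold only matters in the second case, which
is the situation the hypothesis describes.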

If it's correct, then there can be a number of explanations, ranging from a
problem with inter-processor communication (e.g., related to mwait), to a slow
wakeup of a core from a deep idle state, to a problem with interrupt delivery.

There are some tools in the tools/sched/ directory of the source tree.
schedgraph.py can be used for visual inspection of scheduling traces collected
using KTR.  The file has instructions on how to collect them.
Alternatively, schedgraph.d can be used to collect such traces.
If anyone affected can gather a short sample that captures the problem, then
there might be someone who would be willing to look at it.


-- 
Andriy Gapon
Received on Thu Mar 25 2021 - 14:31:30 UTC
