Re: Timing issue with Dummynet on high kernel timer interrupt

From: Bruce Evans <brde_at_optusnet.com.au>
Date: Sat, 7 Nov 2015 05:46:53 +1100 (EST)
On Fri, 6 Nov 2015, Ian Lepore wrote:

> On Fri, 2015-11-06 at 17:51 +0100, Hans Petter Selasky wrote:
>> On 11/06/15 17:43, Ian Lepore wrote:
>>> On Fri, 2015-11-06 at 17:28 +0100, Hans Petter Selasky wrote:
>>>> Hi,
>>
>>>
>>> Do the test II results change with this setting?
>>>
>>>    sysctl kern.timecounter.alloweddeviation=0
>>
>> Yes, it looks much better:
>>
>> debug.total: 10013 -> 0
>> debug.total: 10013 -> 0
>> ...
> This isn't the first time that the alloweddeviation feature has led
> people (including me in the past) to think there is a timing bug.  I
> think the main purpose of the feature is to help save battery power on
> laptops by clustering nearby scheduled wakeups to all happen at the
> same time and then allow for longer sleeps between each wakeup.

I was trying to remember the flag for turning off that "feature".  It
gives the bizarre behaviour that on an old system with a timer resolution
of 10 msec, "time sleep 1" sleeps for 1 second with an average error of
< 10 msec, but with a timer resolution of 1 msec for hardclock and finer
for short timeouts, "time sleep 1" sleeps for an average of an extra 30
msec (worst case 1.069 seconds IIRC).  Thus high resolution timers give
much lower resolution for medium-sized timeouts.  (For "sleep 10", the
average error is again 30 msec but this is relatively smaller, and for
"sleep .001" the average error must be less than 1 msec to work at all,
though it is likely to be relatively large.)

> I've been wondering lately whether this might also be behind the
> unexplained "load average is always 0.60" problem people have noticed
> on some systems.  If load average is calculated by sampling what work
> is happening when a timer interrupt fires, and the system is working
> hard to ensure that a timer interrupt only happens when there is actual
> work to do, you'd end up with statistics reporting that there is work
> being done most of the time when it took a sample.

I use HZ = 100 and haven't seen this.  Strangely, HZ = 100 gives the same
69 msec max error for "sleep 1" as HZ = 1000.

Schedulers should mostly use the actual thread runtimes to avoid
sampling biases.  That might even be faster.  But it doesn't work so
well for the load average, or at all for resource usages that are
averages, or for the usr/sys/intr splitting of the runtime.  It is
good enough for scheduling since the splitting is not needed for
scheduling.

Bruce
Received on Fri Nov 06 2015 - 17:47:04 UTC