Re: [RFC/RFT] calloutng

From: Alexander Motin <mav_at_FreeBSD.org>
Date: Mon, 31 Dec 2012 12:17:27 +0200

On 31.12.2012 08:17, Luigi Rizzo wrote:
> On Sun, Dec 30, 2012 at 04:13:43PM -0700, Ian Lepore wrote:
> ...
>> I grabbed testsleep.c to test an arm event timer implementation, and had
>> to fix a couple nits... kqueueto was missing from the names[] array, and
>> I had to add a "* 1000" to a couple places where usec was stuffed into a
>> timespec's tv_nsec.
>>
>> I also tested the calloutng_12_17 patches and the kqueue stuff behaved
>> very strangely.

I've rewritten kqueue timeouts in calloutng_12_26.patch.

>> Then I noticed you had a 12_26 patchset so I tested
>> that (after crudely fixing a couple uninitialized var warnings), and it
>> all looks good on this arm (Raspberry Pi).  I'll attach the results.
>>
>> It's so sweet to be able to do precision sleeps.

Thank you for testing, Ian.

> interesting numbers, but there seems to be some problem in computing
> the exact interval; delays are much larger than expected.
>
> In this test, the original timer code used to round to the next multiple
> of 1 tick and then add another tick (except for the kqueue case),
> which is exactly what you see in the second set of measurements.
>
> The calloutng code however seems to do something odd:
> in addition to fixed overhead (some 50us, which you can see in
> the tests for 1us and 300us), all delays seem to be ~10% larger
> than what is requested, upper bounded to 10ms (note, the
> numbers are averages so i cannot tell whether all samples are
> the same or there is some distribution of values).
>
> I am not sure if this error is peculiar to the ARM version or also
> appears on x86/amd64 but I believe it should be fixed.
>
> If you look at the results below:
>
> 1us 	possibly ok:
> 	for very short intervals i would expect some kind
> 	of 'reschedule' without actually firing a timer; maybe
> 	50us are what it takes to do a round through the scheduler ?
>
> 300us	probably ok
> 	i guess the extra 50-90us are what it takes to do a round
> 	through the scheduler
>
> 1000us	borderline (this is the case for poll and kqueue, which are
> 	rounded to 1ms)
> 	here intervals seem to be increased by 10%, and i cannot see
> 	a good reason for this (more below).
>
> 3000us and above: wrong
> 	here again, the intervals seem to be 10% larger than what is
> 	requested, perhaps limiting the error to 10-20ms.
>
>
> Maybe the 10% extension results from creating a default 'precision'
> for legacy calls, but i do not think this is done correctly.
>
> First of all, if users do not specify a precision themselves, the
> automatically generated value should never exceed one tick.
>
> Second, the only point of a 'precision' parameter is to merge
> requests that may be close in time, so if there is already a
> timer scheduled within [Treq, Treq+precision] i will get it;
> but if there is no pending timer, then one should schedule it
> for the requested interval.
>
> Davide/Alexander, any ideas ?

All the effects mentioned can be explained by the implemented logic. The
50us at 1us is probably the sum of the minimal latency of the hardware
eventtimer on the specific platform and some software processing
overhead (syscall, callout, timecounters, scheduler, etc). At longer
intervals the system starts to noticeably use the precision specified by
the kern.timecounter.alloweddeviation sysctl. It affects results in two
ways: 1) extending intervals by the specified percentage of time to
allow event aggregation, and 2) choosing the time base between the fast
getbinuptime() and the precise binuptime(). Extending the interval is
needed to aggregate callouts not only with each other, but also with
other system events, which are impossible to schedule in advance. It
gives the specified relative error, but no more than one CPU wakeup
period in absolute terms: for a busy CPU (not skipping hardclock()
ticks) it is 1/hz, for a completely idle one it can be up to 0.5s. The
second point reduces processing overhead at the cost of an error of up
to 1/hz for long periods (>(100/allowed)*(1/hz)), when it is used.
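
The two bounds described above can be written down as a rough model.
This is only a sketch of the arithmetic, not the code from the patch:
the function names max_extension_us() and coarse_threshold_us() are
made up for illustration, and the caller has to supply the allowed
deviation percentage and the CPU wakeup period as assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model, not kernel code: upper bound on how far a
 * requested interval may be extended for event aggregation.
 * The extension is the allowed percentage of the interval, capped at
 * one CPU wakeup period (1/hz when busy, up to 0.5s when fully idle).
 * All times are in microseconds.
 */
uint64_t
max_extension_us(uint64_t interval_us, unsigned allowed_pct,
    uint64_t wakeup_us)
{
	uint64_t rel = interval_us * allowed_pct / 100;

	return (rel < wakeup_us ? rel : wakeup_us);
}

/*
 * Hypothetical model: interval length above which the cheaper but
 * coarser time base may be chosen, i.e. (100 / allowed) * (1 / hz),
 * returned in microseconds.
 */
uint64_t
coarse_threshold_us(unsigned allowed_pct, unsigned hz)
{
	assert(allowed_pct > 0 && hz > 0);
	return (100ULL * 1000000ULL / ((uint64_t)allowed_pct * hz));
}
```

For example, assuming a 10% allowed deviation (the figure Luigi
observed, not a value taken from the code) and a busy CPU at HZ=100,
max_extension_us(3000, 10, 10000) gives 300us, while
max_extension_us(300000, 10, 10000) is capped at 10000us.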

To get the best possible precision, the
kern.timecounter.alloweddeviation sysctl can be set to a smaller value.
Setting it to 0 will effectively disable all optimizations, but should
give 50us precision in all cases.

>> for t in 1 300 3000 30000 300000 ; do
>>    for m in select poll usleep nanosleep kqueue kqueueto syscall ; do
>>      ./testsleep $t $m
>>    done
>> done
>>
>>
>> With calloutng_12_26.patch...
>>
>>                  HZ=100               HZ=250               HZ=1000
>> ---------- ----------------     ----------------     ----------------
>> select          1     55.79          1     50.96          1     61.32
>> poll            1   1109.46          1   1107.86          1   1114.38
>> usleep          1     56.33          1     72.90          1     62.78
>> nanosleep       1     52.66          1     55.23          1     64.23
>> kqueue          1   1114.23          1   1113.81          1   1121.21
>> kqueueto        1     65.44          1     71.00          1     75.01
>> syscall         1      4.70          1      4.45          1      4.55
>> select        300    355.79        300    357.76        300    362.35
>> poll          300   1107.85        300   1122.55        300   1115.62
>> usleep        300    355.28        300    357.28        300    360.79
>> nanosleep     300    354.49        300    355.82        300    360.62
>> kqueue        300   1112.57        300   1118.13        300   1117.16
>> kqueueto      300    375.98        300    378.62        300    395.61
>> syscall       300      4.41        300      4.45        300      4.54
>> select       3000   3246.75       3000   3246.74       3000   3252.72
>> poll         3000   3238.10       3000   3229.12       3000   3250.10
>> usleep       3000   3242.47       3000   3237.06       3000   3249.61
>> nanosleep    3000   3238.79       3000   3231.55       3000   3248.11
>> kqueue       3000   3240.01       3000   3236.07       3000   3247.60
>> kqueueto     3000   3265.36       3000   3267.22       3000   3274.96
>> syscall      3000      4.69       3000      4.44       3000      4.50
>> select      30000  31714.60      30000  31941.17      30000  32467.69
>> poll        30000  31522.76      30000  31983.00      30000  32497.81
>> usleep      30000  31459.67      30000  31980.76      30000  32458.71
>> nanosleep   30000  31431.02      30000  31982.22      30000  32525.20
>> kqueue      30000  31466.75      30000  31873.90      30000  31973.54
>> kqueueto    30000  31564.67      30000  32522.35      30000  32475.59
>> syscall     30000      4.70      30000      4.73      30000      4.89
>> select     300000 319133.02     300000 311562.33     300000 309918.62
>> poll       300000 319604.27     300000 311422.94     300000 310000.76
>> usleep     300000 319314.60     300000 311269.69     300000 309996.34
>> nanosleep  300000 319497.58     300000 311425.40     300000 309997.13
>> kqueue     300000 309995.55     300000 303980.27     300000 309908.82
>> kqueueto   300000 319505.88     300000 311424.97     300000 309996.16
>> syscall    300000      4.41     300000      4.45     300000      4.89
>>
>>
>> With no patches...
>>
>>                  HZ=100               HZ=250               HZ=1000
>> ---------- ----------------     ----------------     ----------------
>> select          1  19941.70          1   7989.10          1   1999.16
>> poll            1  19904.61          1   7987.32          1   1999.78
>> usleep          1  19904.95          1   7993.30          1   1999.96
>> nanosleep       1  19905.64          1   7993.71          1   1999.72
>> kqueue          1  10001.61          1   4004.00          1   1000.27
>> kqueueto        1  19904.00          1   7993.03          1   1999.54
>> syscall         1      4.04          1      4.05          1      4.75
>> select        300  19904.66        300   7998.39        300   2000.27
>> poll          300  19904.35        300   7993.47        300   1999.86
>> usleep        300  19903.96        300   7994.11        300   1999.81
>> nanosleep     300  19904.48        300   7993.77        300   1999.80
>> kqueue        300  10001.68        300   4004.18        300   1000.31
>> kqueueto      300  19997.86        300   7993.37        300   1999.59
>> syscall       300      4.01        300      4.00        300      4.32
>> select       3000  19904.80       3000   7998.85       3000   3998.43
>> poll         3000  19904.92       3000   8005.93       3000   3999.39
>> usleep       3000  19904.50       3000   7992.88       3000   3999.44
>> nanosleep    3000  19904.84       3000   7993.34       3000   3999.36
>> kqueue       3000  10001.58       3000   4003.97       3000   3000.72
>> kqueueto     3000  19903.56       3000   7993.24       3000   3999.34
>> syscall      3000      4.02       3000      4.37       3000      4.29
>> select      30000  39905.02      30000  35991.79      30000  31051.77
>> poll        30000  39905.49      30000  35980.35      30000  30995.64
>> usleep      30000  39903.78      30000  35979.48      30000  30995.23
>> nanosleep   30000  39904.55      30000  35981.61      30000  30995.87
>> kqueue      30000  30002.73      30000  32019.54      30000  30004.83
>> kqueueto    30000  39903.59      30000  35979.64      30000  30996.05
>> syscall     30000      4.44      30000      4.04      30000      4.31
>> select     300000 310001.23     300000 303995.86     300000 300994.30
>> poll       300000 309902.73     300000 303981.58     300000 300996.17
>> usleep     300000 309903.64     300000 303980.17     300000 300997.42
>> nanosleep  300000 309903.32     300000 303980.36     300000 300993.64
>> kqueue     300000 300002.77     300000 300019.46     300000 300006.90
>> kqueueto   300000 309903.31     300000 303978.10     300000 300996.84
>> syscall    300000      4.01     300000      4.04     300000      4.29


-- 
Alexander Motin
Received on Mon Dec 31 2012 - 09:17:40 UTC