Re: [RFC/RFT] calloutng

From: Luigi Rizzo <rizzo_at_iet.unipi.it>
Date: Wed, 2 Jan 2013 11:57:30 +0100

On Mon, Dec 31, 2012 at 12:17:27PM +0200, Alexander Motin wrote:
> On 31.12.2012 08:17, Luigi Rizzo wrote:
> >On Sun, Dec 30, 2012 at 04:13:43PM -0700, Ian Lepore wrote:
...
> >>Then I noticed you had a 12_26 patchset so I tested
> >>that (after crudely fixing a couple uninitialized var warnings), and it
> >>all looks good on this arm (Raspberry Pi).  I'll attach the results.
> >>
> >>It's so sweet to be able to do precision sleeps.
> 
> Thank you for testing, Ian.
> 
> >interesting numbers, but there seems to be some problem in computing
> >the exact interval; delays are much larger than expected.
> >
> >In this test, the original timer code used to round to the next multiple
> >of 1 tick and then add another tick (except for the kqueue case),
> >which is exactly what you see in the second set of measurements.
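The legacy rounding described in the quoted text can be checked against the
"With no patches" table with a small model. This is only a sketch: the helper
name legacy_sleep_us and its extra_tick flag are made up for illustration and
are not kernel code. It assumes a tick of 1/hz seconds and ignores the few
microseconds of syscall overhead.

```python
def legacy_sleep_us(request_us, hz, extra_tick=True):
    """Illustrative model (not kernel code) of the old timer rounding:
    round the request up to a whole number of ticks, then add one more
    tick unless extra_tick is False (the kqueue case)."""
    tick_us = 1_000_000 // hz
    ticks = -(-request_us // tick_us)  # integer ceiling division
    if extra_tick:
        ticks += 1
    return ticks * tick_us


if __name__ == "__main__":
    requests = (1, 300, 3000, 30000, 300000)
    for hz in (100, 250, 1000):
        print("HZ=%d" % hz)
        print("  select-like:", [legacy_sleep_us(t, hz) for t in requests])
        print("  kqueue-like:", [legacy_sleep_us(t, hz, False) for t in requests])
```

For example, the model gives 36000us for a 30000us request at HZ=250, against
a measured 35991.79 for select, and 32000us without the extra tick, against
32019.54 for kqueue.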
> >
> >The calloutng code however seems to do something odd:
> >in addition to fixed overhead (some 50us, which you can see in
> >the tests for 1us and 300us), all delays seem to be ~10% larger
> >than what is requested, upper bounded to 10ms (note, the
> >numbers are averages so i cannot tell whether all samples are
> >the same or there is some distribution of values).
> >
> >I am not sure if this error is peculiar to the ARM version or also
> >appears on x86/amd64 but I believe it should be fixed.
> >
> >If you look at the results below:
> >
> >1us 	possibly ok:
> >	for very short intervals i would expect some kind
> >	of 'reschedule' without actually firing a timer; maybe
> >	50us are what it takes to do a round through the scheduler ?
> >
> >300us	probably ok
> >	i guess the extra 50-90us are what it takes to do a round
> >	through the scheduler
> >
> >1000us	borderline (this is the case for poll and kqueue, which are
> >	rounded to 1ms)
> >	here intervals seem to be increased by 10%, and i cannot see
> >	a good reason for this (more below).
> >
> >3000us and above: wrong
> >	here again, the intervals seem to be 10% larger than what is
> >	requested, perhaps limiting the error to 10-20ms.
> >
> >
> >Maybe the 10% extension results from creating a default 'precision'
> >for legacy calls, but i do not think this is done correctly.
> >
> >First of all, if users do not specify a precision themselves, the
> >automatically generated value should never exceed one tick.
> >
> >Second, the only point of a 'precision' parameter is to merge
> >requests that may be close in time, so if there is already a
> >timer scheduled within [Treq, Treq+precision] i will get it;
> >but if there is no pending timer, then one should schedule it
> >for the requested interval.
> >
> >Davide/Alexander, any ideas ?
> 
> All of the effects mentioned can be explained by the implemented logic. The
> 50us at 1us is probably the sum of the minimal latency of the hardware
> eventtimer on the specific platform and some software processing overhead
> (syscall, callout, timecounters, scheduler, etc). At later points the system
> starts to noticeably use the precision specified by the
> kern.timecounter.alloweddeviation sysctl. It affects results from two sides:
> 1) extending intervals by the specified percentage of time to allow event
> aggregation, and 2) choosing the time base between the fast getbinuptime()
> and the precise binuptime(). Extending the interval is needed to aggregate
> not only callouts with each other, but also callouts with other system
> events, which are impossible to schedule in advance. It gives the specified
> relative error, but no more than one CPU wakeup period in absolute terms:
> for a busy CPU (not skipping hardclock() ticks) it is 1/hz, for a completely
> idle one it can be up to 0.5s. The second point reduces processing overhead
> at the cost of an error of up to 1/hz for long periods
> (>(100/allowed)*(1/hz)), when it is used.

i am not sure what you mean by "extending interval", but i believe the
logic should be the following:

- say the user requests a timeout after X seconds with a tolerance of D seconds
  (both X and D are fractional, so they can be short).  Interpret this as

   "the system should do its best to generate an event between X and X+D seconds"

- convert X to an absolute time, T_X

- if there are any pending events already scheduled between T_X and T_X+D,
  then by definition they are acceptable. Attach the requested timeout
  to the earliest of these events.

- otherwise, schedule an event at time T_X (because there is no valid
  reason to generate a late event, and it makes no sense from an
  energy saving standpoint, either -- see below).
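The steps above can be sketched as a toy model. Everything here is
hypothetical and for illustration only (the TimerQueue class, its schedule()
method, and the use of integer microseconds); it is not the calloutng code,
just one way to express "attach to the earliest pending event in
[T_X, T_X+D], otherwise schedule exactly at T_X".

```python
import bisect


class TimerQueue:
    """Toy model of the proposed policy; names are illustrative only."""

    def __init__(self):
        self.events = []  # sorted absolute firing times, in microseconds

    def schedule(self, now, x, d):
        """Request a timeout x us from now with tolerance d us.
        Returns the absolute time at which the request will fire."""
        t_x = now + x  # convert the relative request to an absolute time
        # Index of the earliest pending event at or after T_X.
        i = bisect.bisect_left(self.events, t_x)
        if i < len(self.events) and self.events[i] <= t_x + d:
            # A pending event already falls in [T_X, T_X+D]: share it.
            return self.events[i]
        # Nothing to merge with: schedule at T_X, not at T_X+D.
        self.events.insert(i, t_x)
        return t_x


if __name__ == "__main__":
    q = TimerQueue()
    print(q.schedule(0, 1000, 100))  # no pending events, fires at T_X
    print(q.schedule(0, 950, 100))   # window [950, 1050] holds a pending event
    print(q.schedule(0, 1200, 50))   # nothing in [1200, 1250], new event
    print(q.events)
```

In this model a request is only ever delayed when that actually removes a
wakeup; a lone request fires at T_X no matter how large D is.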

It seems to me that you are instead extending the requested interval
upfront, which causes some gratuitous pessimizations in scheduling
the callout.

Re. energy savings: the gain in extending the timeout cannot exceed
the value D/X. So while it may make sense to extend a 1us request
to 50us to go (theoretically) from 1M callouts/s to 20K callouts/s,
it is completely pointless from an energy saving standpoint to
introduce a 10ms error on a 300ms request.
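To put rough numbers on this, here is a small sketch under one stated
assumption: a caller that re-arms the same timeout back to back, with every
request stretched from X to X+D and no merging with other events. The function
names are made up for the example. Under that assumption the fraction of
wakeups saved is D/(X+D), which is always below D/X.

```python
def callouts_per_second(period_us):
    """Wakeups per second for back-to-back timeouts of period_us."""
    return 1_000_000 / period_us


def fraction_saved(x_us, d_us):
    """Fraction of wakeups avoided when every request of x_us is
    stretched to x_us + d_us (assumed model: no coalescing at all)."""
    return d_us / (x_us + d_us)


if __name__ == "__main__":
    # 1us request served after 50us: 1M callouts/s becomes 20K callouts/s.
    print(callouts_per_second(1), callouts_per_second(50))
    print(fraction_saved(1, 49))
    # 300ms request with a 10ms error: the saving is bounded by D/X.
    print(fraction_saved(300_000, 10_000), 10_000 / 300_000)
```

In the second case only about 3% of the wakeups go away, which is the sense in
which the 10ms error buys almost nothing.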

(even though i hate the idea that a 1us request defaults to
a 50us delay; but that is hopefully something that can be tuned
in a platform-specific way and perhaps at runtime).

cheers
luigi

> To get the best possible precision, the kern.timecounter.alloweddeviation
> sysctl can be set to a smaller value. Setting it to 0 will effectively
> disable all optimizations, but should give 50us precision in all cases.
> 
> >>for t in 1 300 3000 30000 300000 ; do
> >>   for m in select poll usleep nanosleep kqueue kqueueto syscall ; do
> >>     ./testsleep $t $m
> >>   done
> >>done
> >>
> >>
> >>With calloutng_12_26.patch...
> >>
> >>                 HZ=100               HZ=250               HZ=1000
> >>---------- ----------------     ----------------     ----------------
> >>select          1     55.79          1     50.96          1     61.32
> >>poll            1   1109.46          1   1107.86          1   1114.38
> >>usleep          1     56.33          1     72.90          1     62.78
> >>nanosleep       1     52.66          1     55.23          1     64.23
> >>kqueue          1   1114.23          1   1113.81          1   1121.21
> >>kqueueto        1     65.44          1     71.00          1     75.01
> >>syscall         1      4.70          1      4.45          1      4.55
> >>select        300    355.79        300    357.76        300    362.35
> >>poll          300   1107.85        300   1122.55        300   1115.62
> >>usleep        300    355.28        300    357.28        300    360.79
> >>nanosleep     300    354.49        300    355.82        300    360.62
> >>kqueue        300   1112.57        300   1118.13        300   1117.16
> >>kqueueto      300    375.98        300    378.62        300    395.61
> >>syscall       300      4.41        300      4.45        300      4.54
> >>select       3000   3246.75       3000   3246.74       3000   3252.72
> >>poll         3000   3238.10       3000   3229.12       3000   3250.10
> >>usleep       3000   3242.47       3000   3237.06       3000   3249.61
> >>nanosleep    3000   3238.79       3000   3231.55       3000   3248.11
> >>kqueue       3000   3240.01       3000   3236.07       3000   3247.60
> >>kqueueto     3000   3265.36       3000   3267.22       3000   3274.96
> >>syscall      3000      4.69       3000      4.44       3000      4.50
> >>select      30000  31714.60      30000  31941.17      30000  32467.69
> >>poll        30000  31522.76      30000  31983.00      30000  32497.81
> >>usleep      30000  31459.67      30000  31980.76      30000  32458.71
> >>nanosleep   30000  31431.02      30000  31982.22      30000  32525.20
> >>kqueue      30000  31466.75      30000  31873.90      30000  31973.54
> >>kqueueto    30000  31564.67      30000  32522.35      30000  32475.59
> >>syscall     30000      4.70      30000      4.73      30000      4.89
> >>select     300000 319133.02     300000 311562.33     300000 309918.62
> >>poll       300000 319604.27     300000 311422.94     300000 310000.76
> >>usleep     300000 319314.60     300000 311269.69     300000 309996.34
> >>nanosleep  300000 319497.58     300000 311425.40     300000 309997.13
> >>kqueue     300000 309995.55     300000 303980.27     300000 309908.82
> >>kqueueto   300000 319505.88     300000 311424.97     300000 309996.16
> >>syscall    300000      4.41     300000      4.45     300000      4.89
> >>
> >>
> >>With no patches...
> >>
> >>                 HZ=100               HZ=250               HZ=1000
> >>---------- ----------------     ----------------     ----------------
> >>select          1  19941.70          1   7989.10          1   1999.16
> >>poll            1  19904.61          1   7987.32          1   1999.78
> >>usleep          1  19904.95          1   7993.30          1   1999.96
> >>nanosleep       1  19905.64          1   7993.71          1   1999.72
> >>kqueue          1  10001.61          1   4004.00          1   1000.27
> >>kqueueto        1  19904.00          1   7993.03          1   1999.54
> >>syscall         1      4.04          1      4.05          1      4.75
> >>select        300  19904.66        300   7998.39        300   2000.27
> >>poll          300  19904.35        300   7993.47        300   1999.86
> >>usleep        300  19903.96        300   7994.11        300   1999.81
> >>nanosleep     300  19904.48        300   7993.77        300   1999.80
> >>kqueue        300  10001.68        300   4004.18        300   1000.31
> >>kqueueto      300  19997.86        300   7993.37        300   1999.59
> >>syscall       300      4.01        300      4.00        300      4.32
> >>select       3000  19904.80       3000   7998.85       3000   3998.43
> >>poll         3000  19904.92       3000   8005.93       3000   3999.39
> >>usleep       3000  19904.50       3000   7992.88       3000   3999.44
> >>nanosleep    3000  19904.84       3000   7993.34       3000   3999.36
> >>kqueue       3000  10001.58       3000   4003.97       3000   3000.72
> >>kqueueto     3000  19903.56       3000   7993.24       3000   3999.34
> >>syscall      3000      4.02       3000      4.37       3000      4.29
> >>select      30000  39905.02      30000  35991.79      30000  31051.77
> >>poll        30000  39905.49      30000  35980.35      30000  30995.64
> >>usleep      30000  39903.78      30000  35979.48      30000  30995.23
> >>nanosleep   30000  39904.55      30000  35981.61      30000  30995.87
> >>kqueue      30000  30002.73      30000  32019.54      30000  30004.83
> >>kqueueto    30000  39903.59      30000  35979.64      30000  30996.05
> >>syscall     30000      4.44      30000      4.04      30000      4.31
> >>select     300000 310001.23     300000 303995.86     300000 300994.30
> >>poll       300000 309902.73     300000 303981.58     300000 300996.17
> >>usleep     300000 309903.64     300000 303980.17     300000 300997.42
> >>nanosleep  300000 309903.32     300000 303980.36     300000 300993.64
> >>kqueue     300000 300002.77     300000 300019.46     300000 300006.90
> >>kqueueto   300000 309903.31     300000 303978.10     300000 300996.84
> >>syscall    300000      4.01     300000      4.04     300000      4.29
> 
> 
> -- 
> Alexander Motin
Received on Wed Jan 02 2013 - 09:58:35 UTC