[RFC/RFT] calloutng

From: Davide Italiano <davide_at_freebsd.org>
Date: Fri, 14 Dec 2012 00:12:47 +0100
Hi.
This patch takes callout(9) and redesigns both the KPI and the
implementation. The main objective of this work is to make the
subsystem tickless. Over the last several years this possibility has
been discussed widely (http://markmail.org/message/q3xmr2ttlzpqkmae),
but until now no one had actually implemented it.
If you want a complete history of what has been done over the last
few months, you can check the calloutng project repository:
http://svnweb.freebsd.org/base/projects/calloutng/
For lazy people, here's a summary:
1) callout(9) is no longer constrained to the resolution that a
periodic "hz" clock can give. To achieve this, the eventtimers(4)
subsystem is used as the backend.
2) Contrary to what was discussed in the past, we kept the callwheel
as the underlying data structure for keeping track of the outstanding
timeouts. This choice has a couple of advantages; in particular, we
still benefit from the O(1) average complexity of the wheel for all
operations. Also, we considered unacceptable the code duplication
that would arise from a two-staged backend for callout (e.g. a wheel
for coarse-resolution events and another data structure, such as a
heap, for high-resolution events). In fact, since callout gained the
ability to migrate from one CPU to another, having a double backend
would mean doubling the code for the migration path.
3) A way to dispatch callouts directly from hardware interrupt context
has been implemented, using a special callout flag. This has limited
applicability, but it avoids dispatching a SWI thread to handle
specific callouts, which in turn avoids waking up another CPU for
processing and a (relatively useless) context switch.
4) Since the new callout mechanism deals with bintime and no longer
with ticks, time is specified as absolute rather than relative. To
get the current time, binuptime() or getbinuptime() is used, and a
sysctl is introduced to select which function to use, based on a
precision threshold.
5) A mechanism for specifying precision tolerance has been
implemented. The callout processing mechanism has been adapted and the
callout data structure augmented so that the code path can take
advantage of the tolerance and aggregate events which overlap in time.
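
To make point 5 a bit more concrete, here is a minimal user-space
sketch of how overlapping tolerance windows could be merged into fewer
timer interrupts. It is not code from the patch: the sk_ names are
hypothetical, times are plain nanosecond counters instead of bintime,
and the input is assumed to be already sorted by firing time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event: it may fire anywhere in [time, time + pr]. */
struct sk_event {
	uint64_t time;	/* earliest acceptable firing time, in ns */
	uint64_t pr;	/* precision tolerance, in ns */
};

/*
 * Count how many timer interrupts are needed when one interrupt may
 * serve every event whose window contains it.  Greedy pass over the
 * events sorted by 'time': 'last' is the latest instant at which the
 * current group can still fire without violating any member's window.
 */
static size_t
sk_count_wakeups(const struct sk_event *ev, size_t n)
{
	size_t i, wakeups = 0;
	uint64_t last = 0;

	for (i = 0; i < n; i++) {
		uint64_t end = ev[i].time + ev[i].pr;

		/* The sketch relies on sorted input. */
		assert(i == 0 || ev[i].time >= ev[i - 1].time);

		if (wakeups == 0 || ev[i].time > last) {
			/* No overlap with the current group: new interrupt. */
			wakeups++;
			last = end;
		} else if (end < last) {
			/* Joins the group and narrows its window. */
			last = end;
		}
	}
	return (wakeups);
}
```

With windows [100,150], [120,130], [140,160] and [300,300] the first
two events share one interrupt, so three interrupts serve four events;
with zero tolerance everywhere each event needs its own.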


The new proposed KPI for callout is the following:
callout_reset_bt_on(..., struct bintime time, struct bintime pr, ..., int flags)
where the ‘time’ argument represents the time at which the callout
should fire, ‘pr’ represents the precision tolerance expressed as an
absolute value, and ‘flags’ can be used to request new features; for
now, that means the possibility of running the callout from fast
interrupt context.
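
Since ‘time’ is absolute, a consumer that thinks in terms of "fire N
milliseconds from now" has to add the interval to the current uptime
first. The following is a user-space sketch of that arithmetic only.
struct sk_bintime is a stand-in that mirrors the layout of the
kernel's struct bintime (seconds plus a 64-bit binary fraction); the
sk_ helpers are hypothetical names, and ‘now’ is passed in rather than
read with binuptime() or getbinuptime().

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-in for struct bintime (hypothetical name). */
struct sk_bintime {
	int64_t  sec;	/* whole seconds */
	uint64_t frac;	/* fraction of a second, in units of 2^-64 s */
};

/* Convert a millisecond interval to the fixed-point format. */
static struct sk_bintime
sk_ms_to_bt(uint64_t ms)
{
	struct sk_bintime bt;

	bt.sec = (int64_t)(ms / 1000);
	/* floor(2^64 / 1000) fraction units per millisecond. */
	bt.frac = (ms % 1000) * 18446744073709551ULL;
	return (bt);
}

/* Add two values, propagating the carry out of the fraction. */
static struct sk_bintime
sk_bt_add(struct sk_bintime a, struct sk_bintime b)
{
	struct sk_bintime r;

	r.frac = a.frac + b.frac;
	r.sec = a.sec + b.sec + (r.frac < a.frac);
	return (r);
}

/* Absolute deadline for "ms milliseconds after now". */
static struct sk_bintime
sk_deadline(struct sk_bintime now, uint64_t ms)
{
	struct sk_bintime d = sk_bt_add(now, sk_ms_to_bt(ms));

	/* An absolute deadline must never lie before 'now'. */
	assert(d.sec > now.sec || (d.sec == now.sec && d.frac >= now.frac));
	return (d);
}
```

For example, sk_deadline() applied to time zero and 1500 ms yields
sec = 1 and a fraction just under one half; the result is what a
consumer would pass as ‘time’, with ‘pr’ built the same way from the
tolerance it can accept.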
The old KPI has been extended with the callout_reset_flags()
function, which is the same as callout_reset*() but takes an
additional argument ‘int flags’ that can be used in the same fashion
as the ‘flags’ argument of the new KPI. Through ‘flags’, consumers
can also specify a relative precision tolerance as a power-of-two
fraction of the timeout passed in ticks.
With this strategy, the new precision mechanism can be used by
existing services without major modifications.
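
As an illustration of the power-of-two idea, the sketch below packs an
exponent into an int next to ordinary flag bits and derives the
tolerance as timeout / 2^exponent. The bit layout and every SK_ and
sk_ name are assumptions made up for this example; they are not the
encoding used by the patch.

```c
#include <assert.h>

/* Hypothetical layout: bit 0 is a feature flag, bits 8..12 hold exp+1. */
#define	SK_FLAG_DIRECT	0x0001
#define	SK_PREL_SHIFT	8
#define	SK_PREL_MASK	(0x1f << SK_PREL_SHIFT)
/* Store exp + 1 so that a zero field means "no tolerance given". */
#define	SK_PREL(e)	((((e) + 1) & 0x1f) << SK_PREL_SHIFT)
#define	SK_PRELGET(f)	(((f) & SK_PREL_MASK) >> SK_PREL_SHIFT)

/*
 * Tolerance, in ticks, for a timeout of 'to_ticks' ticks:
 * to_ticks / 2^exp, or 0 when the caller did not ask for any.
 */
static int
sk_tolerance_ticks(int to_ticks, int flags)
{
	int field = SK_PRELGET(flags);

	assert(to_ticks >= 0);
	if (field == 0)
		return (0);
	return (to_ticks >> (field - 1));
}
```

Under these assumptions a 1000-tick timeout armed with SK_PREL(3)
tolerates 125 ticks of slack, while a caller that passes no precision
bits keeps the strict behaviour.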

Some consumers have been ported to the new KPI, in particular
nanosleep(), poll() and select(), because they take immediate
advantage of the arbitrary precision offered by the new infrastructure.
For some statistics about the outcome of the conversion to the new
service, please refer to the end of this e-mail:
http://lists.freebsd.org/pipermail/freebsd-arch/2012-July/012756.html
We didn't measure any significant performance regression with
hwpmc(4), using some benchmark programs:
http://people.freebsd.org/~davide/poll_test/poll_test.c
http://people.freebsd.org/~mav/testsleep.c
http://people.freebsd.org/~mav/testidle.c

We tested the code on amd64, MIPS and arm. Any kind of testing or
comment would be really appreciated. The full diff of the work against
HEAD can be found at: http://people.freebsd.org/~davide/calloutng.diff
If no one has objections, we plan to merge the repository to HEAD in a
week or so.

Thanks,

Davide
Received on Thu Dec 13 2012 - 22:12:49 UTC
