On Fri, Dec 14, 2012 at 7:41 AM, Luigi Rizzo <rizzo_at_iet.unipi.it> wrote:
>
> On Fri, Dec 14, 2012 at 12:12 AM, Davide Italiano <davide_at_freebsd.org>
> wrote:
>>
>> Hi.
>> This patch takes callout(9) and redesigns the KPI and the
>> implementation. The main objective of this work is making the
>> subsystem tickless. In the last several years, this possibility has
>> been discussed widely (http://markmail.org/message/q3xmr2ttlzpqkmae),
>> but until now no one has really implemented it.
>> If you want a complete history of what has been done in the last
>> months you can check the calloutng project repository
>> http://svnweb.freebsd.org/base/projects/calloutng/
>> For lazy people, here's a summary:
>
>
> thanks for the work and the detailed summary.
> Perhaps it would be useful if you could provide a few high level
> details on the use and performance of the new scheme, such as:
>
> - is the old callout KPI still available ? (i am asking because it would
> help maintaining third party kernel modules that are expected to
> work on different FreeBSD releases)
>

Obviously the old KPI is still available. callout(9) is a very popular
interface, and I don't think removing the old interface is a good idea,
because it could make some vendors unhappy when their code no longer
builds on FreeBSD.

> - do you have numbers on what is the fastest rate at which callouts
> can be fired (e.g. say you have a callout which increments a
> counter and schedules the next callout in (struct bintime){0,1} ) ?
>
>
> - is there a possibility that if callout requests are too close to each
> other (e.g. the above test) the thread dispatching callouts will
> run forever ? if so, is there a way to make such thread yield
> after a while ?
>
> - since you mentioned nanosleep() poll() and select() have been
> ported to the new callout, is there a way to guarantee that user
> using these functions with a very short timeout are actually
> descheduled as opposed to "interval too short, don't bother" ?
>
> - do you have numbers on how many calls per second we can
> have for a process that does
> for (;;) { nanosleep(min_value_that_causes_descheduling); }
>
> I also have some comments on the diff:
> - can you provide a diff -p ?
>
> - for several functions the only change is the name of an argument
> from "busy" to "us". Can you elaborate the reason for the change,
> and whether "us" means microseconds or the pronoun ?)
>

Please see r242905 by mav_at_.

> Finally, a more substantial comment:
> - a lot of functions which formerly had only a "timo" argument
> now have "timo, bt, precision, flags". Take seltdwait() as an example.
>

seltdwait() is not part of the public KPI. It has been modified to
avoid code duplication. Having two separate functions, seltdwait() and
seltdwait_bt(), even though they could share most of the code, is not a
clever approach, IMHO. As I said before, seltdwait() is not exposed, so
we can modify its arguments without breaking anything.

> It seems that you have been undecided between two approaches:
> for some of these functions you have preserved the original function
> that deals with ticks and introduced a new one that deals with the
> bintime,
> whereas in other cases you have modified the original function to add
> "bt, precision, flags".
>

I'm not. All the functions which are part of the public KPI (e.g.
condvar(9), sleepq(9), *) are still available. *_flags variants have
been introduced so that consumers can take advantage of the newly
implemented 'precision tolerance mechanism'. *_bt variants have also
been introduced. I don't see any indecision between the two approaches.
Please note that the callout backend now deals with bintime, so every
time callout_reset_on() is called, the 'tick' argument passed is
silently converted to bintime.
> I would suggest a more uniform approach, namely:
> - preserve all the existing functions (T) that take a timeout in ticks;
> - add a new set of corresponding functions (BT) that take
> bt, precision, flags _instead_ of the ticks
> - the functions (T) make immediately the conversion from ticks to
> bintime(s), using macros or inline
> - optionally, convert kernel components to the new (BT) functions
> where this makes sense (e.g. we can exploit the finer-granularity
> of the new calls, etc.)
>
> cheers
> luigi
>
>> 1) callout(9) is no longer constrained to the resolution a periodic
>> "hz" clock can give. In order to achieve this, the eventtimers(4)
>> subsystem is used as the backend.
>> 2) Contrary to what was discussed in the past, we kept the callwheel
>> as the underlying data structure for keeping track of the outstanding
>> timeouts. This choice has a couple of advantages; in particular, we
>> can still benefit from the O(1) average complexity of the wheel for
>> all operations. Also, we considered unacceptable the code duplication
>> that would arise from a two-stage backend for callout (e.g. a wheel
>> for coarse-resolution events and another data structure, such as a
>> heap, for high-resolution events). In fact, since callout gained the
>> ability to migrate from one CPU to another, a double backend would
>> mean doubling the code for the migration path.
>> 3) A way to dispatch callouts directly from hardware interrupt context
>> has been implemented, using a special callout flag. This has limited
>> applicability, but it avoids dispatching a SWI thread to handle
>> specific callouts, and thus avoids waking up another CPU for
>> processing and a (relatively useless) context switch.
>> 4) Since the new callout mechanism deals with bintime rather than
>> ticks, time is specified as absolute rather than relative. In order
>> to get the current time, binuptime() or getbinuptime() is used, and a
>> sysctl is introduced to selectively choose the function to use, based
>> on a precision threshold.
>> 5) A mechanism for specifying precision tolerance has been
>> implemented. The callout processing mechanism has been adapted, and
>> the callout data structure augmented, so that the code path can take
>> advantage of it and aggregate events which overlap in time.
>>
>>
>> The new proposed KPI for callout is the following:
>> callout_reset_bt_on(..., struct bintime time, struct bintime pr, ..., int flags)
>> where the ‘time’ argument represents the time at which the callout
>> should fire, ‘pr’ represents the precision tolerance expressed as an
>> absolute value, and ‘flags’ can be used to specify new features,
>> i.e., for now, the possibility to run the callout from fast interrupt
>> context.
>> The old KPI has been extended by introducing the callout_reset_flags()
>> function, which is the same as callout_reset*() but takes an
>> additional argument, ‘int flags’, that can be used in the same fashion
>> as the ‘flags’ argument of the new KPI. Using ‘flags’, consumers
>> can also specify a relative precision tolerance in terms of a
>> power-of-two portion of the timeout passed as ticks.
>> Using this strategy, the new precision mechanism can be used for the
>> existing services without major modifications.
>>
>> Some consumers have been ported to the new KPI, in particular
>> nanosleep(), poll() and select(), because they take immediate
>> advantage of the arbitrary precision offered by the new infrastructure.
>> For some statistics about the outcome of the conversion to the new
>> service, please refer to the end of this e-mail:
>> http://lists.freebsd.org/pipermail/freebsd-arch/2012-July/012756.html
>> We didn't measure any significant performance regressions with
>> hwpmc(4), using some benchmark programs:
>> http://people.freebsd.org/~davide/poll_test/poll_test.c
>> http://people.freebsd.org/~mav/testsleep.c
>> http://people.freebsd.org/~mav/testidle.c
>>
>> We tested the code on amd64, MIPS and ARM. Any kind of testing or
>> comment would be really appreciated. The full diff of the work against
>> HEAD can be found at: http://people.freebsd.org/~davide/calloutng.diff
>> If no one has objections, we plan to merge the repository to HEAD in a
>> week or so.
>>
>> Thanks,
>>
>> Davide
>> _______________________________________________
>> freebsd-current_at_freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
>
>
> --
> -----------------------------------------+-------------------------------
> Prof. Luigi RIZZO, rizzo_at_iet.unipi.it . Dip. di Ing. dell'Informazione
> http://www.iet.unipi.it/~luigi/ . Universita` di Pisa
> TEL +39-050-2211611 . via Diotisalvi 2
> Mobile +39-338-6809875 . 56122 PISA (Italy)
> -----------------------------------------+-------------------------------
>

Received on Fri Dec 14 2012 - 11:57:44 UTC