Re: clock error: callouts are run one tick late

From: Luigi Rizzo <rizzo_at_iet.unipi.it>
Date: Wed, 9 Sep 2009 03:16:06 +0200
On Wed, Sep 09, 2009 at 03:01:37AM +0200, Luigi Rizzo wrote:
> RELENG_8/amd64 (can not try on i386) has the following problem:
> 
> 	callout_reset(..., t, ...)
> 
> processes the callout after t+1 ticks instead of the t ticks
> that one sees on RELENG_7 and earlier.
> 
> I found it by pure chance, noticing that dummynet on RELENG_8
> has a jitter that is two ticks instead of one tick.
> Other systems with rely on frequent callouts might see
> problems as well.
> 
> An indirect way to see the problem is the following:
> 
> 	kldload dummynet
> 
> 	sysctl net.inet.ip.dummynet.tick_adjustment; \
> 	sleep 1; sysctl net.inet.ip.dummynet.tick_adjustment
> 
> on a working system, the variable should remain mostly unchanged;
> on a non-working system you see it growing at a rate HZ/2
> 
> I suspect the bug is introduced by the change in kern_timeout.c rev. 1.114
> 
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/kern/kern_timeout.c.diff?r1=1.113;r2=1.114
> 
> which changes softclock() to stop before the current 'ticks'
> so processes everything one tick late.
> 
> I understand the race described in the cvs log, but this does
> not seem a proper fix -- it violates POLA by changing the semantics
> of callout_reset(), and does not really fix the race, but just adds
> an extra uncertainty of 1 tick on when a given callout will be run
> 
> If the problem is a race between hardclock() which updates 'ticks',
> and the various hardclock_cpu() instances which might arrive early,
> I would suggest two alternative options:
> 
> 1. create a per-cpu 'ticks' (say a field cc_ticks in struct callout_cpu),
>    increment it at the start of hardclock_cpu, and use cc->ticks
>    instead of 'ticks' in the various callout functions involved
>    with manipulation of the callwheel
>    callout_tick(), softclock(), callout_reset_on()
> 
> 2. start softclock() at cc->cc_softticks -1, i.e.
> 
> 	...
> 	CC_LOCK(cc)
>    -	while (cc->cc_softticks != ticks) {
>    +	while (cc->cc_softticks-1 != ticks) {
>  	...

#2 also need this change in callout_tick()

        mtx_lock_spin_flags(&cc->cc_lock, MTX_QUIET);
     -  for (; (cc->cc_softticks - ticks) < 0; cc->cc_softticks++) {
     +  for (; (cc->cc_softticks - ticks) <= 0; cc->cc_softticks++) {
                bucket = cc->cc_softticks & callwheelmask;

Just tested it, it seems to fix the problem.

cheers
luigi
Received on Tue Sep 08 2009 - 23:10:10 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:55 UTC