Re: Process accounting/timing has broken recently

From: John Baldwin <jhb_at_freebsd.org> Date: Mon, 6 Dec 2010 09:44:03 -0500 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:09 UTC

On Sunday, December 05, 2010 6:18:29 pm Steve Kargl wrote:
> Sometime in the last 7-10 days, some one made a
> change that has broken process accounting/timing.
> 
> laptop:kargl[42] foreach i ( 0 1 2 3 4 5 6 7 8 9 )
> foreach? time ./testf
> foreach? end
> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>        69.55 real        38.39 user        30.94 sys
> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>        68.82 real        40.95 user        27.60 sys
> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>        69.14 real        38.90 user        30.02 sys
> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>        68.79 real        40.59 user        27.99 sys
> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>        68.93 real        39.76 user        28.96 sys
> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>        68.71 real        41.21 user        27.29 sys
> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>        69.05 real        39.68 user        29.15 sys
> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>        68.99 real        39.98 user        28.80 sys
> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>        69.02 real        39.64 user        29.16 sys
> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>        69.38 real        37.49 user        31.67 sys
> 
> testf is a numerically intensive program that tests the
> accuracy of expf() in a tight loop.  User time varies
> by ~3 seconds on my lightly loaded 2 GHz core2 duo processor.
> I'm fairly certain that the code does not suddenly grow/loose
> 6 GFLOP of operations.

The user/sys thing is a hack (and has been).  We sample the PC at stathz (~128 
hz) to figure out a user vs sys split and use that to divide up the total 
runtime (which actually is fairly accurate).  All you need is for the clock 
ticks to fire just a bit differently between runs to get a swing in user vs 
system time.

What I would like is to keep separate raw bintime's for user vs system time in 
the raw data instead, but that would involve checking the CPU ticker more 
often (e.g. twice for each syscall, interrupt, and trap in addition to the 
current once per context switch).  So far folks seem to be more worried about 
the extra overhead rather than the loss of accuracy.

-- 
John Baldwin