Re: Process accounting/timing has broken recently

From: David Xu <davidxu_at_freebsd.org> Date: Thu, 09 Dec 2010 10:22:28 +0800 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:10 UTC

John Baldwin wrote:
> On Tuesday, December 07, 2010 9:54:05 pm David Xu wrote:
>> John Baldwin wrote:
>>> On Monday, December 06, 2010 7:11:28 pm David Xu wrote:
>>>> John Baldwin wrote:
>>>>> On Sunday, December 05, 2010 6:18:29 pm Steve Kargl wrote:
>>>>>   
>>>>>> Sometime in the last 7-10 days, some one made a
>>>>>> change that has broken process accounting/timing.
>>>>>>
>>>>>> laptop:kargl[42] foreach i ( 0 1 2 3 4 5 6 7 8 9 )
>>>>>> foreach? time ./testf
>>>>>> foreach? end
>>>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>>>        69.55 real        38.39 user        30.94 sys
>>>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>>>        68.82 real        40.95 user        27.60 sys
>>>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>>>        69.14 real        38.90 user        30.02 sys
>>>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>>>        68.79 real        40.59 user        27.99 sys
>>>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>>>        68.93 real        39.76 user        28.96 sys
>>>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>>>        68.71 real        41.21 user        27.29 sys
>>>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>>>        69.05 real        39.68 user        29.15 sys
>>>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>>>        68.99 real        39.98 user        28.80 sys
>>>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>>>        69.02 real        39.64 user        29.16 sys
>>>>>> Max ULP: 0.501607 for x in [-18.000000:88.709999] with dx = 1.067100e-04
>>>>>>        69.38 real        37.49 user        31.67 sys
>>>>>>
>>>>>> testf is a numerically intensive program that tests the
>>>>>> accuracy of expf() in a tight loop.  User time varies
>>>>>> by ~3 seconds on my lightly loaded 2 GHz core2 duo processor.
>>>>>> I'm fairly certain that the code does not suddenly grow/loose
>>>>>> 6 GFLOP of operations.
>>>>>>     
>>>>> The user/sys thing is a hack (and has been).  We sample the PC at stathz (~128 
>>>>> hz) to figure out a user vs sys split and use that to divide up the total 
>>>>> runtime (which actually is fairly accurate).  All you need is for the clock 
>>>>> ticks to fire just a bit differently between runs to get a swing in user vs 
>>>>> system time.
>>>>>
>>>>> What I would like is to keep separate raw bintime's for user vs system time in 
>>>>> the raw data instead, but that would involve checking the CPU ticker more 
>>>>> often (e.g. twice for each syscall, interrupt, and trap in addition to the 
>>>>> current once per context switch).  So far folks seem to be more worried about 
>>>>> the extra overhead rather than the loss of accuracy.
>>>>>
>>>>>   
>>>> Adding any instruction into global syscall path should be cautioned, it
>>>> is worse then before, thinking about a threaded application, a userland
>>>> thread may have locked a mutex and calls a system call, the overhead
>>>> added to system call path can directly affect a threaded application's
>>>> performance now, because the time window the mutex is held
>>>> is longer than before, I have seen some people likes to fiddle with
>>>> system call path, it should be cautioned.
>>> OTOH, the current getrusage(2) stats cannot be trusted.  The only meaningful
>>> thing you can do is to sum them since the total is known to be accurate at
>>> least.
>>>
>>> If it wouldn't make things so messy I'd consider a new kernel option
>>> 'ACCURATE_RUSAGE' or some such.
>>>
>> Our getrusage is already very slow, everytime, it needs to
>> iterate the threads list with a process SLOCK held. I saw some mysql
>> versions heavily use getrusage, and a horribly slow. I think a
>> ACCURATE_RUSAGE will make it worse ?
> 
> Using 'time foo' only retrieves the usage once at the end via wait().
> 
> However, FWIW, I think ACCURATE_RUSAGE would simplify calcru1() quite a bit
> since you would just sum up the thread's usage (as we do know), but then do a
> direct conversion from bintime to timeval for user and system without ever
> having to worry about time going backwards, etc.  All the hacks to enforce
> monotonicity that are currently in place would go away since we would have
> the real measurements and not be inferring them from statclock ticks.
> 
I don't worry getrusage, though it is not necessary that it is used
with wait(), I saw mysql used it to measure its internal code's
performance, it did not use it with wait(), it is questionable
if this usage is correct.
However, I worry anyone who adds overhead to system call path.