A question about statclock and stathz

From: Thomas Sparrevohn <Thomas.Sparrevohn_at_btinternet.com>
Date: Wed, 13 Mar 2019 14:47:59 -0000

A little while ago I decided to write a small program to analyse the
historical memory usage of my FreeBSD system, based on the system accounting
files in /var/account. The reasons for this are obscure and irrelevant, but
I was somewhat surprised to discover that about 43% of the records had no
memory usage registered. Initially I thought it was an error in my parser
(written in Haskell), but that turned out not to be the case: normal
"sa -u" output gives the same result. I started digging into it and found
that it is quite easy to reproduce, e.g. running "repeat 1000 /usr/bin/time
-lh ls" will show 0 in most, but not all, of the output.
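
For anyone who wants to count the zero records rather than eyeball the
time(1) output, the reproduction can be sketched in Python roughly as below.
This is only a sketch: the helper names are made up for illustration, and
treating "ru_ixrss, ru_idrss and ru_isrss all zero" as "no memory usage" is
my assumption about which fields matter. On Linux those fields are always
zero, so the count only means something on FreeBSD.

```python
import os
import shutil


def has_no_memory_usage(ru):
    """True when all integral memory fields of a rusage record are zero.

    Assumption: these three fields are the ones of interest; extend the
    check if ru_maxrss should be considered as well.
    """
    return ru.ru_ixrss == 0 and ru.ru_idrss == 0 and ru.ru_isrss == 0


def count_zero_memory(argv, runs):
    """Run argv `runs` times; return how many children report no memory usage."""
    path = shutil.which(argv[0])
    if path is None:
        raise FileNotFoundError(argv[0])
    # Send the child's stdout to /dev/null so it does not flood the terminal.
    quiet = [(os.POSIX_SPAWN_OPEN, 1, os.devnull, os.O_WRONLY, 0)]
    zero = 0
    for _ in range(runs):
        pid = os.posix_spawn(path, argv, os.environ, file_actions=quiet)
        # wait4() returns the rusage of this one child only.
        _, _, ru = os.wait4(pid, 0)
        if has_no_memory_usage(ru):
            zero += 1
    return zero
```

Calling count_zero_memory(["ls"], 1000) mirrors the "repeat 1000" loop above
and returns the number of runs for which no memory usage was reported.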

To avoid any misunderstanding: as far as I can see, the observations below
only affect data presented to the end user (through acct(2) and
getrusage(2)), and I don't think there are any issues in the kernel's
internal resource accounting (at least none that I can see).

The reason turns out to be that around 43% of the commands (during a make
buildworld buildkernel) simply finish before statclock() runs for the first
time.

Digging further, I found out what the problem is. Historically (basing this
on "The Design and Implementation of the FreeBSD Operating System", section
3.4, pp. 57-59, covering 5.2), on a system with an assumed clock of 100
ticks per second the assumption was that statclock() would run at 128 ticks
per second, i.e. 28% faster than hz.

The current kernel seems to use 128 Hz or thereabouts (133 with
kern.eventtimer.periodic=1 vs. 127 without), independent of what the
scheduling clock is doing.
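
To put rough numbers on that, here is a small Python sketch of the tick
period and of the chance that a short process gets sampled at all. The model
is my own simplification, not something taken from the kernel: it assumes the
process runs uninterrupted and starts at a uniformly random point within a
statclock period, so the chance of seeing at least one tick is
min(1, runtime * stathz).

```python
def statclock_period_ms(stathz):
    """Interval between two statclock ticks, in milliseconds."""
    return 1000.0 / stathz


def sample_probability(runtime_s, stathz):
    """Chance that at least one tick falls inside a run of runtime_s seconds.

    Assumption: uninterrupted run, uniformly random start phase.
    """
    return min(1.0, runtime_s * stathz)


# Compare the observed rates with the 1.28 * hz rate for hz = 1000.
for stathz in (127, 128, 133, 1280):
    print(f"stathz={stathz:5d}  period={statclock_period_ms(stathz):7.4f} ms  "
          f"P(sampled, 1 ms run)={sample_probability(0.001, stathz):.3f}")
```

Under this model anything that runs for less than one tick period (a bit
under 8 ms at 128 Hz) has a real chance of never being sampled.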

acct(2) bases its information on the rusage struct td_ru, whose memory
information is updated in statclock(). The same struct is in turn used by
getrusage(2), hence /usr/bin/time shows the same issue.

With periodic set the kernel shows:

kern.sched.quantum: 97734
kern.clockrate: { hz = 1000, tick = 1000, profhz = 8112, stathz = 133 }
(stathz = 127, if the periodic flag is set to 0)
kern.eventtimer.periodic: 1
kern.eventtimer.timer: HPET
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2
kern.eventtimer.choice: HPET(350) HPET1(340) HPET2(340) HPET3(340) LAPIC(100) i8254(100) RTC(0)
kern.eventtimer.et.HPET3.quality: 340
kern.eventtimer.et.HPET3.frequency: 14318180
kern.eventtimer.et.HPET3.flags: 3
kern.eventtimer.et.HPET2.quality: 340
kern.eventtimer.et.HPET2.frequency: 14318180
kern.eventtimer.et.HPET2.flags: 3
kern.eventtimer.et.HPET1.quality: 340
kern.eventtimer.et.HPET1.frequency: 14318180
kern.eventtimer.et.HPET1.flags: 3
kern.eventtimer.et.HPET.quality: 350
kern.eventtimer.et.HPET.frequency: 14318180
kern.eventtimer.et.HPET.flags: 3
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.LAPIC.quality: 100
kern.eventtimer.et.LAPIC.frequency: 0
kern.eventtimer.et.LAPIC.flags: 15

Whereas I would have expected the stathz to be around 1280. I don't know if
anybody actually uses acct(2) and getrusage(2) for anything, but the impact
is interesting (I have done a whole heap of analysis on it in R).

This shows the proportion of commands across 3 different "make buildworld
buildkernel" runs on the system running with default values. As should be
clear, about 47% of all commands terminate before statclock() gets to run.
To be clear, the isnull variable is true where no memory data is in the
"sa -u" output.

w128hz$isnull
       n  missing distinct
  645822        0        2

Value       FALSE   TRUE
Frequency  343806 302016
Proportion  0.532  0.468

With kern.eventtimer.periodic=1 it shows about the same proportion (but in
this case from just a single "make buildworld buildkernel"):

w133hz$isnull
       n  missing distinct
  202309        0        2

Value       FALSE   TRUE
Frequency  103799  98510
Proportion  0.513  0.487

If this is changed to mirror the behaviour described in the book, the
sampling error (or whatever we should call it) falls to 4-5% rather than
43-48%:

w1280hz$isnull
       n  missing distinct
  404128        0        2

Value       FALSE   TRUE
Frequency  385411  18717
Proportion  0.954  0.046
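
For reference, the TRUE proportions in the three tables can be recomputed
directly from the frequency rows. The short Python sketch below does just
that; the dictionary layout and function name are mine, while the counts are
copied from the R output above.

```python
# (FALSE count, TRUE count) per run, taken from the Frequency rows.
runs = {
    "w128hz": (343806, 302016),
    "w133hz": (103799, 98510),
    "w1280hz": (385411, 18717),
}


def null_proportion(not_null, null):
    """Share of accounting records with no memory usage registered."""
    return null / (not_null + null)


for name, (not_null, null) in runs.items():
    total = not_null + null
    print(f"{name:8s} n={total:7d} isnull={null_proportion(not_null, null):.3f}")
```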

Fiddling with hz and stathz you can get it below 1% (hz=3000, stathz=3840),
but that seems silly; a sampling error margin below 5% seems acceptable.
Either way, more than 43% of records incorrectly showing no memory usage
should at least warrant a warning in the getrusage(2) and acct(2) man pages
if the decision is to keep the current 128 Hz statclock behaviour.

I have made a patch that changes the behaviour (kern_clocksource.c), but I
am not sure whether the edge case caught here also applies to the last
update in kern_exit.c (I don't believe so). I have not looked at the
accuracy of the other fields in struct rusage, but they seem more sensible
than the memory values were. The thing that nags me is that I have a
recollection that this was discussed ages ago, but given that I have been
FreeBSDing since V1 I cannot remember when it was; some argument was made
that anything above 128 Hz was wasteful and suboptimal. The issue with
changing it is that if anybody actually uses getrusage and acct for
anything, it would change behaviour and potentially be a POLA violation, so
maybe some kind of flag is needed. The more fundamental question is whether
the statclock approach is correct as such.

Sorry for the long mail.