Re: host, bhyve vm and ntpd

From: Ian Lepore <ian_at_freebsd.org>
Date: Fri, 20 Oct 2017 12:04:27 -0600
On Fri, 2017-10-20 at 20:20 +0300, Boris Samorodov wrote:
> (CC to freebsd-virtualization_at_)
> 
> 20.10.2017 19:32, Ian Lepore ΠΙΫΕΤ:
> > 
> > On Fri, 2017-10-20 at 18:36 +0300, Boris Samorodov wrote:
> > > 
> > > 20.10.2017 18:31, Boris Samorodov ΠΙΫΕΤ:
> > > > 
> > > > 
> > > > 20.10.2017 18:12, Ian Lepore ΠΙΫΕΤ:
> > > > > 
> > > > > 
> > > > > On Fri, 2017-10-20 at 14:46 +0300, Boris Samorodov wrote:
> > > > > > 
> > > > > > 
> > > > > > Hi All,
> > > > > > 
> > > > > > I have got a host:
> > > > > > ---
> > > > > > bhyve-host% uname -a
> > > > > > FreeBSD sm.bsnet 12.0-CURRENT FreeBSD 12.0-CURRENT #3 r322868: Fri Aug
> > > > > > 25 05:25:26 MSK 2017
> > > > > > bsam_at_builder.bsnet:/usr/obj/usr/src/sys/GENERIC-FASTššamd64 amd64
> > > > > > ---
> > > > > > 
> > > > > > And a bhyve vm:
> > > > > > ---
> > > > > > bhyve-vm: uname -a
> > > > > > FreeBSD builder.bsnet 12.0-CURRENT FreeBSD 12.0-CURRENT #58 r324782: Fri
> > > > > > Oct 20 05:12:17 MSK 2017
> > > > > > bsam_at_builder.bsnet:/usr/obj/usr/src/sys/PKG64Xššamd64 amd64
> > > > > > ---
> > > > > > 
> > > > > > The only difference at kernel configs is a colored console. :-)
> > > > > > 
> > > > > > And here I get some weird (is it?) result at the VM (I expect ntpd to be
> > > > > > more stable):
> > > > > > ---
> > > > > > bhyve-vm% for t in `jot 10`; do ntpq -pn; sleep 64; done
> > > > > > šššššremotešššššššššššrefidššššššst t when poll reachšššdelayšššoffset
> > > > > > jitter
> > > > > > ==============================================================================
> > > > > > šXX.XX.XX.1ššššššXX.XX.XX.245ššššš4 ušššš9ššš64šššš3šššš0.605ššš-1.202
> > > > > > 316.407
> > > > > > šXX.XX.XX.1ššššššXX.XX.XX.245ššššš4 ušššš7ššš64šššš7šššš0.605ššš-1.202
> > > > > > 358.395
> > > > > > *XX.XX.XX.1ššššššXX.XX.XX.245ššššš4 ušššš5ššš64ššš17šššš0.615šš-328.42
> > > > > > 181.405
> > > > > > *XX.XX.XX.1ššššššXX.XX.XX.245ššššš4 ušššš3ššš64ššš37šššš0.615šš-328.42
> > > > > > 214.868
> > > > > > *XX.XX.XX.1ššššššXX.XX.XX.245ššššš4 uššš67ššš64ššš37šššš0.615šš-328.42
> > > > > > 214.868
> > > > > > *XX.XX.XX.1ššššššXX.XX.XX.245ššššš4 uššš63ššš64ššš77šššš0.615šš-328.42
> > > > > > 268.618
> > > > > > *XX.XX.XX.1ššššššXX.XX.XX.245ššššš4 uššš60ššš64šš177šššš0.615šš-328.42
> > > > > > 333.175
> > > > > > šXX.XX.XX.1šššššš.STEP.šššššššššš16 u 1910ššš64šššš0šššš0.000šššš0.000
> > > > > > 0.000
> > > > > > šXX.XX.XX.1ššššššXX.XX.XX.245ššššš4 uššš27ššš64šššš1šššš0.703šš-262.63
> > > > > > 0.004
> > > > > > šXX.XX.XX.1ššššššXX.XX.XX.245ššššš4 uššš31ššš64šššš1šššš0.649šš-331.43
> > > > > > 68.800
> > > > > > ---
> > > > > > 
> > > > > > At the same time host's results are very stable:
> > > > > > ---
> > > > > > bhyve-host% for t in `jot 10`; do ntpq -pn; sleep 64; done
> > > > > > šššššremotešššššššššššrefidššššššst t when poll reachšššdelayšššoffset
> > > > > > jitter
> > > > > > ==============================================================================
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > *XX.XX.XX.1ššššššXX.XX.XX.245ššššš4 ušššš1ššš64šššš1šššš0.401šššš0.176
> > > > > > 0.106
> > > > > > *XX.XX.XX.1ššššššXX.XX.XX.245ššššš4 ušššš6ššš64šššš3šššš0.401šššš0.176
> > > > > > 0.459
> > > > > > *XX.XX.XX.1ššššššXX.XX.XX.245ššššš4 ušššš3ššš64šššš7šššš0.401šššš0.176
> > > > > > 0.940
> > > > > > *XX.XX.XX.1ššššššXX.XX.XX.245ššššš4 uššš67ššš64šššš7šššš0.401šššš0.176
> > > > > > 0.940
> > > > > > *XX.XX.XX.1ššššššXX.XX.XX.245ššššš4 uššš64ššš64ššš17šššš0.401šššš0.176
> > > > > > 1.566
> > > > > > *XX.XX.XX.1ššššššXX.XX.XX.245ššššš4 uššš60ššš64ššš37šššš0.448šššš1.275
> > > > > > 1.739
> > > > > > *XX.XX.XX.1ššššššXX.XX.XX.245ššššš4 uššš55ššš64ššš77šššš0.448šššš1.275
> > > > > > 2.365
> > > > > > *XX.XX.XX.1ššššššXX.XX.XX.245ššššš4 uššš53ššš64šš177šššš0.448šššš1.275
> > > > > > 3.110
> > > > > > *XX.XX.XX.1ššššššXX.XX.XX.245ššššš4 uššš50ššš64šš377šššš0.448šššš1.275
> > > > > > 3.929
> > > > > > *XX.XX.XX.1ššššššXX.XX.XX.245ššššš4 uššš45ššš64šš377šššš0.443šššš8.750
> > > > > > 4.722
> > > > > > ---
> > > > > > 
> > > > > > The network is organized via bridge -- host igb and vm tap interfaces
> > > > > > are members of one bridge.
> > > > > > 
> > > > > > Are those results expected? Does it smell like a bug? Should I dig
> > > > > > furter?
> > > > > > 
> > > > > So it is repeatedly stepping the clock in the VM? (Set
> > > > > kern.timecounter.stepwarnings=1 to log steps).
> > > > No kernel/ntpd messages for 20 minutes after setting this sysctl.
> > > > 
> > > > > 
> > > > > 
> > > > > šThat is usually a sign
> > > > > that the chosen timecounter is running at a different frequency than it
> > > > > claimed to be when it registered itself -- the host may not be
> > > > > emulating the timer hardware properly in the guest. šWhat is the output
> > > > > of sysctl kern.timecounter in the vm?
> > > > ---
> > > > bhyve-vm% sysctl kern.timecounter
> > > > 
> > > > kern.timecounter.tsc_shift: 1
> > > > kern.timecounter.smp_tsc_adjust: 0
> > > > kern.timecounter.smp_tsc: 0
> > > > kern.timecounter.invariant_tsc: 1
> > > > kern.timecounter.fast_gettime: 1
> > > > kern.timecounter.tick: 1
> > > > kern.timecounter.choice: ACPI-fast(900) HPET(950) i8254(0) TSC-low(-100)
> > > > dummy(-1000000)
> > > > kern.timecounter.hardware: HPET
> > > > kern.timecounter.alloweddeviation: 5
> > > > kern.timecounter.stepwarnings: 1
> > > > kern.timecounter.tc.ACPI-fast.quality: 900
> > > > kern.timecounter.tc.ACPI-fast.frequency: 3579545
> > > > kern.timecounter.tc.ACPI-fast.counter: 4161213491
> > > > kern.timecounter.tc.ACPI-fast.mask: 4294967295
> > > > kern.timecounter.tc.HPET.quality: 950
> > > > kern.timecounter.tc.HPET.frequency: 10000000
> > > > kern.timecounter.tc.HPET.counter: 3518036865
> > > > kern.timecounter.tc.HPET.mask: 4294967295
> > > > kern.timecounter.tc.i8254.quality: 0
> > > > kern.timecounter.tc.i8254.frequency: 1193182
> > > > kern.timecounter.tc.i8254.counter: 47597
> > > > kern.timecounter.tc.i8254.mask: 65535
> > > > kern.timecounter.tc.TSC-low.quality: -100
> > > > kern.timecounter.tc.TSC-low.frequency: 1199886114
> > > > kern.timecounter.tc.TSC-low.counter: 1274338278
> > > > kern.timecounter.tc.TSC-low.mask: 4294967295
> > > > ---
> > > BTW, here is the host kern.timecounter:
> > > ---
> > > kern.timecounter.tsc_shift: 1
> > > kern.timecounter.smp_tsc_adjust: 0
> > > kern.timecounter.smp_tsc: 1
> > > kern.timecounter.invariant_tsc: 1
> > > kern.timecounter.fast_gettime: 1
> > > kern.timecounter.tick: 1
> > > kern.timecounter.choice: ACPI-fast(900) HPET(950) i8254(0) TSC-low(1000)
> > > dummy(-1000000)
> > > kern.timecounter.hardware: TSC-low
> > > kern.timecounter.alloweddeviation: 5
> > > kern.timecounter.stepwarnings: 0
> > > kern.timecounter.tc.ACPI-fast.quality: 900
> > > kern.timecounter.tc.ACPI-fast.frequency: 3579545
> > > kern.timecounter.tc.ACPI-fast.counter: 9047194
> > > kern.timecounter.tc.ACPI-fast.mask: 16777215
> > > kern.timecounter.tc.HPET.quality: 950
> > > kern.timecounter.tc.HPET.frequency: 14318180
> > > kern.timecounter.tc.HPET.counter: 2232552795
> > > kern.timecounter.tc.HPET.mask: 4294967295
> > > kern.timecounter.tc.i8254.quality: 0
> > > kern.timecounter.tc.i8254.frequency: 1193182
> > > kern.timecounter.tc.i8254.counter: 43410
> > > kern.timecounter.tc.i8254.mask: 65535
> > > kern.timecounter.tc.TSC-low.quality: 1000
> > > kern.timecounter.tc.TSC-low.frequency: 1200067168
> > > kern.timecounter.tc.TSC-low.counter: 2463146362
> > > kern.timecounter.tc.TSC-low.mask: 4294967295
> > > ---
> > > 
> > > > 
> > > > 
> > > > > 
> > > > > 
> > > > > Also, what is the output of ntptime(8) in the vm?
> > > > ---
> > > > bhyve-vm% ntptime
> > > > 
> > > > ntp_gettime() returns code 0 (OK)
> > > > š time dd94930f.20ea2900ššFri, Oct 20 2017 18:21:51.128, (.128573699),
> > > > š maximum error 1309110 us, estimated error 3 us, TAI offset 37
> > > > ntp_adjtime() returns code 0 (OK)
> > > > š modes 0x0 (),
> > > > š offset 0.000 us, frequency 0.000 ppm, interval 1 s,
> > > > š maximum error 1309110 us, estimated error 3 us,
> > > > š status 0x2001 (PLL,NANO),
> > > > š time constant 6, precision 0.001 us, tolerance 496 ppm,
> > > > ---
> > > > 
> > > > Ian, thank you for your help!
> > > > 
> > It seems odd to me that the frequency of the host HPET is 14.3mhz and
> > that of the guest is 10.0mhz, but maybe that's a normal condition for
> > bhyve. šI did find some google hits[1] for bhyve guest timekeeping
> > trouble with the HPET timer which was solved by switching to a
> > different timecounter. šTimecounter choices can't be controlled from
> > loader.conf, so I guess a sysctl.conf entry of
> > kern.timecounter.hardware="ACPI-fast" is the only way to fix that. šYou
> > can also just do that command interactively first and see if it stops
> > the time steps and ntp settles down.
> The process seems to become more monotonic. But steps nevertheless:
> ---
> *XX.1ššššššXX.245ššššš4 uššš20ššš64šššš1šššš0.717šš-12.771ššš4.193
> *XX.1ššššššXX.245ššššš4 uššš28ššš64šššš3šššš0.751šš-41.970šš32.342
> *XX.1ššššššXX.245ššššš4 uššš23ššš64šššš7šššš0.748šš-59.505šš46.624
> *XX.1ššššššXX.245ššššš4 uššš18ššš64ššš17šššš0.699šš-75.164šš56.482
> *XX.1ššššššXX.245ššššš4 uššš14ššš64ššš37šššš0.669šš-90.112šš63.767
> *XX.1ššššššXX.245ššššš4 uššš11ššš64ššš77šššš0.605šš-10.567šš60.914
> *XX.1ššššššXX.245ššššš4 ušššš7ššš64šš177šššš0.591šš-169.54 116.762
> *XX.1ššššššXX.245ššššš4 ušššš3ššš64šš377šššš0.591šš-169.54 102.107
> *XX.1ššššššXX.245ššššš4 uššš68ššš64šš377šššš0.591šš-169.54 102.107
> *XX.1ššššššXX.245ššššš4 uššš63ššš64šš377šššš0.591šš-169.54šš88.424
> *XX.1ššššššXX.245ššššš4 uššš58ššš64šš377šššš0.591šš-169.54šš92.949
> *XX.1ššššššXX.245ššššš4 uššš55ššš64šš377šššš0.591šš-169.54 111.512
> *XX.1ššššššXX.245ššššš4 uššš50ššš64šš377šššš0.591šš-169.54 140.827
> *XX.1ššššššXX.245ššššš4 uššš45ššš64šš377šššš0.591šš-169.54 177.360
> *XX.1ššššššXX.245ššššš4 uššš43ššš64šš377šššš0.591šš-169.54 219.057
> šXX.1šššššš.STEP.šššš16 ušš588ššš64šššš0šššš0.000šššš0.000ššš0.000
> ---
> 
> > 
> > This would be a workaround, not a fix per se. šIf the time steps go
> > away, then something in bhyve's emulation of HPET (maybe only on some
> > hardware?) must be buggy.
> > 
> > 
> > [1] https://lists.freebsd.org/pipermail/freebsd-virtualization/2015-April/003492.html
> Also thanks for the link. Unfortunately the problem seems to persist.
> 

Hmm, that almost looks normal... it noticed the clock was drifting
fast, so it went into a frequency-training cycle (at the point where
the offset stopped changing at -169.54) and then once it had figured
out the frequency it did a step to get realigned, and (presumably)
adjusted the frequency of the kernel clock. šThe output of ntptime
before and after the step would show that... before the step the
frequency would show as zero (which was the case in your ntptime output
earlier), and after the step it would be non-zero. šNo more steps would
occur after the first one, if that's what is happening here.

-- Ian
Received on Fri Oct 20 2017 - 16:04:33 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:13 UTC