Re: ntpd dies nightly on a server with jails

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Fri, 17 Mar 2017 13:26:56 -0700 (PDT)

On 17 Mar, O. Hartmann wrote:

> Just some strange news:
> 
> I left the server the whole day with ntpd disabled and I didn't watch
> a gain of the RTC by one second, even stressing the machine.
> 
> But soon after restarting ntpd, I realised immediately a 30 minutes
> off! This morning, the discrepancy was almost 5 hours - it looked more
> like a weird adjustment to another time base than UTC.
> 
> Over the weekend I'll leave the server with ntpd disabled and only RTC
> running. I've the strange feeling that something is intentionally
> readjusting the ntpd time due to a misconfiguration or a rogue ntp
> server in the X.CC.pool.ntp.org

ntpd should recognize a single bad server and ignore it in favor of
the other servers that are sane.

It sounds like something is going off the rails once ntpd starts calling
adjtime().  What is the output of:
	sysctl kern.clockrate

I'd suggest starting ntpd and running "ntpq -c pe" a few times a minute,
capturing its output so you can monitor the status of ntpd as it starts
up and catch the point where things go wrong.  You should probably
disable iburst in ntp.conf to give more visibility into the early
startup.
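
Something along these lines could do the capturing.  It is only a
sketch: the function name, its arguments, the NTPQ_CMD variable and the
log path in the usage line are made up for illustration; the only thing
taken from above is the "ntpq -c pe" command itself.

```sh
#!/bin/sh
# Sketch: sample "ntpq -c pe" at a fixed interval and append each
# sample, with a UTC timestamp, to a log file.
# NTPQ_CMD and capture_peers are illustrative names (assumptions).

NTPQ_CMD="${NTPQ_CMD:-ntpq -c pe}"

capture_peers() {
    # $1 = number of samples, $2 = seconds between samples, $3 = log file
    count=$1
    interval=$2
    logfile=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        {
            printf '=== %s ===\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
            # Left unquoted on purpose so the command and its
            # arguments are split into separate words.
            $NTPQ_CMD 2>&1
        } >> "$logfile"
        i=$((i + 1))
        # Do not sleep after the final sample.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example (hypothetical log path): 40 samples, 15 seconds apart,
# which covers roughly the first ten minutes after ntpd starts.
# capture_peers 40 15 /tmp/ntpq-pe.log
```

Start the capture right after starting ntpd, then look at how the
offset column changes from one sample to the next.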

For the first few minutes ntpd should just be getting reliable timestamp
info and won't start trying to adjust the clock until it has captured
enough samples and figured out which servers are best.  Then the
behaviour of the offset is the thing to watch.  If the initial offset is
large enough, ntpd will step the clock once to get it close to zero,
otherwise it will just use adjtime() to slowly push the offset towards
zero.  I think though that you will see the offset start gyrating madly.
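
To make the step-versus-slew decision concrete, here is a rough sketch.
The 0.128 second threshold is ntpd's documented default step threshold,
which is not stated above and can be changed with "tinker step", so
treat it as an assumption about your configuration.  The function name
is made up, and the real daemon applies more logic than this.

```sh
#!/bin/sh
# Sketch: classify an offset (in seconds) as one that ntpd would step
# or slew, assuming the default 0.128 s step threshold.
# step_or_slew is an illustrative name, not an ntpd tool.

step_or_slew() {
    # $1 = offset in seconds (may be negative)
    # $2 = optional step threshold in seconds (default 0.128)
    awk -v off="$1" -v thr="${2:-0.128}" 'BEGIN {
        if (off < 0) off = -off          # compare the magnitude only
        print ((off > thr) ? "step" : "slew")
    }'
}

# Note that "ntpq -c pe" reports offsets in milliseconds, so divide
# by 1000 before passing a value from its output to this function.
# step_or_slew 1800     # the 30 minute offset reported earlier
# step_or_slew 0.020
```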

You might want to set /var/db/ntpd.drift to zero beforehand if there is
an insane value in there.  If the initial drift value is bogus, ntpd
will try to use it, which will push the time offset away from zero so
fast that it will decide to keep stepping the clock back to zero before
it can capture enough samples from the external servers to determine the
true local clock drift rate.
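
A minimal sketch of resetting the drift file, assuming it holds a single
frequency value in PPM as ntpd normally writes it.  The function name
and the ".bak" backup are illustrative additions, and the service
commands in the comment assume ntpd is managed through rc.d on your
system.

```sh
#!/bin/sh
# Sketch: show the current drift value, keep a backup copy, and
# reset the drift file to zero.
# reset_drift is an illustrative name (assumption).

reset_drift() {
    # $1 = path to the drift file, e.g. /var/db/ntpd.drift
    driftfile=$1
    if [ -f "$driftfile" ]; then
        printf 'old drift value: %s\n' "$(cat "$driftfile")"
        cp -p "$driftfile" "$driftfile.bak"
    fi
    printf '0.000\n' > "$driftfile"
}

# Run as root with ntpd stopped, otherwise the daemon may overwrite
# the file again:
# service ntpd stop
# reset_drift /var/db/ntpd.drift
# service ntpd start
```

The old value is worth a look before it is overwritten; something in
the hundreds of PPM would fit the behaviour described above.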
Received on Fri Mar 17 2017 - 19:27:05 UTC
