Re: ntpd dies nightly on a server with jails

From: Ian Lepore <ian_at_freebsd.org>
Date: Fri, 17 Mar 2017 14:33:13 -0600
On Fri, 2017-03-17 at 13:26 -0700, Don Lewis wrote:
> On 17 Mar, O. Hartmann wrote:
> 
> > 
> > Just some strange news:
> > 
> > I left the server the whole day with ntpd disabled and I didn't see
> > the RTC gain even one second, even while stressing the machine.
> > 
> > But soon after restarting ntpd, I immediately noticed it was 30
> > minutes off! This morning, the discrepancy was almost 5 hours - it
> > looked more like a weird adjustment to another time base than UTC.
> > 
> > Over the weekend I'll leave the server with ntpd disabled and only
> > RTC running. I've the strange feeling that something is
> > intentionally readjusting the ntpd time due to a misconfiguration or
> > a rogue ntp server in the X.CC.pool.ntp.org
> ntpd should recognize a single bad server and ignore it in favor of
> the other servers that are sane.
> 
> It sounds like something is going off the rails once ntpd starts
> calling adjtime().  What is the output of:
> 	sysctl kern.clockrate
> 
> I'd suggest starting ntpd and running "ntpq -c pe" a few times a
> minute, capturing its output to monitor the status of ntpd as it
> starts up and to catch things going wrong.  You should probably
> disable iburst in ntp.conf to give more visibility into the early
> startup.
> 
> For the first few minutes ntpd should just be getting reliable
> timestamp info and won't start trying to adjust the clock until it
> has captured enough samples and figured out which servers are best.
> Then the behaviour of the offset is the thing to watch.  If the
> initial offset is large enough, ntpd will step the clock once to get
> it close to zero, otherwise it will just use adjtime() to slowly push
> the offset towards zero.  I think though that you will see the offset
> start gyrating madly.
> 
> You might want to set /var/db/ntpd.drift to zero beforehand if there
> is an insane value in there.  If the initial drift value is bogus,
> ntpd will try to use it, which will push the time offset away from
> zero so fast that it will decide to keep stepping the clock back to
> zero before it can capture enough samples from the external servers
> to determine the true local clock drift rate.

Do not set ntpd.drift contents to zero.  Delete the file.  There's a
huge difference between a file that says the clock is perfect and a
missing file which triggers ntpd to do a 15-minute frequency
measurement to come up with the initial drift correction.
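
To make the distinction concrete, here is a rough sketch (not part of
ntpd, just an illustration) that inspects a drift file and, if asked,
removes it rather than zeroing it.  The helper names and the 500 PPM
"insane" threshold are assumptions made for this example; the path
/var/db/ntpd.drift is the one mentioned above.

```python
import math
import os

# Assumed threshold for this sketch: treat anything beyond +/-500 PPM
# as an insane drift value.
MAX_SANE_PPM = 500.0


def classify_drift_file(path):
    """Classify a drift file as 'missing', 'unreadable', 'zero',
    'insane' or 'plausible' (hypothetical helper for illustration)."""
    try:
        with open(path) as f:
            fields = f.read().split()
    except FileNotFoundError:
        # No file at all: this is the case that makes ntpd measure
        # the frequency itself instead of trusting a stored value.
        return "missing"
    try:
        ppm = float(fields[0])
    except (IndexError, ValueError):
        return "unreadable"
    if not math.isfinite(ppm) or abs(ppm) > MAX_SANE_PPM:
        return "insane"
    if ppm == 0.0:
        # A zero on disk claims the clock is perfect; it is NOT the
        # same thing as having no file.
        return "zero"
    return "plausible"


def reset_drift_file(path):
    """Delete the drift file (never write zero into it).
    Returns True if a file was removed, False if none existed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False
```

One would presumably stop ntpd first, call
reset_drift_file("/var/db/ntpd.drift") as root, and then start ntpd
again, so that the running daemon does not write the old value back.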

-- Ian
Received on Fri Mar 17 2017 - 19:33:28 UTC