Re: ntpd dies nightly on a server with jails

From: O. Hartmann <ohartmann_at_walstatt.org>
Date: Fri, 17 Mar 2017 18:05:07 +0100

On Wed, 15 Mar 2017 13:12:37 -0700,
Cy Schubert <Cy.Schubert_at_komquats.com> wrote:

> Hi O.Hartmann,
> 
> I'll try to answer as much as I can in the noon hour I have left.
> 
> In message <20170315071724.78bb0bdc_at_freyja.zeit4.iv.bundesimmobilien.de>,
> "O. Hartmann" writes:
> > Running a host with several jails on recent CURRENT (12.0-CURRENT #8 r315187:
> > Sun Mar 12 11:22:38 CET 2017 amd64) gives me trouble on a daily basis.
> > 
> > The box is an older two-socket Fujitsu server equipped with two four-core
> > Intel(R) Xeon(R) CPU L5420 @ 2.50GHz.
> > 
> > The box has several jails, each jail does NOT run service ntpd. Each jail has
> > its dedicated loopback, lo1 through lo5 (for the moment), with a dedicated IP:
> > 127.0.1.1 - 127.0.5.1 (if this matters, I believe not).
> > 
> > The host itself has two main NICs, Broadcom based. bcm0 is dedicated to the
> > host, bcm1 is shared amongst the jails: each jail has an IP bound to bcm1 via
> > which the jails communicate with the network.
> > 
> > I try to capture log information via syslog, but FreeBSD's ntpd seems to be
> > very, very sparse with such information, converging to null - I can't see
> > anything suitable in the logs explaining why ntpd dies almost every night,
> > leaving the system with a wild reset of time. Sometimes it is a gain of
> > 6 hours, sometimes it is only half an hour. I usually leave the box at 16:00
> > local time and check on it again at ~ 7 o'clock in the morning local time.
> 
> We will need to turn on debugging. Unfortunately debug code is not compiled 
> into the binary. We have two options. You can either update 
> src/usr.sbin/ntp/config.h to enable DEBUG or build the port (it's the exact 
> same ntp) with the DEBUG option -- this is probably simpler. Then enable 
> debug with -d and -D. -D increases verbosity. I just committed a debug 
> option to both ntp ports to assist here.
> 
> Next question: Do you see any indication of a core dump? I'd be interested 
> in looking at it if possible.
> 
> > 
> > When the clock is floating that wildly, in all cases ntpd isn't running any
> > more. I try to restart with options -g and -G to adjust the time quickly at
> > the beginning, which works fine.
> 
> This is disconcerting. If your clock is floating wildly without ntpd 
> running there are other issues that might be at play here. At most the 
> clock might drift a little, maybe a minute or two a day but not by a lot. 
> Does the drift cause your clocks to run fast or slow?
> 
> > 
> > Apart from possible misconfigurations of the jails (I'm quite new to jails and
> > their pitfalls), I was wondering what causes ntpd to die. I can't determine
> > exactly the time of its death, so it might be related to diurnal/periodic
> > processes (I use only the most vanilla configurations on periodic, except for
> > checking ZFS's scrubbing enabled).
> 
> As I'm a little rushed for time, I didn't catch whether the jails 
> themselves were also running ntpd... just thought I'd ask. I don't see how 
> zfs scrubbing or any other periodic scripts could cause this.
> 
> > 
> > I haven't had the chance to check whether the hardware is completely all right,
> > but from a superficial point of view there is no issue with high gain of the
> > internal clock or other hardware issues.
> 
> It's probably a good idea to check. I don't think that would cause ntpd any 
> gas. I've seen RTC battery messages on my gear which haven't caused ntpd 
> any problem. I have two machines which complain about RTC battery being 
> dead, where in fact I have replaced the batteries and the messages still 
> are displayed at boot. I'm not sure if it's possible for a kernel to damage 
> the RTC. In my case that doesn't cause ntpd any problems. It's probably 
> good to check anyway.
> 
> > 
> > If there are known issues with jails (the problem has occurred since I started
> > using them), advice is appreciated.
> 
> Not that I know of.
> 
> 

Just some strange news:

I left the server the whole day with ntpd disabled and I didn't observe the RTC
gaining even one second, even while stressing the machine.

But soon after restarting ntpd, I immediately noticed the clock was 30 minutes off!
This morning, the discrepancy was almost 5 hours - it looked more like a weird
adjustment to a time base other than UTC.
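
One way to pin down when such jumps happen is to compare the wall clock against the monotonic clock, which is not affected when the system time is stepped. The following is only a minimal sketch, assuming Python 3 is installed on the host (nothing in this thread says it is); the function names and the 1-second threshold are illustrative choices, not anything taken from ntpd.

```python
import time

def clock_offset():
    """Wall-clock time minus monotonic time, in seconds.

    The monotonic clock is not stepped by system clock updates, so this
    difference stays nearly constant unless the wall clock is stepped.
    """
    return time.time() - time.monotonic()

def detect_step(previous_offset, current_offset, threshold=1.0):
    """Return the jump in seconds if it exceeds threshold, else None."""
    jump = current_offset - previous_offset
    return jump if abs(jump) > threshold else None

def watch(interval=10.0, threshold=1.0):
    """Print a timestamped line whenever the wall clock is stepped."""
    previous = clock_offset()
    while True:
        time.sleep(interval)
        current = clock_offset()
        jump = detect_step(previous, current, threshold)
        if jump is not None:
            # time.ctime() already reflects the new (stepped) wall clock.
            print(f"{time.ctime()}: wall clock stepped by {jump:+.1f} s", flush=True)
        previous = current
```

Calling `watch()` in a long-running session and redirecting its output to a file would give a record of each step and its size, which could then be matched against the syslog and periodic timestamps.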

Over the weekend I'll leave the server with ntpd disabled and only the RTC running.
I have the strange feeling that something is intentionally readjusting the time
ntpd sets, due to a misconfiguration or a rogue NTP server in the
X.CC.pool.ntp.org pool.
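
The rogue-server suspicion could be checked by asking each configured server for its time directly and comparing the answers. Below is a rough sketch of a bare SNTP query, assuming Python 3 and outbound UDP port 123 are available; the function names are made up for illustration, and the offset it reports ignores network delay, so it is only good for spotting errors of seconds or more.

```python
import socket
import struct
import time

NTP_EPOCH_DELTA = 2208988800  # seconds between 1900-01-01 and 1970-01-01

def build_request():
    """48-byte SNTP client request: LI=0, version=3, mode=3 (client)."""
    return b"\x1b" + 47 * b"\x00"

def parse_transmit_time(packet):
    """Extract the server's transmit timestamp as Unix seconds (float)."""
    if len(packet) < 48:
        raise ValueError("NTP reply too short")
    # Transmit timestamp: bytes 40-47, 32-bit seconds + 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_DELTA + fraction / 2**32

def server_offset(host, timeout=5.0):
    """Rough offset (server minus local) in seconds, ignoring network delay."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (host, 123))
        packet, _ = sock.recvfrom(512)
    return parse_transmit_time(packet) - time.time()
```

Running `server_offset()` against each server name from ntp.conf should give values that agree closely with one another; a server that is off by minutes or hours would stand out. While ntpd is running, `ntpq -p` shows per-peer offsets as well.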

-- 
O. Hartmann

I object to the use or transfer of my data for advertising purposes
or for market or opinion research (§ 28 Abs. 4 BDSG).

Received on Fri Mar 17 2017 - 16:05:25 UTC
