Re: ntpd dies nightly on a server with jails

From: O. Hartmann <ohartmann_at_walstatt.org>
Date: Thu, 16 Mar 2017 18:13:34 +0100
On Wed, 15 Mar 2017 13:12:37 -0700
Cy Schubert <Cy.Schubert_at_komquats.com> wrote:


Thank you very much for responding.

> Hi O.Hartmann,
> 
> I'll try to answer as much as I can in the noon hour I have left.
> 
> In message <20170315071724.78bb0bdc_at_freyja.zeit4.iv.bundesimmobilien.de>,
> "O. Hartmann" writes:
> > Running a host with several jails on recent CURRENT (12.0-CURRENT #8 r315187:
> > Sun Mar 12 11:22:38 CET 2017 amd64) is causing me trouble on a daily basis.
> > 
> > The box is an older two-socket Fujitsu server equipped with two quad-core
> > Intel(R) Xeon(R) CPU L5420 @ 2.50GHz.
> > 
> > The box has several jails; none of the jails runs the ntpd service. Each jail
> > has its own dedicated loopback interface, lo1 through lo5 (for the moment),
> > with a dedicated IP: 127.0.1.1 - 127.0.5.1 (if this matters; I believe not).
> > 
> > The host itself has two main NICs, both Broadcom-based. bcm0 is dedicated to
> > the host; bcm1 is shared amongst the jails: each jail has an IP bound to bcm1,
> > via which the jails communicate with the network.
> > 
> > I try to capture log information via syslog, but FreeBSD's ntpd seems to be
> > very, very sparse with such information, converging to nothing: I can't find
> > anything in the logs explaining why ntpd dies almost every night, leaving the
> > system with a wildly wrong time. Sometimes it is off by 6 hours, sometimes
> > only by half an hour. I usually leave the box at 16:00 local time and check on
> > it again at about 7 o'clock the next morning, local time.
> 
> We will need to turn on debugging. Unfortunately debug code is not compiled 
> into the binary. We have two options. You can either update 
> src/usr.sbin/ntp/config.h to enable DEBUG or build the port (it's the exact 
> same ntp) with the DEBUG option -- this is probably simpler. Then enable 
> debug with -d and -D. -D increases verbosity. I just committed a debug 
> option to both ntp ports to assist here.

I realised this when I simply switched on debugging for ntpd: the output was the
same as before. So I feared I would have to recompile with debugging explicitly
enabled ...

> 
> Next question: Do you see any indication of a core dump? I'd be interested 
> in looking at it if possible.

I have intentionally switched off core dumping; I will switch it back on. Searching
all logged messages for "ntp", I never saw any error indicating a crash, but I'll
take a closer look tomorrow.

> 
> > 
> > Whenever the clock is off that wildly, ntpd is no longer running, in every
> > case. I restart it with options -g and -G to correct the time quickly at
> > startup, which works fine.
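Some background on why -g matters here: per the ntpd(8) manual page, ntpd normally exits with a message to the system log if the offset exceeds the panic threshold, 1000 s by default, and -g lifts that limit for the first correction only. Offsets like the ones described above (half an hour to several hours) are well past that threshold, so it may be worth checking whether the nightly "death" is such an exit rather than a crash; this is a hypothesis, not something the logs above confirm. The Python sketch below illustrates the decision, assuming the stock thresholds (128 ms step threshold, 1000 s panic threshold, no "tinker" overrides in ntp.conf); the function name is hypothetical, and details such as the stepout interval are ignored.

```python
# Sketch only: classify a clock offset the way a default-configured ntpd
# would treat it. ASSUMPTION: stock thresholds, no "tinker step" or
# "tinker panic" overrides in ntp.conf. classify_offset is a hypothetical name.

STEP_THRESHOLD_S = 0.128    # default step threshold: 128 ms
PANIC_THRESHOLD_S = 1000.0  # default panic threshold: 1000 s


def classify_offset(offset_s: float, started_with_g: bool = False) -> str:
    """Return 'slew', 'step' or 'exit' for an offset given in seconds.

    started_with_g models only the first correction after startup,
    which is the only one -g exempts from the panic threshold.
    """
    magnitude = abs(offset_s)
    if magnitude > PANIC_THRESHOLD_S and not started_with_g:
        # Beyond the panic threshold ntpd logs a message and exits.
        return "exit"
    if magnitude > STEP_THRESHOLD_S:
        # Large but tolerable offsets are eventually corrected by a step.
        return "step"
    # Small offsets are corrected gradually (slewed).
    return "slew"


if __name__ == "__main__":
    for offset in (0.05, 30.0, 30 * 60.0, 6 * 3600.0):
        print(f"{offset:>8.2f} s -> {classify_offset(offset)}")
```

Run as a script, it prints one line per sample offset; the half-hour and six-hour cases both come out as "exit".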
> 
> This is disconcerting. If your clock is floating wildly without ntpd 
> running there are other issues that might be at play here. At most the 
> clock might drift a little, maybe a minute or two a day but not by a lot. 
> Does the drift cause your clocks to run fast or slow?
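To put these numbers side by side, clock drift is commonly expressed in parts per million (ppm): the error accumulated per unit of elapsed time, times 10^6. The sketch below assumes the clock was correct when the box was left at 16:00 and that a half-hour error built up over the roughly 15 hours until 07:00; under that assumption the observed rate is 24 times Cy's upper bound of two minutes a day, which supports his point that ordinary drift does not explain it.

```python
# Sketch: compare drift rates in parts per million (ppm).
# ASSUMPTION: the clock was correct at 16:00 and the ~30 min error built up
# over the ~15 h until 07:00 the next morning.

DAY_S = 24 * 3600


def drift_ppm(error_s: float, elapsed_s: float) -> float:
    """Clock error accumulated over an interval, in parts per million."""
    return error_s / elapsed_s * 1e6


if __name__ == "__main__":
    bound = drift_ppm(120, DAY_S)             # Cy's upper bound: 2 min/day
    observed = drift_ppm(30 * 60, 15 * 3600)  # ~30 min overnight
    print(f"2 min/day       : {bound:9.1f} ppm")
    print(f"30 min over 15 h: {observed:9.1f} ppm")
    print(f"ratio           : {observed / bound:9.1f}x")
```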

Today, I switched off ntpd on the jail-bearing host. After an hour or so, the clock
had not deviated from my DCF77 clock, at least not at a granularity of minutes. So I
switched ntpd on again. After a while, I checked its status via "service ntpd
status", and I would bet my ass that the result was "is running with PID XXX". A
minute later I did the same: the clock was off by almost half an hour (always behind
real time, never ahead!) and ntpd wasn't running. A coincidence? I cannot tell; I had
done a "clear" on the terminal :-( But that was strange.

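Since the terminal output was lost, a small watchdog could record exactly when such a jump happens. The Python sketch below (assuming Python 3 is installed on the host; the helper names and the 1-second tolerance are hypothetical choices) samples the wall clock together with time.monotonic(), which Python documents as unaffected by system clock updates, so a mismatch between the two elapsed times reveals a step. Checking at the same moment whether ntpd is still alive is left out.

```python
import time
from typing import Optional, Tuple

Sample = Tuple[float, float]  # (wall-clock seconds, monotonic seconds)


def take_sample() -> Sample:
    """Read the wall clock and the monotonic clock back to back."""
    return (time.time(), time.monotonic())


def detect_step(prev: Sample, cur: Sample,
                tolerance_s: float = 1.0) -> Optional[float]:
    """Return the wall-clock jump in seconds if it exceeds the tolerance.

    The monotonic clock ignores changes to the system clock, so the
    difference between the two elapsed times is the size of the step.
    """
    wall_delta = cur[0] - prev[0]
    mono_delta = cur[1] - prev[1]
    jump = wall_delta - mono_delta
    return jump if abs(jump) > tolerance_s else None


def watch(interval_s: float = 10.0) -> None:
    """Poll forever and print a timestamped line whenever the clock jumps."""
    prev = take_sample()
    while True:
        time.sleep(interval_s)
        cur = take_sample()
        jump = detect_step(prev, cur)
        if jump is not None:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(cur[0]))
            print(f"{stamp}: wall clock jumped by {jump:+.1f} s", flush=True)
        prev = cur

# watch()  # start it detached, e.g. under nohup(1), with output to a file
```

Left running overnight with its output redirected to a file, the log would show whether the jump coincides with ntpd's disappearance.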
> 
> > 
> > Apart from possible misconfigurations of the jails (I'm quite new to jails and
> > their pitfalls), I am wondering what causes ntpd to die. I can't determine the
> > exact time of its death, so it might be related to daily/periodic processes
> > (I use only the most vanilla periodic configuration, except that I have
> > enabled the ZFS scrub check).
> 
> As I'm a little rushed for time, I didn't catch whether the jails 
> themselves were also running ntpd... just thought I'd ask. I don't see how 
> zfs scrubbing or any other periodic scripts could cause this.

The jails do not run ntpd, since all the docs I have read say that the jail-bearing
host provides the time. I checked and double-checked that they do not have ntpd running.

By mentioning ZFS and scrubbing I was thinking more of time-adjusting periodic jobs
like adjkerntz and friends, if there are any I'm not aware of. I see that this was
more confusing than helpful.

> 
> > 
> > I haven't had the chance to check whether the hardware is completely all right,
> > but superficially there is no sign of excessive drift of the internal clock or
> > any other hardware issue.
> 
> It's probably a good idea to check. I don't think that would cause ntpd any
> grief. I've seen RTC battery messages on my gear which haven't caused ntpd
> any problem. I have two machines which complain about the RTC battery being
> dead, when in fact I have replaced the batteries and the messages are still
> displayed at boot. I'm not sure if it's possible for a kernel to damage
> the RTC. In my case that doesn't cause ntpd any problems. It's probably
> good to check anyway.

The server hardware in question is quite old, from 2008/09, so it has long since seen
its best days. I haven't checked the battery status so far, but that is the next thing
I'll do, or I'll proactively replace the battery cell with a fresh one.

My fear is that one of the time servers I sync with is compromised and serving the
wrong time. But I have no idea whether that is the case.
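One way to test that suspicion is to compare the offsets ntpd reports for each configured server (for instance the offset column of "ntpq -p", given in milliseconds) and see whether one server disagrees sharply with the rest. The Python sketch below is a deliberately simple median-based check, not ntpd's own clock-selection algorithm; the server names, offsets and the 100 ms tolerance are made-up illustration values.

```python
from statistics import median
from typing import Dict, List


def suspicious_servers(offsets_ms: Dict[str, float],
                       tolerance_ms: float = 100.0) -> List[str]:
    """Return servers whose offset differs from the median by more than tolerance_ms.

    Requires at least one server; the median is robust against a single
    server reporting a wildly wrong time.
    """
    center = median(offsets_ms.values())
    return sorted(name for name, off in offsets_ms.items()
                  if abs(off - center) > tolerance_ms)


if __name__ == "__main__":
    # Hypothetical offsets in ms, as they might be read off "ntpq -p".
    sample = {
        "server-a.example": 0.8,
        "server-b.example": -1.2,
        "server-c.example": 2.5,
        "server-d.example": -1800000.0,  # about 30 minutes behind
    }
    print(suspicious_servers(sample))
```

With at least three or four servers configured, a single misbehaving source stands out clearly against the median.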

I have difficulties understanding the logic behind the "restrict" directives in
ntp.conf. It is possible that, due to a lack of understanding, I misconfigured ntpd
in a very stupid way, such that its time could be set by any time server in the
outside world.
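As background on "restrict": these lines are access controls applied to incoming NTP packets (who may query, modify or peer with this ntpd); the time sources themselves come from the "server"/"pool" lines. Of the usual flags, "nopeer" keeps unknown hosts from mobilizing new peer associations, and "nomodify" keeps them from changing ntpd's state via ntpq/ntpdc. The stock FreeBSD /etc/ntp.conf should contain lines of the form "restrict default limited kod nomodify notrap nopeer noquery" (worth verifying against the installed copy). The Python sketch below lists which "restrict default" lines lack such flags; the choice of flags to check is an assumption, and the function name is hypothetical.

```python
from typing import Dict, List, Set

# ASSUMPTION: the flags worth checking on "restrict default";
# this selection is illustrative, not an official checklist.
WANTED_FLAGS: Set[str] = {"nomodify", "nopeer", "noquery"}


def check_default_restricts(conf_text: str) -> Dict[str, List[str]]:
    """Map each 'restrict [-4|-6] default' line to the wanted flags it lacks."""
    report: Dict[str, List[str]] = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        words = line.split()
        if not words or words[0] != "restrict":
            continue
        # Skip the optional address-family switch before the address.
        args = [w for w in words[1:] if w not in ("-4", "-6")]
        if not args or args[0] != "default":
            continue
        report[line] = sorted(WANTED_FLAGS - set(args[1:]))
    return report


if __name__ == "__main__":
    sample = """\
# hypothetical ntp.conf excerpt
restrict default limited kod nomodify notrap nopeer noquery
restrict -6 default kod notrap
restrict 127.0.0.1
pool 0.freebsd.pool.ntp.org iburst
"""
    for line, missing in check_default_restricts(sample).items():
        print(f"{line}: missing {', '.join(missing) or 'nothing'}")
```

To check the real file, pass it the contents of /etc/ntp.conf instead of the sample; an empty result means there is no default restriction at all.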

I'll check this tomorrow when I'm back in the office.
> 
> > 
> > If there are known issues with jails (the problem occurs since I use those),
> > advice is appreciated.  
> 
> Not that I know of.
> 
> 
I'll check the jails anyway. I was asking because my 5 jails use lo1 - lo5, each
with a dedicated loopback IP (127.0.1.1 - 127.0.5.1), and the jail host reports
listening on all (cloned) loopback interfaces with UDP4, port 123.
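If the aim is to experiment with keeping the host's ntpd off the jails' addresses, ntpd's "interface" directive in ntp.conf (of the form "interface ignore <address>") can exclude specific addresses. Whether ntpd listening there matters at all is unknown; this is an experiment to try, not a known fix. The Python sketch below merely generates such lines for the jail loopback addresses 127.0.1.1 - 127.0.5.1 mentioned above, leaving 127.0.0.1 alone; the helper names are hypothetical.

```python
import ipaddress
from typing import List


def jail_loopback_addresses(count: int) -> List[ipaddress.IPv4Address]:
    """Jail loopback addresses following the 127.0.N.1 scheme (lo1..loN)."""
    return [ipaddress.IPv4Address(f"127.0.{n}.1") for n in range(1, count + 1)]


def interface_ignore_lines(addresses: List[ipaddress.IPv4Address]) -> List[str]:
    """ntp.conf lines telling ntpd not to use the given addresses."""
    lines = []
    for addr in addresses:
        # All of 127.0.0.0/8 is loopback; refuse anything else as a safety check.
        if not addr.is_loopback:
            raise ValueError(f"{addr} is not a loopback address")
        lines.append(f"interface ignore {addr}")
    return lines


if __name__ == "__main__":
    print("\n".join(interface_ignore_lines(jail_loopback_addresses(5))))
```

The five printed lines could be appended to the host's ntp.conf for a trial run, then removed again if they make no difference.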

I have another machine in the very same network segment, but without jails. I'll copy
the configuration over and let that box run for a while (it has more recent hardware,
a Haswell Xeon, and runs the very same recent CURRENT).

 
Kind regards,

Oliver

-- 
O. Hartmann

I object to the use or transfer of my data for advertising purposes or for market
or opinion research (§ 28 Abs. 4 BDSG).

Received on Thu Mar 16 2017 - 16:13:49 UTC
