Re: ntpd dies nightly on a server with jails

From: Cy Schubert <Cy.Schubert_at_komquats.com>
Date: Wed, 15 Mar 2017 13:12:37 -0700
Hi O.Hartmann,

I'll try to answer as much as I can in the noon hour I have left.

In message <20170315071724.78bb0bdc_at_freyja.zeit4.iv.bundesimmobilien.de>,
"O. Hartmann" writes:
> Running a host with several jails on recent CURRENT (12.0-CURRENT #8 r315187:
> Sun Mar 12 11:22:38 CET 2017 amd64) gives me trouble on a daily basis.
> 
> The box is an older two-socket Fujitsu server equipped with two four-core
> Intel(R) Xeon(R) CPU L5420 @ 2.50GHz.
> 
> The box has several jails; none of them runs the ntpd service. Each jail has
> its own dedicated loopback interface, lo1 through lo5 (for the moment), with
> dedicated IPs 127.0.1.1 - 127.0.5.1 (if this matters; I believe not).
> 
> The host itself has two main NICs, Broadcom-based. bcm0 is dedicated to the
> host, bcm1 is shared amongst the jails: each jail has an IP bound to bcm1 via
> which the jails communicate with the network.
> 
> I try to capture log information via syslog, but FreeBSD's ntpd seems to be
> very, very sparse with such information, converging to null - I can't see
> anything in the logs that explains why ntpd dies almost every night, leaving
> the system with a wild reset of time. Sometimes it is a gain of 6 hours,
> sometimes it is only half an hour. I usually leave the box at 16:00 local
> time and look after it again at ~ 7 o'clock in the morning local time.

We will need to turn on debugging. Unfortunately, debug code is not compiled
into the binary, so we have two options. You can either update
src/usr.sbin/ntp/config.h to enable DEBUG or build the port (it's the exact
same ntp) with the DEBUG option -- this is probably simpler. Then enable
debug with -d and -D. -D increases verbosity. I just committed a debug
option to both ntp ports to assist here.

Next question: Do you see any indication of a core dump? I'd be interested 
in looking at it if possible.

> 
> When the clock is floating that wildly, in all cases ntpd isn't running any
> more. I restart it with options -g and -G to adjust the time quickly at the
> beginning, which works fine.

This is disconcerting. If your clock is floating wildly without ntpd 
running there are other issues that might be at play here. At most the 
clock might drift a little, maybe a minute or two a day but not by a lot. 
Does the drift cause your clocks to run fast or slow?
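
To put rough numbers on that, here is a small Python sketch comparing the
drift rates involved. It assumes the box is unattended for about 15 hours
(16:00 to roughly 07:00, as described above) and that ntpd died right at the
start of that window, so the observed figures are lower bounds on the real
rate. The function name drift_ppm is made up for illustration.

```python
def drift_ppm(offset_seconds, elapsed_seconds):
    """Return the clock error as parts per million of the elapsed time."""
    return offset_seconds / elapsed_seconds * 1_000_000


DAY = 24 * 3600
UNATTENDED = 15 * 3600  # assumption: 16:00 until about 07:00 the next day

# "a minute or two a day" for an undisciplined but healthy clock
typical = drift_ppm(60, DAY)

# the two cases reported: half an hour and six hours gained overnight
observed_small = drift_ppm(30 * 60, UNATTENDED)
observed_large = drift_ppm(6 * 3600, UNATTENDED)

print(f"typical:        {typical:10.0f} ppm")
print(f"observed small: {observed_small:10.0f} ppm")
print(f"observed large: {observed_large:10.0f} ppm")
```

One minute a day works out to roughly 700 ppm, while the reported gains are
on the order of 33,000 ppm and 400,000 ppm, which is far more than a merely
undisciplined oscillator would normally produce.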

> 
> Apart from possible misconfigurations of the jails (I'm quite new to jails
> and their pitfalls), I was wondering what causes ntpd to die. I can't
> determine exactly the time of its death, so it might be related to
> daily/periodic processes (I use only the most vanilla periodic configuration,
> except for having the ZFS scrub check enabled).

As I'm a little rushed for time, I didn't catch whether the jails 
themselves were also running ntpd... just thought I'd ask. I don't see how 
zfs scrubbing or any other periodic scripts could cause this.
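
Since the time of death is unknown, one way to narrow it down would be a
small watcher on the host that polls for the process and records when it
disappears. The following is only a sketch: it assumes pgrep is available
and that the daemon's process name is exactly "ntpd"; the function names,
the 60-second interval and the injectable parameters are hypothetical and
exist only to keep the example testable.

```python
import subprocess
import time
from datetime import datetime


def ntpd_is_running():
    """Return True if pgrep finds a process named exactly 'ntpd'."""
    result = subprocess.run(
        ["pgrep", "-x", "ntpd"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_death(is_running=ntpd_is_running, interval=60,
                   sleep=time.sleep, now=datetime.now, max_polls=None):
    """Poll until the process is missing.

    Returns (last_seen, first_missing). first_missing is None if
    max_polls was reached while the process was still running.
    """
    polls = 0
    last_seen = None
    while max_polls is None or polls < max_polls:
        stamp = now()
        if not is_running():
            return last_seen, stamp
        last_seen = stamp
        polls += 1
        sleep(interval)
    return last_seen, None
```

Calling wait_for_death() with no arguments blocks until ntpd is gone and
returns the last time it was seen and the first time it was missing. The two
timestamps bracket the death to within one polling interval, which can then
be compared against the periodic schedule. The timestamps come from the
system clock, so they are only trustworthy up to the point where the clock
starts to wander.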

> 
> I haven't had the chance to check whether the hardware is completely all
> right, but from a superficial point of view there is no issue with high gain
> of the internal clock or other hardware issues.

It's probably a good idea to check, though I don't think that would cause
ntpd any grief. I have two machines which complain at boot that the RTC
battery is dead, even though I have replaced the batteries, and in my case
that doesn't cause ntpd any problems. I'm not sure if it's possible for a
kernel to damage the RTC.

> 
> If there are known issues with jails (the problem has occurred since I
> started using them), advice is appreciated.

Not that I know of.


-- 
Cheers,
Cy Schubert <Cy.Schubert_at_cschubert.com>
FreeBSD UNIX:  <cy_at_FreeBSD.org>   Web:  http://www.FreeBSD.org

	The need of the many outweighs the greed of the few.
Received on Wed Mar 15 2017 - 23:49:36 UTC
