Upgrading 10.2-RELEASE-p6 to 10.2-RELEASE-p7 solved the ntpd crashes
(apparently fixed by FreeBSD Errata Notice FreeBSD-EN-15:20.vm).

Thanks!!!
  Mark

On 2015-11-01 10:31, Andre Albsmeier wrote:
> On Fri, 30-Oct-2015 at 19:47:59 +0100, Mark Martinec wrote:
>> Not sure if it's the same issue, but it sure looks like it is.
>>
>> I have upgraded a couple of hosts (amd64) from 10.2-RELEASE-p5
>> to 10.2-RELEASE-p6, i.e. freebsd-update essentially just
>> replaced /usr/sbin/ntpd with a new one; then I restarted
>> ntpd.
>>
>> On all hosts but one this was successful: the new ntpd starts
>> fine and works normally. But on one of these machines the
>> ntpd process immediately crashes with SIGSEGV. That machine
>> has an Intel Xeon CPU. It is not apparent to me in what way
>> this machine differs from others,
>
> I'll add my observations here:
>
> I am using an ntp.conf with a single server entry:
>
>     server ntp.some.domain.org
>
> ntp.some.domain.org is a CNAME pointing to gate.some.domain.org
> and the latter contains an A record pointing to 192.168.128.1.
>
> After updating 9.3-STABLE to the latest version (one which includes ntp
> 4.2.8p4), ntpd crashes:
>
>     Nov 1 09:38:38 voyager kernel: pid 4443 (ntpd), uid 0: exited on signal 11
>
> This happens in line 871 of ntpd.c where mlockall() is called:
>
>     && 0 != mlockall(MCL_CURRENT|MCL_FUTURE))
>
> It does NOT crash with MCL_FUTURE only.
> It does crash with MCL_CURRENT only.
>
> When adding
>
>     rlimit memlock -1
>
> to ntp.conf it does NOT crash (as mlockall() won't be called anymore).
>
> When specifying the IP address (192.168.128.1) as the server it
> does NOT crash.
>
> When specifying gate.some.domain.org as the server it also does
> NOT crash. tcpdump shows in this case:
>
>     09:49:59.542310 IP 192.168.128.2.21102 > 192.168.128.1.53: 7639+ A? gate.some.domain.org. (41)
>     09:49:59.542578 IP 192.168.128.1.53 > 192.168.128.2.21102: 7639* 1/1/0 A 192.168.128.1 (71)
>     09:49:59.542612 IP 192.168.128.2.52455 > 192.168.128.1.53: 42047+ AAAA? gate.some.domain.org. (41)
>     09:49:59.542792 IP 192.168.128.1.53 > 192.168.128.2.52455: 42047* 0/1/0 (88)
>
> When reverting the server entry back to ntp.some.domain.org
> it crashes and tcpdump shows:
>
>     09:36:05.172552 IP 192.168.128.2.17836 > 192.168.128.1.53: 49768+ A? ntp.some.domain.org. (40)
>     09:36:05.173320 IP 192.168.128.1.53 > 192.168.128.2.17836: 49768* 2/1/0 CNAME gate.some.domain.org., A 192.168.128.1 (89)
>     09:36:05.173361 IP 192.168.128.2.22611 > 192.168.128.1.53: 63808+ AAAA? ntp.some.domain.org. (40)
>     09:36:05.173595 IP 192.168.128.1.53 > 192.168.128.2.22611: 63808* 1/1/0 CNAME gate.some.domain.org. (106)
>
> The probability of crashing increases with the speed and the
> number of cores of the machine: on my old single-core Pentiums
> it never crashes, on my quad-core i7-3770K it always crashes.
>
> The (asynchronous) resolving of the names starts in line 3876 of
> ntp_config.c:
>
>     getaddrinfo_sometime(curr_peer->addr->address,
>
> If we put the mlockall() call directly before this line, the
> crash is gone.
>
> Maybe you want to play around with rlimit, CNAMEs, IPs and
> so on...
>
> -Andre
>
> Anyone else seeing this?

>> 2015-10-30 12:34, David Wolfskill wrote:
>> > On Fri, Oct 30, 2015 at 09:42:07AM +0100, Dag-Erling Smørgrav wrote:
>> >> David Wolfskill <david_at_catwhisker.org> writes:
>> >> > ...
>> >> > bound to 172.17.1.245 -- renewal in 43200 seconds.
>> >> > pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
>> >> > Starting Network: lo0 em0 iwn0 lagg0.
>> >> > ...
>> >>
>> >> Did you find a solution? I'm wondering if the ntpd problems people
>> >> are reporting on freebsd-security@ are related. I vaguely recall
>> >> hearing that this had been traced to a pthread bug, but can't find
>> >> anything about it in commit logs or mailing list archives.
>> >> ....
>> >
>> > I don't recall finding "a solution" per se; that said, I also don't
>> > recall seeing an occurrence of the above for enough time that I'm not
>> > sure when I sent that message. :-}
>> >
>> > As a reality check:
>> >
>> > g1-252(11.0-C)[1] ls -lT /*.core
>> > -rw-r--r-- 1 root wheel 13783040 Aug 18 04:19:03 2015 /ntpd.core
>> > g1-252(11.0-C)[2]
>> >
>> > So -- among other points -- my last sighting of whatever was causing
>> > that was the day I built:
>> >
>> > FreeBSD 11.0-CURRENT #157 r286880M/286880:1100079: Tue Aug 18 04:45:25 PDT 2015
>> > root_at_g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY amd64
>> >
>> > Note that the machines where I run head get updated daily (unless
>> > there's enough of a problem with head that I can't build it or can't
>> > boot it (and I'm unable to circumvent the issue within a reasonable
>> > time)) -- and while I do attempt to run ntpd on the machines, the above
>> > failure is more "annoying" than "crippling" in my particular case.
>> >
>> > And I'm presently running:
>> >
>> > FreeBSD 11.0-CURRENT #227 r290138M/290138:1100084: Thu Oct 29 05:12:58 PDT 2015
>> > root_at_g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY amd64
>> >
>> > and building head @r290190 as I type.
>> >
>> > And FWIW, I *suspect* that one of the issues involved (in my case)
>> > was a ... lack of determinism ... in events involving getting the
>> > (wireless) network connectivity into a usable state as part of the
>> > initial transition to multi-user mode. (I only have evidence at
>> > the moment of the issue on my laptop; my build machine, which only
>> > uses a wired NIC, has no /ntpd.core file. It and my laptop are updated
>> > pretty much in lock-step; it runs a completely GENERIC kernel, while
>> > the laptop runs a modestly customized one based on GENERIC.)
>> >
>> > Peace,
>> > david
>> _______________________________________________
>> freebsd-stable_at_freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to
>> "freebsd-stable-unsubscribe_at_freebsd.org"

Received on Wed Nov 04 2015 - 15:15:14 UTC
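
The workarounds Andre reports in this thread can be collected into an ntp.conf sketch along the following lines. It is only an illustration assembled from the messages above, not a recommended configuration: the host names and the address are the placeholders used in the thread, exactly one server line is meant to be active at a time, and none of these options repairs the underlying crash, which the top message reports as solved by upgrading to 10.2-RELEASE-p7.

```
# Workaround 1 (from the thread): disable memory locking, so that
# mlockall() is not called at all. The original server entry can stay.
rlimit memlock -1
server ntp.some.domain.org

# Workaround 2 (from the thread): give the server by address instead,
# so no name has to be resolved. Use in place of the server line above.
#server 192.168.128.1

# Workaround 3 (from the thread): give the canonical name that holds
# the A record rather than the CNAME alias.
#server gate.some.domain.org
```

According to the observations above, any one of the three was enough to avoid the crash on the affected machine. Workarounds 2 and 3 leave memory locking enabled, while workaround 1 keeps the CNAME entry and gives up locking instead.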