On Mon, 2019-09-09 at 21:44 +0300, Konstantin Belousov wrote:
> On Mon, Sep 09, 2019 at 12:13:24PM -0600, Ian Lepore wrote:
> > On Mon, 2019-09-09 at 09:30 -0700, Rodney W. Grimes wrote:
> > > > On Sat, 2019-09-07 at 09:28 -0700, Cy Schubert wrote:
> > > > > In message <20190907161749.GJ2559_at_kib.kiev.ua>, Konstantin
> > > > > Belousov writes:
> > > > > > On Sat, Sep 07, 2019 at 08:45:21AM -0700, Cy Schubert
> > > > > > wrote:
> > > > > > > [...]
> > > 
> > > Doesn't locking this memory down also protect ntpd from OOM kills?
> > > If so that is a MUST preserve functionality, as IMHO killing ntpd
> > > on a box that has it configured is a total no win situation.
> > > 
> > Does it have that effect?  I don't know.  But I would argue that that's
> > a separate issue, and we should make that happen by adding
> > ntpd_oomprotect=YES to /etc/defaults/rc.conf
> Wiring process memory has no effect on OOM selection.  More, because
> all potentially allocated pages are allocated for real after mlockall(),
> the size of the vmspace, as accounted by OOM, is the largest possible
> size from the whole lifetime.
> 
> On the other hand, the code execution times are not predictable if the
> process's pages can be paged out.  Under severe load next instruction
> might take several seconds or even minutes to start.  It is quite unlike
> the scheduler delays.  That introduces a jitter in the local time
> measurements and their usage as done in userspace.  Wouldn't this affect
> the accuracy ?
> 

IMO, there is a large gap between "in theory, paging could cause
indeterminate delays in code execution" and "time will be inaccurate on
your system".  If there were a delay in a part of the code where it
matters that amounted to "seconds or even minutes", what you'd end up
with is a measurement that would be discarded by the median filter as an
outlier.  There would be some danger that if that kind of delay happened
for too many polling cycles in a row, you'd end up with no usable
measurements after a while and clock accuracy would suffer.  Sub-second
delays would be more worrisome, because they might not be rejected as
outliers.

There are only a couple of code paths in FreeBSD ntpd processing where a
paging (or scheduling) delay could cause measurement inaccuracy:

 - When stepping the clock, the code that runs between calling
   clock_gettime() and calling clock_settime() to apply the step
   adjustment to the clock.

 - When beginning an exchange with or replying to a peer, the code that
   runs between obtaining system time for the outgoing Transmit
   Timestamp and actually transmitting that packet.

Stepping the clock typically happens only once, at startup.  The ntpd
code itself recognizes that this is a time-critical path (it has comments
to that effect), but unfortunately the code that runs is scattered among
several different .c files, so it's hard to say how likely it is that the
code in the critical section will all be in the same page (or already be
resident because other startup-time code faulted in those pages).  IMO,
the right fix for this would be a kernel interface that lets you apply a
step-delta to the clock with a single syscall (perhaps as an extension to
the existing ntp_adjtime() using a new mode flag).

On FreeBSD, the Receive timestamps are captured in the kernel and
delivered along with the packet to userland; ntpd retrieves them from the
SCM_BINTIME control message attached to the packet, so there is no
latency problem in the receive path.
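For anyone curious about the shape of that, here is a minimal sketch of
pulling a kernel receive timestamp out of the control message data.  This
is illustrative only, not the actual ntpd code; the helper name and the
fallback behavior are invented, and it assumes the FreeBSD-specific
SO_BINTIME / SCM_BINTIME machinery:

/*
 * Illustrative only -- not ntpd source.  Receive one packet and pull the
 * kernel-supplied arrival timestamp out of the SCM_BINTIME control
 * message.  Assumes the caller already enabled timestamping with:
 *
 *     int on = 1;
 *     setsockopt(sock, SOL_SOCKET, SO_BINTIME, &on, sizeof(on));
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <sys/time.h>     /* struct bintime (FreeBSD) */
#include <string.h>

static ssize_t
recv_with_kernel_timestamp(int sock, void *buf, size_t len, struct bintime *bt)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	char cbuf[CMSG_SPACE(sizeof(struct bintime))];
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = cbuf, .msg_controllen = sizeof(cbuf),
	};
	struct cmsghdr *cm;
	ssize_t n;

	if ((n = recvmsg(sock, &msg, 0)) < 0)
		return (-1);

	for (cm = CMSG_FIRSTHDR(&msg); cm != NULL; cm = CMSG_NXTHDR(&msg, cm)) {
		if (cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_BINTIME) {
			/* Timestamp taken by the kernel when the packet arrived. */
			memcpy(bt, CMSG_DATA(cm), sizeof(*bt));
			return (n);
		}
	}

	/* No timestamp present; the caller falls back to clock_gettime(). */
	return (n);
}

The point is that any paging or scheduling delay between the packet
arriving and recvmsg() returning doesn't matter: the arrival timestamp
was already captured in the kernel.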
There isn't a corresponding kernel mechanism for setting the outgoing
timestamps, so whether it's originating a request to a peer or replying
to a request from a peer, the transmit timestamp could be wrong due to:

 - paging delays
 - scheduler delays
 - network stack, outgoing queues, and driver delays

So the primary vulnerability is on the transmit path, between obtaining
system time and the packet leaving the system.  A quick glance at that
code makes me think that most of the data being touched has already been
referenced pretty recently while assembling the outgoing packet, so it's
unlikely that storing the timestamp into the packet, or the other bit of
work that happens after that, triggers a pagein unless the system is
pathologically overloaded.  Naturally, obtaining the timestamp and
putting it into the packet is one of the last things ntpd does before
sending, so the code path is relatively short, but it's not clear to me
whether the code involved is likely to all live in the same page.  Still,
it's one of the heavily exercised paths within ntpd, which should
increase the odds of the pages being resident because of recent use.
(A rough sketch of that transmit-side window is appended below.)

So, I'm not disputing the point that a sufficiently overloaded system can
lead to an indeterminate delay between *any* two instructions executed in
userland.  What I've said above is more along the lines of considering
the usual situation, not the most pathological one.  In the most
pathological cases, either the delays introduced are fairly minor and you
get some minor jitter in system time (ameliorated by the median filtering
built into ntpd), or the delays are major (a full second or more) and get
rejected as outliers, not affecting system time at all unless the
situation persists and prevents getting any good measurements for many
hours.

-- 
Ian
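P.S.  For concreteness, the transmit-side window discussed above has
roughly this shape.  This is a generic sketch, not ntpd's code; the
struct layout stand-in, the function name, and the simplified NTP-format
conversion are invented for illustration:

#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <time.h>

struct ntp_pkt_stub {              /* stand-in for the real packet layout */
	uint8_t  fixed[40];        /* header fields through Receive Timestamp */
	uint32_t xmt_sec;          /* Transmit Timestamp, seconds */
	uint32_t xmt_frac;         /* Transmit Timestamp, fraction */
};

static ssize_t
send_with_xmt_stamp(int sock, struct ntp_pkt_stub *pkt,
    const struct sockaddr *to, socklen_t tolen)
{
	struct timespec ts;

	/* --- start of the window that needs to stay short --- */
	clock_gettime(CLOCK_REALTIME, &ts);

	/* Convert to NTP era and 32.32 fixed-point format (simplified). */
	pkt->xmt_sec  = htonl((uint32_t)(ts.tv_sec + 2208988800UL));
	pkt->xmt_frac = htonl((uint32_t)(((uint64_t)ts.tv_nsec << 32) / 1000000000));

	/*
	 * Any page fault, preemption, or queueing between here and the
	 * packet actually hitting the wire becomes transmit-timestamp
	 * error.
	 */
	return (sendto(sock, pkt, sizeof(*pkt), 0, to, tolen));
	/* --- end of the window, as far as userland can see --- */
}

Of the three bullets above, only the first is something mlockall() even
addresses; the queueing and driver delays happen after the syscall and
are invisible to ntpd.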