On Thu, 12 Apr 2012, Luigi Rizzo wrote:

> On Thu, Apr 12, 2012 at 01:18:59PM +1000, Bruce Evans wrote:
>> On Wed, 11 Apr 2012, Luigi Rizzo wrote:
>>
>>> On Wed, Apr 11, 2012 at 02:16:49PM +0200, Andre Oppermann wrote:
> ...
>>> ping takes a timestamp in userspace before trying to transmit
>>> the packet, and then the timestamp for the received packet
>>> is recorded in the kernel (in the interrupt or netisr thread
>>> i believe -- anyways, not in userspace).
>>
>> No, all timestamps recorded used by ping are recorded in userland.
>
> Bruce, look at the code in ping.c -- SO_TIMESTAMP is defined,
> so the program does (successfully) a
>
> 	setsockopt(s, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof(on));
>
> and then (verified that at runtime the code follows this path)
> ...

Indeed it does.  This accounts for the previously unaccounted-for
binuptime call in the kernel.  This is part of fenner's change (perhaps
the most important part) to put the timestamping closer to the actual
i/o, so that the RTT doesn't count setup overheads.

Timestamping still works when SO_TIMESTAMP is undefed (after fixing the
ifdefs on it).  Not using it costs only 1 more gettimeofday() call in
ping, but it increases my apparent RTT from 7-8 usec to 10-11 usec.
3 extra usec seems like a lot for the overhead.

I found that ping -f saves 1 gettimeofday() call per packet, and my
version saves another 1:

-current ping -q localhost: 4
my ping -q localhost: 3
-current ping -fq localhost: 3
my ping -fq localhost: 2

1 gettimeofday() call is needed for putting a timestamp in the output
packet.  Apparently, this is not well integrated with the bookkeeping
for select(), and up to 3 more gettimeofday() calls are used.  select()
timeouts only have 1/HZ granularity, so the gettimeofday() calls to set
them up could use clock_gettime() with CLOCK_REALTIME_FAST_N_BROKEN,
but this would be a bogus optimization since most of the overhead is in
the syscall provided the timecounter hardware is not slow (it takes
9-80 cycles to read TSC timecounter hardware and 600-800 cycles for
clock_gettime() with a TSC timecounter).  I just noticed that
CLOCK_MONOTONIC cannot be used for the timestamp in the output packet
if SO_TIMESTAMP is used for the timestamp in the input packet, since
SO_TIMESTAMP uses CLOCK_REALTIME for historical reasons.

There are 2 select()s per packet.  Perhaps the number of gettimeofday()s
can be reduced to 1 per packet in most cases (get one for the output
packet and use it for both select()s).  With my version the truss trace
for ping localhost is:

% 64 bytes from 127.0.0.1: icmp_seq=6 ttl=64 time=0.473 ms
% write(1,0x80d4000,57) = 57 (0x39)
% gettimeofday({1334226305 409249},0x0) = 0 (0x0)

Need this for the next select().  There is a ~1 second pause here.

% select(4,{3},0x0,0x0,{0 989607}) = 0 (0x0)

Truss doesn't show this until select() returns ~1 second later.  The
gettimeofday() call was needed because we don't simply use a 1 second
pause, but adjust for overheads.  My version uses a fancier adjustment.

% gettimeofday({1334226306 408632},0x0) = 0 (0x0)

We need this accurately to put in the packet.  But all the other
timestamps can be derived from this for the non-flood case.  We can
just try to send every `interval' and adjust the timeouts a bit when we
see that we get here a bit early or late, and when we get here more
than a bit early or late we can either recalibrate or adjust by more.
Note that this gettimeofday() returned 1.0 - 0.000617 seconds after the
previous one, although we asked for a timeout of ~1/HZ = 0.0010000
seconds.
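To make that concrete, here is a minimal sketch of the scheme (the
names next_send and interval are hypothetical, and this is not the
actual ping.c code): one gettimeofday() per packet, send deadlines kept
on a fixed grid, and each select() timeout derived from the deadline,
so per-iteration select() errors do not accumulate.

	/*
	 * Sketch only: one gettimeofday() per packet drives both the
	 * outgoing timestamp and the next select() timeout.
	 */
	#include <sys/select.h>
	#include <sys/time.h>

	#include <stdio.h>

	int
	main(void)
	{
		struct timeval now, next_send, interval = { 1, 0 };	/* -i 1.0 */
		int n;

		gettimeofday(&now, NULL);	/* timestamp for the 1st packet */
		next_send = now;

		for (n = 0; n < 5; n++) {
			struct timeval timeout;

			/* ... send the packet, stamped with 'now', here ... */
			printf("send at %ld.%06ld\n", (long)now.tv_sec,
			    (long)now.tv_usec);

			/* Next deadline stays on a fixed grid, not "now + interval". */
			timeradd(&next_send, &interval, &next_send);

			/* Sleep until the deadline; select() may wake ~1/HZ late. */
			timersub(&next_send, &now, &timeout);
			if (timeout.tv_sec < 0)
				timerclear(&timeout);	/* already late: send now */
			(void)select(0, NULL, NULL, NULL, &timeout);

			/*
			 * The one gettimeofday() per packet; it is reused for
			 * the next timeout instead of being taken again.
			 */
			gettimeofday(&now, NULL);
		}
		return (0);
	}

The same timestamp would be the one written into the outgoing packet,
so the timeout bookkeeping costs no extra gettimeofday() calls.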
Select timeouts have a large granularity and we expect errors of
O(1/HZ) and mostly compensate for them.  The drift without compensation
would be 1% with HZ = 100 and -i 1.0, and much larger with -i <small>.
My version compensates more accurately than -current.

% sendto(0x3,0x80b8d34,0,0x0,{ AF_INET 127.0.0.1:0 },0x10) = 64 (0x40)
% gettimeofday({1334226306 408831},0x0) = 0 (0x0)
% select(4,{3},0x0,0x0,{0 990025}) = 1 (0x1)
% recvmsg(0x3,0xbfbfe8c0,0x0) = 84 (0x54)
% gettimeofday({1334226306 409104},0x0) = 0 (0x0)

I don't know what this is for.  We got a timestamp in the returned
packet, and use it.  Since this timestamp has microsecond resolution
and select() only has 1/HZ resolution, this timestamp should be more
than good enough for sleeping for the interval.

% 64 bytes from 127.0.0.1: icmp_seq=7 ttl=64 time=0.472 ms
% write(1,0x80d4000,57) = 57 (0x39)

Next packet:

% gettimeofday({1334226306 409279},0x0) = 0 (0x0)
% ...

But select() timeouts are not needed at all.  Versions before fenner's
changes used a select() timeout for flood pings.  alarm() was used to
generate other timeouts.  The alarm() code was not very good.  It did a
lot of work in the signal handler to set up the next alarm (1 call to
signal() and 1 call to alarm()), and it did unsafe work in the signal
handler.  It was not careful about drift.  These overheads and the
drift can be avoided using a periodic itimer.  Ideally, expiry of each
period would just cause select() to return, but AFAIK there is no way
to make select() return without catching the signal, and even a null
signal catcher has a lot of overhead.  It is unclear if the overhead
for a null signal catcher is larger than that for select() timeouts.
The current overhead for select() timeouts is 2 or 3 gettimeofday()
calls in ping and 1 or 2 getmicrouptime() calls in each select().
These take at least 1 usec on my Athlon64 2GHz UP, and signal handling
also takes about 1 usec in old versions of FreeBSD, but now takes more
like 1.5 usec.

SO_TIMESTAMP seems to have been invented in 1996 by fenner to make ping
work better.  (The log messages seem to only mention fixing it, but I
can't find it in earlier code.)  There seems to be no way to timestamp
outgoing packets properly in the kernel.  ICMP_TIMESTAMP might give
millisecond resolution for ping, but FreeBSD ping has given microsecond
resolution since 1993.

Bruce
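For concreteness, a minimal sketch of the periodic-itimer arrangement
discussed above (hypothetical names, a 1-second period, and a stand-in
descriptor; not a patch to ping.c): one setitimer() call arms a
periodic SIGALRM, so there is no per-packet signal()/alarm() work and
no drift from rearming; the handler only sets a flag, and the signal's
useful side effect is to interrupt the blocking select() with EINTR.

	#include <sys/select.h>
	#include <sys/time.h>

	#include <errno.h>
	#include <signal.h>
	#include <stdio.h>
	#include <string.h>

	static volatile sig_atomic_t send_due;

	static void
	onalrm(int sig)
	{
		(void)sig;
		send_due = 1;		/* async-signal-safe: only set a flag */
	}

	int
	main(void)
	{
		struct sigaction sa;
		struct itimerval it;

		memset(&sa, 0, sizeof(sa));
		sigemptyset(&sa.sa_mask);
		sa.sa_handler = onalrm;	/* no SA_RESTART, so select() sees EINTR */
		sigaction(SIGALRM, &sa, NULL);

		/* Fire every second, forever; the kernel maintains the period. */
		it.it_interval.tv_sec = 1;
		it.it_interval.tv_usec = 0;
		it.it_value = it.it_interval;
		setitimer(ITIMER_REAL, &it, NULL);

		for (;;) {
			fd_set rfds;
			int n;

			FD_ZERO(&rfds);
			FD_SET(0, &rfds);	/* stand-in for the ICMP socket */

			/* Block with no timeout; SIGALRM provides the wakeup. */
			n = select(1, &rfds, NULL, NULL, NULL);
			if (n == -1 && errno != EINTR)
				return (1);

			if (send_due) {
				send_due = 0;
				printf("time to send the next packet\n");
				/* ... sendto() and the per-packet timestamp ... */
			}
			if (n > 0 && FD_ISSET(0, &rfds)) {
				/* ... recvmsg() the reply ... */
			}
		}
	}

One caveat the sketch ignores: a SIGALRM that fires while the loop is
busy outside select() is only noticed at the next wakeup; pselect(),
with SIGALRM blocked except during the wait, would close that race, at
the price of the signal-handling cost mentioned above.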