On Thursday, February 05, 2015 08:48:33 AM Luigi Rizzo wrote:
> On Thursday, February 5, 2015, Peter Wemm <peter_at_wemm.org> wrote:
> > On Wednesday, February 04, 2015 04:29:41 PM Konstantin Belousov wrote:
> > > On Tue, Feb 03, 2015 at 01:33:15PM -0800, Peter Wemm wrote:
> > > > Sometime in the Dec 10th through Jan 7th timeframe a timing bug has
> > > > been introduced to 11.x/head/-current. With HZ=1000 (the default for
> > > > bare metal, not for a vm); the clocks stop just after 24 days of
> > > > uptime. This means things like cron, sleep, timeouts etc stop
> > > > working. TCP/IP won't time out or retransmit, etc etc. It can get
> > > > ugly.
> > > >
> > > > The problem is NOT in 10.x/-stable.
> > > >
> > > > We hit this in the freebsd.org cluster, the builds that we used are:
> > > > FreeBSD 11.0-CURRENT #0 r275684: Wed Dec 10 20:38:43 UTC 2014 - fine
> > > > FreeBSD 11.0-CURRENT #0 r276779: Wed Jan 7 18:47:09 UTC 2015 - broken
> > > >
> > > > If you are running -current in a situation where it'll accumulate
> > > > uptime, you may want to take precautions. A reboot prior to 24 days
> > > > uptime (as horrible a workaround as that is) will avoid it.
> > > >
> > > > Yes, this is being worked on.
> > >
> > > So the issue is reproducible in 3 minutes after boot with the following
> > > change in kern_clock.c:
> > > volatile int ticks = INT_MAX - (/*hz*/1000 * 3 * 60);
> > >
> > > It is fixed (in the proper meaning of the word, not like worked around,
> > > covered by paper) by the patch at the end of the mail.
> > >
> > > We already have a story trying to enable much less ambitious option
> > > -fno-strict-overflow, see r259045 and the revert in r259422. I do not
> > > see other way than try one more time. Too many places in kernel
> > > depend on the correctly wrapping 2-complement arithmetic, among others
> > > are callwheel and scheduler.
> >
> Rather than depending on a compiler option, wouldn't it be better/more
> robust to change ticks to unsigned, which has specified wrapping behavior?

Yes, but non-trivial. It's also not limited to ticks. Since the compiler
knows when it would apply these optimizations, it would be nice if it could
warn instead (GCC apparently has a warning, but clang does not). Having
people do a manual audit of every signed integer expression in the tree will
take a long time.

-- 
John Baldwin

Received on Thu Feb 05 2015 - 12:59:07 UTC
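
For readers following the unsigned suggestion above, here is a minimal sketch of what a wrap-safe deadline check could look like if the counter were unsigned. It is only an illustration, not the patch discussed in this thread: tick_reached() and tick_advance() are hypothetical names that do not exist in the FreeBSD tree, and the sketch assumes the usual case where UINT_MAX == 2 * INT_MAX + 1.

```c
#include <limits.h>

/*
 * Hypothetical helper (not from the FreeBSD tree): returns nonzero once
 * `now` has reached or passed `deadline`.  Both operands are unsigned, so
 * the subtraction wraps modulo UINT_MAX + 1 instead of overflowing, and
 * the compiler may not assume that the wrap never happens.
 */
static int
tick_reached(unsigned int now, unsigned int deadline)
{
	unsigned int delta = now - deadline;	/* defined wraparound */

	/* A delta in the upper half of the range means the deadline is ahead. */
	return (delta <= (unsigned int)INT_MAX);
}

/*
 * Hypothetical helper: advance a tick counter by `n` ticks.  Unsigned
 * addition wraps to 0 after UINT_MAX by definition.
 */
static unsigned int
tick_advance(unsigned int now, unsigned int n)
{
	return (now + n);
}
```

Starting a counter at (unsigned int)INT_MAX - 5 and advancing it by 10 crosses the point where a signed int would overflow; tick_reached() stays false until the counter gets there and true afterwards. The comparison is only meaningful while the two values are less than half the counter range apart, which is the same limit the signed-difference idiom has.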