Re: r316958: booting a server takes >10 minutes!

From: Ian Lepore <ian_at_freebsd.org>
Date: Mon, 17 Apr 2017 09:29:36 -0600
On Sun, 2017-04-16 at 21:53 -0700, Maxim Sobolev wrote:
> Well, all this suggests to me that there must be some issue with the client
> syslog code in the libc, so that if syslog daemon hangs or has some
> internal issue that would basically render system mostly unusable. I think
> that might be an interesting project for somebody who has some spare time
> on hands to take syslogd as of (r317033 - 1) and see what can be done to
> improve resilience of the system against such a failure mode.
> 
> -Max
> 

On the sending side, the libc code tries very hard to deliver messages
to the unprivileged /var/run/log socket; if the datagram send fails due
to buffer space (i.e., due to syslogd not keeping up on the read side),
it loops indefinitely, sleeping for 1us and then retrying, until the
send succeeds.

On the other hand, for /var/run/logpriv apparently the theory is that
hanging a process with enough privs to use that connection would be
bad.  So it retries just once for errors that are not related to buffer
space, and doesn't retry at all if the error was buffer space (which is
a case of the code not quite matching the nearby comments).  It then
gives up on syslogd and writes the message directly to the console
before returning.
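
For illustration, the send-side policy described in the two paragraphs
above can be sketched roughly as follows.  This is not the libc source:
deliver_as_described(), send_fn, struct scripted and scripted_send()
are made-up names, the transport is a callback instead of the real
socket, and retrying once for non-buffer errors on the unprivileged
socket as well is an assumption of the sketch.  A return of -1 stands
for "give up on syslogd"; the console fallback is left to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical transport hook: returns <0 and sets errno on failure. */
typedef ssize_t (*send_fn)(void *ctx, const void *buf, size_t len);

/* Returns 0 if delivered, -1 if the caller should fall back to the console. */
int
deliver_as_described(send_fn snd, void *ctx, const void *buf, size_t len,
    int privileged)
{
	const struct timespec pause = { 0, 1000L };	/* 1us */

	assert(snd != NULL);
	if (snd(ctx, buf, len) >= 0)
		return (0);
	if (errno != ENOBUFS) {
		/* Not a buffer-space error: retry exactly once. */
		if (snd(ctx, buf, len) >= 0)
			return (0);
	}
	while (errno == ENOBUFS) {
		/* Privileged socket: never loop on a full buffer. */
		if (privileged)
			break;
		/* Unprivileged socket: sleep 1us and retry, with no upper bound. */
		nanosleep(&pause, NULL);
		if (snd(ctx, buf, len) >= 0)
			return (0);
	}
	return (-1);
}

/* Fake sender for experiments: fails 'failures_left' times with 'err'. */
struct scripted {
	int failures_left;
	int err;
	int calls;
};

ssize_t
scripted_send(void *ctx, const void *buf, size_t len)
{
	struct scripted *s = ctx;

	(void)buf;
	s->calls++;
	if (s->failures_left > 0) {
		s->failures_left--;
		errno = s->err;
		return (-1);
	}
	return ((ssize_t)len);
}
```

With the fake sender, a privileged caller facing ENOBUFS gives up after
a single send, while an unprivileged caller keeps going for as long as
the sender keeps reporting ENOBUFS.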

So yeah, there may be some room for improvement in that logic. :)  I
think it could eventually give up in the non-priv case and maybe try an
extra time or two in the privileged case.

When we ran into this at $work years ago we just wrote our own work-
alike function to use instead of syslog(3); it retries any kind of
failure no more than 3 times, with a millisecond sleep between each
try.  (Losing logging is bad, but losing the functionality of our app
that's trying to do the logging is even worse.)
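
A bounded-retry policy of that general shape might look like the sketch
below.  It is not the function from $work: bounded_log_send(), send_fn,
struct flaky and flaky_send() are hypothetical names, the transport is
a callback so the policy can be tried without a running syslogd, and
reading "no more than 3 times" as one initial attempt plus up to three
retries is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical transport hook: returns <0 on failure. */
typedef ssize_t (*send_fn)(void *ctx, const void *buf, size_t len);

/*
 * Try once, then retry any kind of failure up to max_retries times,
 * sleeping 1ms before each retry.  Returns 0 on delivery, -1 if the
 * message was dropped.
 */
int
bounded_log_send(send_fn snd, void *ctx, const void *buf, size_t len,
    int max_retries)
{
	const struct timespec pause = { 0, 1000000L };	/* 1ms */
	int attempt;

	assert(snd != NULL);
	for (attempt = 0; attempt <= max_retries; attempt++) {
		if (attempt > 0)
			nanosleep(&pause, NULL);
		if (snd(ctx, buf, len) >= 0)
			return (0);
	}
	/* Give up: the message is lost, but the caller is not blocked. */
	return (-1);
}

/* Fake sender for experiments: fails the first 'failures_left' calls. */
struct flaky {
	int failures_left;
	int calls;
};

ssize_t
flaky_send(void *ctx, const void *buf, size_t len)
{
	struct flaky *f = ctx;

	(void)buf;
	f->calls++;
	if (f->failures_left > 0) {
		f->failures_left--;
		return (-1);
	}
	return ((ssize_t)len);
}
```

The worst case here costs the caller a few milliseconds per message,
which is the trade-off described above: a dropped log line is
preferred over a stalled application.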

-- Ian

> On Sun, Apr 16, 2017 at 5:50 PM, Ben Woods <woodsb02_at_gmail.com>
> wrote:
> 
> > 
> > On 16 April 2017 at 03:24, Larry Rosenman <ler_at_lerctr.org> wrote:
> > 
> > > 
> > > Current SVN seems to have fixed it (via sobomax_at_ syslogd commit).
> > > 
> > 
> > I experienced this issue too, and can confirm that it exists on
> > r316952,
> > but is resolved on r317033.
> > 
> > It was extremely strange. The symptoms I was experiencing were:
> > - lightdm display manager would fail to start
> > - slim display manager would start, but then fail to login to xfce
> > - "service hald restart" and "service dbus restart" would fail
> > - "pkg upgrade hal" would fail
> > 
> > Regards,
> > Ben
> > 
> > --
> > From: Benjamin Woods
> > woodsb02_at_gmail.com
> > _______________________________________________
> > freebsd-current_at_freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-current
> > To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
> > 
> > 
Received on Mon Apr 17 2017 - 13:29:46 UTC