Re: [rfc] [patch] do not stop watchdog on shutdown

From: Maksim Yevmenkin <maksim.yevmenkin_at_gmail.com>
Date: Tue, 17 Dec 2013 14:15:01 -0800

On Tue, Dec 17, 2013 at 2:00 PM, Andriy Gapon <avg_at_freebsd.org> wrote:
> on 17/12/2013 20:53 Maksim Yevmenkin said the following:
>> hello,
>>
>> would anyone object to this patch?
>>
>> max
>>
>> Index: src/etc/rc.d/watchdogd
>> ===================================================================
>> --- src/etc/rc.d/watchdogd      (revision 2999)
>> +++ src/etc/rc.d/watchdogd      (working copy)
>> @@ -39,4 +39,7 @@
>>  pidfile="/var/run/${name}.pid"
>>
>>  load_rc_config $name
>> +
>> +sig_stop="${watchdogd_sig_stop:-TERM}"
>> +
>>  run_rc_command "$1"
>
> I wonder if anyone could object to this rather generic (and NOP by default) change.
> I see your intent, but a few words about it would not hurt :-)

well, when watchdogd is asked to exit nicely (via SIGTERM) it will
stop the timer. since the watchdogd rc.d script is marked as 'shutdown',
it will exit on shutdown and stop the timer. if the system happens to
hang after this, a manual reset is required. when one operates in
"lights-out" environments without readily available "remote hands",
this can be a problem.
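
to make the mechanism concrete, here is a minimal sh sketch. it does not
touch a real watchdog: fake_watchdogd and run_and_stop are hypothetical
stand-ins, and the "timer" is just a temporary file. the stand-in disarms
(removes the file) only in its SIGTERM handler, so a signal that cannot be
trapped, such as SIGKILL, leaves the "timer" armed.

```shell
#!/bin/sh
# hypothetical stand-in for watchdogd: the "timer" is a file that exists
# while armed and is removed only when the process exits via SIGTERM.
fake_watchdogd() {
    state="$1"
    : > "$state"                         # timer armed
    trap 'rm -f "$state"; exit 0' TERM   # graceful exit disarms the timer
    while :; do sleep 1; done
}

# start the stand-in, stop it with the given signal, report the timer state
run_and_stop() {
    sig="$1"
    state=$(mktemp)
    fake_watchdogd "$state" &
    pid=$!
    sleep 1                              # let the child install its trap
    kill -s "$sig" "$pid"
    wait "$pid" 2>/dev/null
    if [ -e "$state" ]; then echo armed; else echo disarmed; fi
    rm -f "$state"
}

run_and_stop TERM   # prints "disarmed"
run_and_stop KILL   # prints "armed"
```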

default behavior is preserved, i.e. watchdogd will still be killed via
SIGTERM and the timer will be stopped. to activate the new behavior,
one needs to put

watchdogd_sig_stop="KILL"

into /etc/rc.conf and also make sure the watchdogd timeout is set to a
value long enough for the system to come back online before the timeout
fires.
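
as a rough sketch of how the added line picks the stop signal:
resolve_sig_stop below is a hypothetical helper that mirrors the
${watchdogd_sig_stop:-TERM} expansion from the patch, and the rc.conf
values in the comments (in particular the 300 second timeout) are
illustrative assumptions, not recommendations.

```shell
#!/bin/sh
# illustrative /etc/rc.conf settings (example values only):
#   watchdogd_enable="YES"
#   watchdogd_flags="-t 300"     # assumed timeout; must cover a full reboot
#   watchdogd_sig_stop="KILL"    # leave the timer running on shutdown

# hypothetical helper mirroring the line added by the patch:
# an unset or empty watchdogd_sig_stop falls back to TERM.
resolve_sig_stop() {
    watchdogd_sig_stop="$1"
    sig_stop="${watchdogd_sig_stop:-TERM}"
    echo "$sig_stop"
}

resolve_sig_stop ""       # prints "TERM" (default, timer is stopped)
resolve_sig_stop "KILL"   # prints "KILL" (timer keeps running)
```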

> BTW, for a while now we have some support for interacting with the watchdog(9)
> from within the kernel.  I have the following local patch / hack that makes use
> of that support:
>
> commit b64c5e855420f2d905a04f69fad5de116e8ffae5
> Author: Andriy Gapon <avg_at_icyb.net.ua>
> Date:   Fri Nov 25 10:00:59 2011 +0200
>
>     [test] arm the watchdog before going into the final shutdown/reboot step
>
>     ... to preclude hanging on that step.
>     Note: halt assumes the limbo, so no watchdog for that case.
>
> diff --git a/sys/kern/kern_shutdown.c b/sys/kern/kern_shutdown.c
> index eaa78b8e..88afaa9 100644
> --- a/sys/kern/kern_shutdown.c
> +++ b/sys/kern/kern_shutdown.c
> @@ -444,6 +444,11 @@ kern_reboot(int howto)
>         if ((howto & (RB_HALT|RB_DUMP)) == RB_DUMP && !cold && !dumping)
>                 doadump(TRUE);
>
> +       if ((howto & RB_HALT) != 0)
> +               wdog_kern_pat(0);
> +       else
> +               wdog_kern_pat(WD_TO_32SEC + 1);
> +
>         /* Now that we're going to really halt the system... */
>         EVENTHANDLER_INVOKE(shutdown_final, howto);
>
>
> Admittedly, there is a gap between userland watchdog being stopped and kernel
> watchdog taking over.  I wish that we had 'proper' integration between them,
> with proper hand-off, etc.

a fixed timeout of 32 sec (if i'm understanding this correctly) might
not be enough for all use cases. it's definitely not enough for our
use case. at the very least, the timeout value should be configurable
to be useful to us.

thanks,
max
Received on Tue Dec 17 2013 - 21:15:03 UTC
