Re: New syscons bugs: shutdown -r doesn't execute rc.d sequence and others

From: Bruce Evans <brde_at_optusnet.com.au>
Date: Fri, 31 Mar 2017 01:32:03 +1100 (EST)

On Thu, 30 Mar 2017, Andrey Chernov wrote:

>> We don't understand the bug yet.  It might not even be in sc.  Do you only
>> see problems for shutdown?  The shutdown environment is special for
>> locking.
>
> Yes, only for reboot/shutdown. The system does not do anything wrong
> even under high load. On reboot or hang those lines are never printed:
>
> kernel: Waiting (max 60 seconds) for system process `vnlru' to stop...done
> kernel: Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
> kernel: Waiting (max 60 seconds) for system process `syncer' to stop...
> kernel: Syncing disks, vnodes remaining...5 3 0 1 0 0 done
> kernel: All buffers synced.
> (it is from 10-stable sample, old -current samples are lost)
>
> Moreover, the GELI swap deactivation lines are never printed either (I
> already mentioned that I changed swap to normal, but nothing changed).
>
>> A hang in sc means that deadlock occurred and sc's new deadlock detection
>> didn't work.
>
> Hangs are rare. Most common are premature reboots.
>
>> Check that ddb works before shutdown, or just put a lot of printfs in
>
> I can't check it in ddb because I can't enter ddb in sc mode; as I already
> wrote, nothing happens. Only vt mode allows Ctrl-Alt-ESC, but the bug
> does not exist in vt mode, so it is pointless.

That is significant.  My changes were initially all about making ddb work
almost perfectly with sc.

ddb is entered by kdb first calling cngrab(), which does much the same
things as cnputc(), but more to set up for using the keyboard.  If the
sc part of cngrab() detects a problem, it should return and then the
sc part of cnputc() should detect the same problem and do emergency output
which might be just to buffer it.
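
The grab-then-fallback behaviour described above can be illustrated with a
toy model.  This is only a sketch, not the sc code: every name in it
(toy_console, toy_unsafe, toy_grab, toy_putc, toy_puts) is hypothetical, and
the single "busy" flag is an assumption standing in for whatever condition
the real driver checks.  The point it shows is that the grab path and the
putc path share one check, so when grab backs off, putc backs off too and
buffers the output instead of touching the screen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these names come from the FreeBSD sources. */
#define TOY_BUF_SIZE 128

struct toy_console {
	bool busy;                 /* assumed stand-in for "unsafe to touch the screen" */
	bool grabbed;              /* true once toy_grab() succeeded */
	char screen[TOY_BUF_SIZE]; /* characters that reached the "display" */
	size_t screen_len;
	char emerg[TOY_BUF_SIZE];  /* emergency buffer for deferred output */
	size_t emerg_len;
};

/* One shared check, so grab and putc always reach the same verdict. */
static bool
toy_unsafe(const struct toy_console *c)
{
	return (c->busy);
}

/* Grab path: give up quietly if the problem is detected. */
static void
toy_grab(struct toy_console *c)
{
	if (toy_unsafe(c))
		return;
	c->grabbed = true;
}

/* Output path: detect the same problem and buffer instead of drawing. */
static void
toy_putc(struct toy_console *c, char ch)
{
	if (toy_unsafe(c)) {
		/* Keep one byte free so the buffer stays NUL-terminated. */
		if (c->emerg_len < TOY_BUF_SIZE - 1)
			c->emerg[c->emerg_len++] = ch;
		return;
	}
	if (c->screen_len < TOY_BUF_SIZE - 1)
		c->screen[c->screen_len++] = ch;
}

/* Convenience wrapper: write a whole string one character at a time. */
static void
toy_puts(struct toy_console *c, const char *s)
{
	while (*s != '\0')
		toy_putc(c, *s++);
}
```

With a zero-initialized toy_console and busy set, toy_grab() leaves grabbed
false and toy_puts() fills only the emergency buffer; clearing busy makes
both paths behave normally again.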

Nothing at all happening looks like a simpler problem, with Ctrl-Alt-ESC
not being recognized.  There are too many ways to enable/disable this
entry, but I didn't change this.

>>>> You might have entered ddb in a context which used to race or deadlock.
>>>
>>> No. I try about 20 times on machine which does nothing and can't enter
>>> KDB in sc only mode, but got one dead hang instead, when start to repeat
>>> it too fast.
>>
>> Even earlier than shutdown, and when booting?
>
> I mean in normal operation mode after booting, earlier than shutdown.
> Shutdown with premature reboot is too fast to press anything at the
> right time. I don't try to enter ddb when booting yet, but tell you
> results later.

Look early in kern_reboot(), where it does print_uptime() then cngrab().
Console output before this cngrab() should work normally, and I suspect
that something in cngrab() reboots.  But syncing the file systems is
done before this.  I think they are unmounted later, so are fscked but
don't need more than fsck -p if they have been synced.
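
To make the checkpoint idea concrete, here is a toy model of narrowing down
which step kills the machine.  It is a sketch under stated assumptions, not
kern_reboot(): the stage table, toy_run(), toy_ok() and toy_dies() are
hypothetical, and the stage order (sync, print_uptime, cngrab, unmount) is
only the order guessed at in this message.  Each stage prints a checkpoint
before it runs, so the last checkpoint seen names the suspect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: stage names follow this message, not the kernel source. */
typedef int (*toy_stage_fn)(void); /* 0 = completed, nonzero = "died" here */

struct toy_stage {
	const char *name;
	toy_stage_fn run;
};

/* A stage that completes normally. */
static int
toy_ok(void)
{
	return (0);
}

/* A stage that simulates the premature reboot. */
static int
toy_dies(void)
{
	return (1);
}

/*
 * Run the stages in order, printing a checkpoint before each one.
 * Returns the index of the first stage that did not complete,
 * or n if every stage completed.
 */
static size_t
toy_run(const struct toy_stage *stages, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		printf("checkpoint: entering %s\n", stages[i].name);
		if (stages[i].run() != 0)
			break;
	}
	return (i);
}
```

If the table marks the cngrab stage with toy_dies(), toy_run() returns the
index of that stage, which mirrors reading the console and seeing that the
uptime line was the last thing printed.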

Bruce