Re: Best way to print something from the kernel at 1s intervals?

From: Peter Eriksson <pen_at_lysator.liu.se> Date: Fri, 29 Nov 2019 23:10:36 +0100 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:22 UTC

I love dtrace, but I seriously doubt that it could be used it for this. This is the absolutely last code that executes at kernel shutdown/reboot. All other processes are terminated when this is happening...

Basically the code done in kern_reboot() (and stuff it calls) in /usr/src/sys/kern/kern_shutdown.c

This code in kern_reboot():
...
>      EVENTHANDLER_INVOKE(shutdown_pre_sync, howto);
>
>        /*
>         * Now sync filesystems
>         */
>        if (!cold && (howto & RB_NOSYNC) == 0 && once == 0) {
>                once = 1;
>                bufshutdown(show_busybufs);
>        }
>
>        print_uptime();
>
>        cngrab();
>
>        /*
>         * Ok, now do things that assume all filesystem activity has
>         * been completed.
>         */
>        EVENTHANDLER_INVOKE(shutdown_post_sync, howto);
>
>        if ((howto & (RB_HALT|RB_DUMP)) == RB_DUMP && !cold && !dumping)
>                doadump(TRUE);
>
>        /* Now that we're going to really halt the system... */
>        EVENTHANDLER_INVOKE(shutdown_final, howto);

What I’ve seen is that the three different EVENTHANDLER_INVOKE() all can cause a lot of calls that can take a long time. As can bufshutdown(), probably when it calls vfs_unmountall() to unmount all filesystems.

We’re talking like 5-20 minutes (or more). I’ve not really timed it.

I have a feeling it sometimes is due to transactions stored up in the ZFS filesystems that the system tries to “run” so it doesn’t have to do it at mount() time after the reboot. If I reset the machine while it is busy doing this then sometimes the “zfs mount -a” takes very long time at boot instead.

Hmm.. I should be able to use the getnanouptime() call to get a “clock” to look for (it’s used in print_uptime()). As long as the clock isn’t stopped at this time in the shutdown sequence atleast :-)

*Time to write some code and test this* :-)

- Peter

> On 29 Nov 2019, at 22:09, Enji Cooper <yaneurabeya_at_gmail.com> wrote:
> 
> 
>> On Nov 28, 2019, at 12:52, Peter Eriksson <pen_at_lysator.liu.se> wrote:
>> 
>> I’ve been looking into the “kernel looks to be hung at reboot” problem at bit. Adding a lot of printf() calls to the relevant parts it looks like it actually isn’t hung but busy unmounting filesystems (which we have thousands of), flushing disk caches, calling registered callbacks and stuff and sometimes it takes a bit longer than usual - probably due to ZFS having stuff queued up that needs to be written to disk before it finishes…
>> 
>> Anyway, I’d like to propose that we add some kind of counter/printf() calls in that code area so we can see that things are indeed progressing. However, I’d probably prefer not to print _every_ filesystem (or registered callbacks - a lot of those...) - that generates a lot of output (tried that :-) but something like:
>> 
>> Unmounting filesystems:
>> 1234 done
>> (With the "1234 done” updated something like once per second).
>> 
>> What’s the right/best way to do that from the kernel? In user space I’d just call time(&t) at some convenient points and only print something if “t” has changed. :-)
> 
> Can you use DTrace instead to get the information you need?
> Cheers,
> -Enji