Re: Shutdown errors and timeout

From: Mateusz Piotrowski <0mp_at_FreeBSD.org>
Date: Wed, 18 Nov 2020 09:47:32 +0100

On 11/16/20 7:16 PM, Johan Hendriks wrote:
>
> On 14/11/2020 13:03, Mateusz Piotrowski wrote:
>> On 11/14/20 1:19 AM, Tomoaki AOKI wrote:
>>> On Fri, 13 Nov 2020 20:04:59 +0900 (JST)
>>> Yasuhiro KIMURA <yasu_at_utahime.org> wrote:
>>>
>>>> From: Johan Hendriks <joh.hendriks_at_gmail.com>
>>>>
>>>>> Hello all, I have two FreeBSD 13 machines, one bare metal and one
>>>>> VirtualBox VM, both of which I update about once a week.
>>>>>
>>>>> The virtual machine seems to fail to stop something and hits a
>>>>> timeout after 90 seconds.
>>>>>
>>>>> The console ends with:
>>>>>
>>>>> Writing entropy file: .
>>>>> Writing early boot entropy file: .
>>>>>
>>>>> 90 second watchdog timeout expired. Shutdown terminated.
>>>>> Fri Nov13 11:20:40 CEST 2020
>>>>> Nov 13 11:20:40 test-head init[1]: /etc/rc.shutdown terminated
>>>>> abnormally, going to single user mode
>>>>> ...
>>>>>
>>>>> On the bare-metal machine I see the following:
>>>>> Writing entropy file: .
>>>>> Writing early boot entropy file: .
>>>>> cannot unmount '/var/run': umount failed
>>>>> cannot unmount '/var/log': umount failed
>>>>> cannot unmount '/var': umount failed
>>>>> cannot unmount '/usr/home': umount failed
>>>>> cannot unmount '/usr': umount failed
>>>>> cannot unmount '/': umount failed
>>>>>
>>>> (snip)
>>>>> The pools have not been upgraded since the latest OpenZFS import;
>>>>> maybe that is related?
>>>>>
>>>>> FreeBSD test-freebsd-head 13.0-CURRENT FreeBSD 13.0-CURRENT #2
>>>>> r367585:
>>>>>
>>>>> I first noticed this about a week ago.
>>>> I'm facing the same problem with 13.0-CURRENT amd64 r367487 on
>>>> VirtualBox. In my case I use autofs to mount a remote file system
>>>> from a 12.2-RELEASE amd64 server over NFSv4. If a file system
>>>> mounted by autofs is still mounted, the watchdog timeout happens
>>>> during shutdown. The timeout can be worked around by running
>>>> `automount -fu` before shutting down, but the 'cannot unmount ...'
>>>> error messages are still displayed.
>>>>
>>>> I added 'rc_debug="YES"' to /etc/rc.conf to check which rc script
>>>> causes these messages. They are displayed when the `zfs_stop`
>>>> function of /etc/rc.d/zfs runs the following code:
>>>>
>>>> ----------------------------------------------------------------------
>>>> zfs_stop_main()
>>>> {
>>>>     zfs unshare -a
>>>>     zfs unmount -a
>>>> }
>>>> ----------------------------------------------------------------------
>>>>
>>>> At this point the syslogd process is still running and has some
>>>> files under /var/log open, so it makes sense that `zfs unmount -a`
>>>> results in these messages.
>>>>
>>>> The order in which rc scripts run at shutdown should probably be
>>>> changed so that `/etc/rc.d/zfs faststop` is executed after all
>>>> processes other than `init` have exited.
>>> This happens on stable/12, too.
>>> As a workaround, reverting r367291 on head (r367546 on stable/12)
>>> avoids the issue until it is properly fixed.
>>>
>>> If you have shared datasets or jails that mount datasets, this
>>> workaround is not recommended. See the commit message for details.
>>
>> I've committed r367291 and r367546.
>>
>> I am not sure I can come up with a proper fix for the described issues, so I guess the best
>> option is to revert those changes for now until we figure out how to do it properly.
>
>
> I can confirm that with the mentioned commit reverted, I no longer see the symptoms when I reboot my servers.
> Thank you all for your time, and no apology needed ;-)

I'll revert the commit then. I'm just waiting for approval from a src committer.

Best,

Mateusz