Re: How a full fsck screwed up my SU+J filesystem

From: Kirk McKusick <mckusick_at_mckusick.com>
Date: Wed, 01 Dec 2010 16:30:37 -0800
> Date: Wed, 1 Dec 2010 16:27:48 +0200
> From: Kostik Belousov <kostikbel_at_gmail.com>
> To: Peter Holm <pho_at_freebsd.org>
> Cc: Garrett Cooper <yanegomi_at_gmail.com>,
>         Marshall Kirk McKusick <mckusick_at_mckusick.com>, current_at_freebsd.org
> Subject: Re: How a full fsck screwed up my SU+J filesystem
> 
> On Wed, Dec 01, 2010 at 12:00:08PM +0100, Peter Holm wrote:
> > On Wed, Dec 01, 2010 at 01:28:06AM -0800, Garrett Cooper wrote:
> > >
> > > So... I was doing a portmaster -af today because vlc stopped playing
> > > audio (for some reason ... I kind of went on a pkg_cutleaves rampage
> > > and probably deinstalled too much stuff), and the machine hardlocked
> > > during an upgrade. I did a soft reboot and saw messages along the
> > > lines of "your journal and filesystem mount time mismatched; running
> > > > a full fsck". I figured "ok, sure..." and let it do its thing.
> > > Problem was that it pruned a lot of stuff from my /usr partition --
> > > including the .sujournal !!! So now it's stuck at Mounting local
> > > file systems: stating:
> > > 
> > > Failed to find journal.   Use tunefs to create one
> > > Failed to start journal: 2
> > > 
> > > (I assume the 2 means ENOENT). All of the above were printf(9)'s
> > > from the kernel.
> > > 
> > > Now the machine won't continue in multiuser mode (doesn't respond
> > > to interrupts, no panic, etc). Going into ddb, I don't see anything
> > > in info_threads (just a bunch of references to sched_switch, a few
> > > to fork_trampoline, cpustop_handler, and kdb_enter). I'm going to
> > > try and massage the machine back to life from single user mode, but
> > > the fact that this died in this way (i.e. .sujournal getting nuked
> > > by a full fsck) is a bit disheartening for SU+J :(... It would be
> > > nice if at least the fsck aborted before going and nuking the
> > > journal :/... (or at the very least if the file wasn't removable --
> > > i.e. SF_NOUNLINK).
> > > 
> > > Here's to hoping I can resuscitate the filesystem...
> > > 
> > > Thanks,
> > > -Garrett
> >
> > Thank you for reporting this.
> >
> > I was able to reproduce the problem by:
> >
> > tunefs -j enable /dev/md5a
> > mount /dev/md5a /mnt
> > chflags 0 /mnt/.sujournal
> > rm -f /mnt/.sujournal
> > umount /mnt
> > mount /dev/md5a /mnt
> >
> > The mount(1) is now stuck in mntref.
> >
> > http://people.freebsd.org/~pho/stress/log/kostik404.txt
> >
> > A sequence of "tunefs -j disable" + "tunefs -j enable" should get
> > you going.
> 
> The action is of the category "do not do it then" for sure.
> 
> The problem in kostik404 is that ffs_mount() did not clean up
> the vnodes instantiated during the mount. Activating the softdep
> journal instantiates at least the root vnode and, if found, the
> journal vnode. The following patch fixed it for me.
> 
> diff --git a/sys/ufs/ffs/ffs_vfsops.c b/sys/ufs/ffs/ffs_vfsops.c
> index 94951e4..72f40da 100644
> --- a/sys/ufs/ffs/ffs_vfsops.c
> +++ b/sys/ufs/ffs/ffs_vfsops.c
> @@ -928,6 +928,7 @@ ffs_mountfs(devvp, mp, td)
>  		if ((fs->fs_flags & FS_DOSOFTDEP) &&
>  		    (error = softdep_mount(devvp, mp, fs, cred)) != 0) {
>  			free(fs->fs_csp, M_UFSMNT);
> +			ffs_flushfiles(mp, FORCECLOSE, td);
>  			goto out;
>  		}
>  		if (fs->fs_snapinum[0] != 0)
> 

Thanks all: Garrett for the report, Peter for the way to reproduce
the problem, and Kostik for a fix. I have copied Jeff so that he can
confirm that Kostik's fix is the appropriate thing to do. And I will
take a look at fsck to see if I can make it a bit more paranoid about
removing .sujournal.

	Kirk McKusick
Received on Thu Dec 02 2010 - 00:08:54 UTC
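
For readers who land here with the same "Failed to find journal" hang:
Peter's suggested recovery above ("tunefs -j disable" followed by
"tunefs -j enable") could be scripted roughly as below. This is only a
sketch, not something posted in the thread. The function name
recreate_journal and the DRY_RUN variable are made up for illustration,
/dev/md5a is simply the device from Peter's reproduction, and it assumes
you are in single-user mode with the filesystem unmounted (or mounted
read-only), since tunefs will not modify an active filesystem. By
default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): recreate the SU+J journal by
# toggling journaling off and on again with tunefs, as Peter suggested.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
recreate_journal() {
    dev="$1"
    if [ -z "$dev" ]; then
        echo "usage: recreate_journal <device>" >&2
        return 2
    fi
    for action in disable enable; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            # Dry run: show what would be executed.
            echo "tunefs -j $action $dev"
        else
            # Real run: stop at the first failing tunefs invocation.
            tunefs -j "$action" "$dev" || return 1
        fi
    done
}

# Example (dry run):   recreate_journal /dev/md5a
# Example (real run):  DRY_RUN=0 recreate_journal /dev/md5a
```

Printing first and running second is deliberate: the device name is easy
to get wrong from a single-user shell, and the dry run costs nothing.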
