Re: FS hang when creating snapshots on a UFS SU+J setup

From: Yamagi Burmeister <lists_at_yamagi.org>
Date: Wed, 11 Jan 2012 10:30:39 +0100

Hello,
I've done some tests to verify that the problem only occurs when SU+J
is used, and not with SU without J. I ran the following two loops in
parallel on different TTYs:

while 1
 cp -r /usr/src /root
 rm -Rf /root/src
end

while 1
 mksnap_ffs / /.snap/snap
 rm -f /.snap/snap
end

With SU without J the system survives this for at least 1 hour. But as
soon as SU+J is used it most likely deadlocks or even panics within the
first 1 or 2 minutes. What exactly happens seems to vary. In most
cases the system just deadlocks, sometimes as alain_at_bsdgate.org
describes, and sometimes it is completely unresponsive to any input.
I've seen kernel messages like "fsync: giving up on dirty".

Several times the system panicked. In most cases it printed the generic
"panic: page fault while in kernel mode" and one time it printed
"panic: snapacct_ufs2: bad block". I've never seen the same
backtrace twice. One time the system suddenly rebooted, as if a triple
fault or something like that had happened.

Since the problems described above are much more likely to arise
when the filesystem is under load (for example from the first loop)
while the snapshot is taken, this looks like some kind of race
condition.

Some more information from an older debug session can be found at:
http://deponie.yamagi.org/freebsd/debug/snapshots_panic/
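
When repeating the SU versus SU+J comparison it helps to confirm which
mode a file system is really in before each run. The sketch below
classifies the text printed by `tunefs -p`. It assumes the usual output
lines containing "soft updates:" and "soft update journaling:" followed
by "enabled" or "disabled". The function name su_mode is made up for
illustration.

```shell
# Sketch only: read `tunefs -p` output on stdin and print "SU+J", "SU"
# or "none". Assumes the status word is the last field on each line.
su_mode() {
    awk '
        /soft updates:/           { su  = $NF }
        /soft update journaling:/ { suj = $NF }
        END {
            if (suj == "enabled")     print "SU+J"
            else if (su == "enabled") print "SU"
            else                      print "none"
        }'
}
```

On FreeBSD this would be used as `tunefs -p / 2>&1 | su_mode`. The
2>&1 is there because the settings are, as far as I know, printed on
stderr.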

On Tue, 10 Jan 2012 10:30:13 -0800
Kirk McKusick <mckusick_at_mckusick.com> wrote:

> > Date: Mon, 9 Jan 2012 18:30:51 +0100
> > From: Yamagi Burmeister <lists_at_yamagi.org>
> > To: jeff_at_freebsd.org, mckusick_at_freebsd.org
> > Cc: freebsd-current_at_freebsd.org, bryce_at_bryce.net
> > Subject: Re: FS hang when creating snapshots on a UFS SU+J setup
> > 
> > Hello,
> > 
> > I'm sorry to bother you, but you may not be aware of this thread and
> > this problem. We are several people experiencing deadlocks, kernel
> > panics and other problems when creating snapshots on file systems
> > with SU+J. It would be nice to get some feedback, e.g. how we can
> > help with debugging and / or fixing this problem.
> > 
> > Thank you,
> > Yamagi
> 
> First step in debugging is to find out if the problem is SU+J
> specific. To find out, turn off SU+J but leave SU. This change
> is done by running:
> 
> 	umount <filesystem>
> 	tunefs -j disable <filesystem>
> 	mount <filesystem>
> 	cd <filesystem>
> 	rm .sujournal
> 
> You may want to run `fsck -f' on the filesystem while you have
> it unmounted just to be sure that it is clean. Then run your
> snapshot request to see if it still fails. If it works, then
> we have narrowed the problem down to something related to SU+J.
> If it fails then we have a broader issue to deal with.
> 
> If you wish to go back to using SU+J after the test, you can
> reenable SU+J by running:
> 
> 	umount <filesystem>
> 	tunefs -j enable <filesystem>
> 	mount <filesystem>
> 
> When responding to me, it is best to use my <mckusick_at_mckusick.com>
> email as I tend to read it more regularly.
> 
> 	Kirk McKusick
> 


-- 
Homepage:  www.yamagi.org
XMPP:      yamagi_at_yamagi.org
GnuPG/GPG: 0xEFBCCBCB

Received on Wed Jan 11 2012 - 08:30:55 UTC