Hello,

I've done some tests to verify that the problem only occurs when SU+J
is used, but not with SU without J. I ran the following two loops in
parallel on different TTYs:

    while 1
        cp -r /usr/src /root
        rm -Rf /root/src
    end

    while 1
        mksnap_ffs / /.snap/snap
        rm -f /.snap/snap
    end

With SU but without J the system survives this for at least 1 hour. But
as soon as SU+J is used it most likely deadlocks or even panics within
the first 1 or 2 minutes.

What exactly happens seems to vary. In most cases the system just
deadlocks, sometimes as alain_at_bsdgate.org describes and sometimes it
is completely unresponsive to any input. I've seen kernel messages like
"fsync: giving up on dirty". Several times the system panicked, in most
cases printing the generic "panic: page fault while in kernel mode" and
one time printing "panic: snapacct_ufs2: bad block". I've never seen the
same backtrace twice. One time the system suddenly rebooted, as if a
triple fault or something similar had happened.

Since the problems described above are much more likely to arise when
the filesystem is under load (for example from the first loop) while the
snapshot is being taken, this looks like some kind of race condition.

Some more information from an older debug session can be found at:
http://deponie.yamagi.org/freebsd/debug/snapshots_panic/

On Tue, 10 Jan 2012 10:30:13 -0800
Kirk McKusick <mckusick_at_mckusick.com> wrote:

> > Date: Mon, 9 Jan 2012 18:30:51 +0100
> > From: Yamagi Burmeister <lists_at_yamagi.org>
> > To: jeff_at_freebsd.org, mckusick_at_freebsd.org
> > Cc: freebsd-current_at_freebsd.org, bryce_at_bryce.net
> > Subject: Re: FS hang when creating snapshots on a UFS SU+J setup
> >
> > Hello,
> >
> > I'm sorry to bother you, but you may not be aware of this thread and
> > this problem. We are several people experiencing deadlocks, kernel
> > panics and other problems when creating snapshots on file systems
> > with SU+J. It would be nice to get some feedback, e.g. how can we
> > help debugging and / or fixing this problem.
> >
> > Thank you,
> > Yamagi
>
> First step in debugging is to find out if the problem is SU+J
> specific. To find out, turn off SU+J but leave SU. This change
> is done by running:
>
>     umount <filesystem>
>     tunefs -j disable <filesystem>
>     mount <filesystem>
>     cd <filesystem>
>     rm .sujournal
>
> You may want to run `fsck -f' on the filesystem while you have
> it unmounted just to be sure that it is clean. Then run your
> snapshot request to see if it still fails. If it works, then
> we have narrowed the problem down to something related to SU+J.
> If it fails then we have a broader issue to deal with.
>
> If you wish to go back to using SU+J after the test, you can
> reenable SU+J by running:
>
>     umount <filesystem>
>     tunefs -j enable <filesystem>
>     mount <filesystem>
>
> When responding to me, it is best to use my <mckusick_at_mckusick.com>
> email as I tend to read it more regularly.
>
>     Kirk McKusick
>

-- 
Homepage:  www.yamagi.org
XMPP:      yamagi_at_yamagi.org
GnuPG/GPG: 0xEFBCCBCB