Snapshots fail with UFS+J (was: Re: Fwd: Re: Can *you* UFS snapshot a filesystem with 9.0-BETA1?)

From: Hans Ottevanger <hans_at_beastielabs.net>
Date: Mon, 29 Aug 2011 20:30:58 +0200
On Sun, Aug 21, 2011 at 12:04:26PM +0200, Hans Ottevanger wrote:
> On Sat, Aug 20, 2011 at 09:35:01AM +0100, Hugo Silva wrote:
> > 
> > 
> > On Thu, 18 Aug 2011 10:22:31 +0100,
> > Hugo Silva <hugo_at_barafranca.com> wrote:
> > 
> > Hello,
> > 
> > > I'm wondering. On a virtual machine (amd64 HVM+PV), it's crashing
> > > every time. Not sure if this is SNAFU, as I have never used UFS
> > > snapshots on FreeBSD before.
> > > 
> > > After running mksnap_ffs, ssh stops working (a telnet session doesn't
> > > show the sshd banner). The ssh session from which the command was run
> > > stops responding, the web server dies, and while xm console from the
> > > dom0 works, the VM is unresponsive (i.e. no login prompt on ENTER).
> > > 
> > > Anyone else seeing the same?
> > 
> > I've tried it in a FreeBSD guest (9.0-BETA1/i386) under VirtualBox and
> > I see a LOR (or something that looks like a LOR), then the system freezes.
> > This is 100% reproducible.
> > 
> > Unfortunately, I'm not able to get a crash dump or to break into the
> > debugger, so here is a screenshot:
> > http://user.lamaiziere.net/patrick/public/lormksnap.png
> > 
> > You should ask on freebsd-current_at_
> > 
> 
> Hi,
> 
> I can confirm that this happens on "real iron" too.
> 
> I use an i386 test installation (P4 2.4 GHz, 2GB RAM, 500GB PATA disk),
> running 9.0-BETA1 as distributed (with a kernel that is effectively GENERIC
> minus the devices I don't have). When I try to make a snapshot using
> 
> cd /usr; mksnap_ffs /usr/.snap/testsnap
> 
> the system remains responsive for a few seconds, with lots of disk
> activity, but then it prints the following output on the console (using
> FireWire and dcons to make capturing easier):
> 
> lock order reversal:
>  1st 0xc5a289e8 ufs (ufs) _at_ /usr/src/sys/ufs/ffs/ffs_snapshot.c:425
>  2nd 0xdeb3c078 bufwait (bufwait) _at_ /usr/src/sys/kern/vfs_bio.c:2658
>  3rd 0xc5663af8 ufs (ufs) _at_ /usr/src/sys/ufs/ffs/ffs_snapshot.c:546
> KDB: stack backtrace:
> db_trace_self_wrapper(c09ec6ba,616e735f,6f687370,3a632e74,a363435,...) at db_trace_self_wrapper+0x26
> kdb_backtrace(c07099eb,c09efe14,c5035308,c5039408,c4fda440,...) at kdb_backtrace+0x2a
> _witness_debugger(c09efe14,c5663af8,c09df984,c5039408,c0a10ba2,...) at _witness_debugger+0x25
> witness_checkorder(c5663af8,9,c0a10ba2,222,0,...) at witness_checkorder+0x839
> __lockmgr_args(c5663af8,80100,c5663b18,0,0,...) at __lockmgr_args+0x804
> ffs_lock(c4fda568,c0bf1250,c59b9c30,80100,c5663aa0,...) at ffs_lock+0x8a
> VOP_LOCK1_APV(c0a7fb80,c4fda568,c4fda588,c0a8df20,c5663aa0,...) at VOP_LOCK1_APV+0xb5
> _vn_lock(c5663aa0,80100,c0a10ba2,222,c5011e80,...) at _vn_lock+0x5e
> ffs_snapshot(c54f9798,c52dda60,c0a13fb0,1a2,0,...) at ffs_snapshot+0x14cb
> ffs_mount(c54f9798,c59b0300,ff,394,3,...) at ffs_mount+0x1c13
> vfs_donmount(c59b9b80,11100,c50c7c80,c50c7c80,c59ae580,...) at vfs_donmount+0x11e7
> nmount(c59b9b80,c4fdacec,c4fdad28,c09ee6dd,0,...) at nmount+0x84
> syscallenter(c59b9b80,c4fdace4,c4fdace4,0,c0ab5690,...) at syscallenter+0x263
> syscall(c4fdad28) at syscall+0x34
> Xint0x80_syscall() at Xint0x80_syscall+0x21
> --- syscall (378, FreeBSD ELF32, nmount), eip = 0x280db52b, esp = 0xbfbfe59c, ebp = 0xbfbfed18 ---
> lock order reversal:
>  1st 0xdeb3c078 bufwait (bufwait) _at_ /usr/src/sys/kern/vfs_bio.c:2658
>  2nd 0xc51a72dc snaplk (snaplk) _at_ /usr/src/sys/ufs/ffs/ffs_snapshot.c:818
> KDB: stack backtrace:
> db_trace_self_wrapper(c09ec6ba,662f7366,735f7366,7370616e,2e746f68,...) at db_trace_self_wrapper+0x26
> kdb_backtrace(c07099eb,c09efdfb,c5035308,c5039b58,c4fda440,...) at kdb_backtrace+0x2a
> _witness_debugger(c09efdfb,c51a72dc,c0a10c04,c5039b58,c0a10ba2,...) at _witness_debugger+0x25
> witness_checkorder(c51a72dc,9,c0a10ba2,332,c5a28a08,...) at witness_checkorder+0x839
> __lockmgr_args(c51a72dc,80400,c5a28a08,0,0,...) at __lockmgr_args+0x804
> ffs_lock(c4fda568,deb2434c,100000,80400,c5a28990,...) at ffs_lock+0x8a
> VOP_LOCK1_APV(c0a7fb80,c4fda568,deb243a8,c0a8df20,c5a28990,...) at VOP_LOCK1_APV+0xb5
> _vn_lock(c5a28990,80400,c0a10ba2,332,0,...) at _vn_lock+0x5e
> ffs_snapshot(c54f9798,c52dda60,c0a13fb0,1a2,0,...) at ffs_snapshot+0x295e
> ffs_mount(c54f9798,c59b0300,ff,394,3,...) at ffs_mount+0x1c13
> vfs_donmount(c59b9b80,11100,c50c7c80,c50c7c80,c59ae580,...) at vfs_donmount+0x11e7
> nmount(c59b9b80,c4fdacec,c4fdad28,c09ee6dd,0,...) at nmount+0x84
> syscallenter(c59b9b80,c4fdace4,c4fdace4,0,c0ab5690,...) at syscallenter+0x263
> syscall(c4fdad28) at syscall+0x34
> Xint0x80_syscall() at Xint0x80_syscall+0x21
> --- syscall (378, FreeBSD ELF32, nmount), eip = 0x280db52b, esp = 0xbfbfe59c, ebp = 0xbfbfed18 ---
> 
> After this the system is fully unresponsive and requires a hard reset.
> 
> After a reboot, the snapshot file appears to exist, but it is unusable.
> 
> When I revert to plain soft updates, i.e. disable journaling on /usr,
> everything goes well, except that the same LORs still occur, though
> with different addresses.
> 
> My amd64 9.0-CURRENT system, just updated to r225055, has the same issue,
> but since its kernel config does not include WITNESS, there is no LOR
> output on the console.
> 
> BTW, this issue also makes dump(8) hang the system when the -L option
> is used.
> 
> Kind regards,
> 
> Hans Ottevanger
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
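The soft-updates-only comparison described in the quoted message can be set up roughly as follows. This is a minimal sketch, not taken from the original report: `/dev/ada0p5` is a hypothetical device name for `/usr`, and it assumes the filesystem is not mounted read-write while tunefs(8) runs (for example, from single-user mode). `tunefs -j disable` turns off soft updates journaling while keeping soft updates, `tunefs -j enable` turns it back on, and `tunefs -p` prints the current settings. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch: switch SU+J off (or back on) for the filesystem under test.
# DEV is a hypothetical device name for /usr; adjust it for your system.
# Run it while the filesystem is not mounted read-write, e.g. from
# single-user mode. Commands are only printed unless DRY_RUN=0.
set -e

DEV=${DEV:-/dev/ada0p5}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 is "disable" (plain soft updates) or "enable" (back to SU+J).
set_suj() {
    case $1 in
    enable|disable) ;;
    *) echo "usage: set_suj enable|disable" >&2; return 2 ;;
    esac
    run tunefs -j "$1" "$DEV"
    # Show the resulting settings so the change can be checked.
    run tunefs -p "$DEV"
}

set_suj "${1:-disable}"
```

After taking the snapshot with journaling disabled, running the same script with the argument `enable` restores the original configuration.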

Since I did not see any response to these messages and I cannot imagine that
Hugo and I are the only ones with this issue, I will follow up to my own post.

Just yesterday I tried to make a snapshot of the /usr filesystem (about
16 GB) on my amd64 test system (Q6600, 8GB RAM, 500GB SATA disk) running
9.0-BETA1 (r225228), and the problem still occurs. After these LORs:

lock order reversal:
 1st 0xfffffe00073ab278 ufs (ufs) _at_ /usr/src/sys/ufs/ffs/ffs_snapshot.c:425
 2nd 0xffffff81eb243498 bufwait (bufwait) _at_ /usr/src/sys/kern/vfs_bio.c:2658
 3rd 0xfffffe00073629f8 ufs (ufs) _at_ /usr/src/sys/ufs/ffs/ffs_snapshot.c:546
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x807
__lockmgr_args() at __lockmgr_args+0xdc6
ffs_lock() at ffs_lock+0x8c
VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b
_vn_lock() at _vn_lock+0x47
ffs_snapshot() at ffs_snapshot+0x1c27
ffs_mount() at ffs_mount+0xa23
vfs_donmount() at vfs_donmount+0xddc
nmount() at nmount+0x63
syscallenter() at syscallenter+0x1aa
syscall() at syscall+0x4c
Xfast_syscall() at Xfast_syscall+0xdd
--- syscall (378, FreeBSD ELF64, nmount), rip = 0x8008a118c, rsp = 0x7fffffffd428, rbp = 0x7fffffffde4b ---
lock order reversal:
 1st 0xffffff81eb243498 bufwait (bufwait) _at_ /usr/src/sys/kern/vfs_bio.c:2658
 2nd 0xfffffe0007404a30 snaplk (snaplk) _at_ /usr/src/sys/ufs/ffs/ffs_snapshot.c:818
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
_witness_debugger() at _witness_debugger+0x2e
witness_checkorder() at witness_checkorder+0x807
__lockmgr_args() at __lockmgr_args+0xdc6
ffs_lock() at ffs_lock+0x8c
VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b
_vn_lock() at _vn_lock+0x47
ffs_snapshot() at ffs_snapshot+0x1b02
ffs_mount() at ffs_mount+0xa23
vfs_donmount() at vfs_donmount+0xddc
nmount() at nmount+0x63
syscallenter() at syscallenter+0x1aa
syscall() at syscall+0x4c
Xfast_syscall() at Xfast_syscall+0xdd
--- syscall (378, FreeBSD ELF64, nmount), rip = 0x8008a118c, rsp = 0x7fffffffd428, rbp = 0x7fffffffde4b ---
  
the system becomes completely unresponsive after a few seconds and can
only be revived by pushing the reset button.

With a larger filesystem it takes a bit longer, but the system eventually
locks up as well.

Note that this is not the usual extreme slowdown caused by the snapshot
consuming all the disk bandwidth: the system locks up hard and does not
recover.
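A minimal sketch for reproducing the hang, assuming a disposable test machine running an affected 9.0-BETA1 kernel (ideally built with `options WITNESS`, so that the LORs are printed). It either creates a snapshot with mksnap_ffs, in the same way as the command used earlier in this thread, or runs a level 0 `dump -L`, which makes dump(8) take a snapshot of the filesystem before dumping it. The snapshot name is a placeholder, and sending the dump to `/dev/null` is an assumption made only to avoid needing backup storage. Commands are only printed unless `DRY_RUN=0` is set, since on an affected system either action is expected to lock up the machine.

```shell
#!/bin/sh
# Reproduction sketch for the snapshot hang reported in this thread.
# Use a disposable test machine only: on an affected kernel both
# actions are expected to lock up the system.
# MNT and SNAP are placeholders; commands are only printed unless DRY_RUN=0.
set -e

MNT=${MNT:-/usr}
SNAP=${SNAP:-$MNT/.snap/testsnap}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create a snapshot directly with mksnap_ffs.
take_snapshot() {
    run mksnap_ffs "$SNAP"
}

# Level 0 live dump: -L makes dump(8) snapshot the filesystem first.
# The dump output is discarded; only the snapshot step matters here.
live_dump() {
    run dump -0 -L -a -f /dev/null "$MNT"
}

case ${1:-snapshot} in
snapshot) take_snapshot ;;
dump) live_dump ;;
*) echo "usage: $0 [snapshot|dump]" >&2; exit 2 ;;
esac
```

For example, saving it under the hypothetical name `repro.sh`, `DRY_RUN=0 sh repro.sh dump` would perform the live dump for real.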

Is anybody else seeing this? Is it a known problem?

How should I proceed?

Copied to freebsd-fs_at_ to elicit more responses.

Kind regards,

Hans Ottevanger
Received on Mon Aug 29 2011 - 16:31:04 UTC
