Re: Fwd: Re: Can *you* UFS snapshot a filesystem with 9.0-BETA1?

From: Hans Ottevanger <hans_at_beastielabs.net>
Date: Sun, 21 Aug 2011 12:04:26 +0200
On Sat, Aug 20, 2011 at 09:35:01AM +0100, Hugo Silva wrote:
> 
> 
> On Thu, 18 Aug 2011 10:22:31 +0100,
> Hugo Silva <hugo_at_barafranca.com> wrote:
> 
> Hello,
> 
> > I'm wondering: on a virtual machine (amd64 HVM+PV), it's crashing
> > every time. I'm not sure if this is SNAFU, as I have never used UFS
> > snapshots on FreeBSD before.
> > 
> > After running mksnap_ffs, ssh stops working (a telnet session doesn't
> > show the sshd banner). The ssh session the command was run from
> > stops responding and the webserver dies. Running xm console from the
> > dom0 works, but the VM is unresponsive (i.e. no login prompt on ENTER).
> > 
> > Anyone else seeing the same?
> 
> I've tried a FreeBSD guest (9.0-BETA1/i386) in VirtualBox and
> I see a LOR (or what looks like a LOR), then the system freezes.
> This is 100% reproducible.
> 
> Unfortunately, I'm not able to get a panic dump or to break into the
> debugger, so here is a screenshot:
> http://user.lamaiziere.net/patrick/public/lormksnap.png
> 
> You should ask on freebsd-current_at_
> 

Hi,

I can confirm that this happens on "real iron" too.

I use an i386 test installation (P4 2.4 GHz, 2GB RAM, 500GB PATA disk)
running 9.0-BETA1 as distributed (with a kernel that is effectively
GENERIC minus the devices I don't have). When I try to make a snapshot
using

cd /usr; mksnap_ffs /usr/.snap/testsnap

the system is still responsive for a few seconds, with lots of disk
activity, but then it prints the following output on the console
(captured via FireWire and dcons for convenience):

lock order reversal:
 1st 0xc5a289e8 ufs (ufs) _at_ /usr/src/sys/ufs/ffs/ffs_snapshot.c:425
 2nd 0xdeb3c078 bufwait (bufwait) _at_ /usr/src/sys/kern/vfs_bio.c:2658
 3rd 0xc5663af8 ufs (ufs) _at_ /usr/src/sys/ufs/ffs/ffs_snapshot.c:546
KDB: stack backtrace:
db_trace_self_wrapper(c09ec6ba,616e735f,6f687370,3a632e74,a363435,...) at db_trace_self_wrapper+0x26
kdb_backtrace(c07099eb,c09efe14,c5035308,c5039408,c4fda440,...) at kdb_backtrace+0x2a
_witness_debugger(c09efe14,c5663af8,c09df984,c5039408,c0a10ba2,...) at _witness_debugger+0x25
witness_checkorder(c5663af8,9,c0a10ba2,222,0,...) at witness_checkorder+0x839
__lockmgr_args(c5663af8,80100,c5663b18,0,0,...) at __lockmgr_args+0x804
ffs_lock(c4fda568,c0bf1250,c59b9c30,80100,c5663aa0,...) at ffs_lock+0x8a
VOP_LOCK1_APV(c0a7fb80,c4fda568,c4fda588,c0a8df20,c5663aa0,...) at VOP_LOCK1_APV+0xb5
_vn_lock(c5663aa0,80100,c0a10ba2,222,c5011e80,...) at _vn_lock+0x5e
ffs_snapshot(c54f9798,c52dda60,c0a13fb0,1a2,0,...) at ffs_snapshot+0x14cb
ffs_mount(c54f9798,c59b0300,ff,394,3,...) at ffs_mount+0x1c13
vfs_donmount(c59b9b80,11100,c50c7c80,c50c7c80,c59ae580,...) at vfs_donmount+0x11e7
nmount(c59b9b80,c4fdacec,c4fdad28,c09ee6dd,0,...) at nmount+0x84
syscallenter(c59b9b80,c4fdace4,c4fdace4,0,c0ab5690,...) at syscallenter+0x263
syscall(c4fdad28) at syscall+0x34
Xint0x80_syscall() at Xint0x80_syscall+0x21
--- syscall (378, FreeBSD ELF32, nmount), eip = 0x280db52b, esp = 0xbfbfe59c, ebp = 0xbfbfed18 ---
lock order reversal:
 1st 0xdeb3c078 bufwait (bufwait) _at_ /usr/src/sys/kern/vfs_bio.c:2658
 2nd 0xc51a72dc snaplk (snaplk) _at_ /usr/src/sys/ufs/ffs/ffs_snapshot.c:818
KDB: stack backtrace:
db_trace_self_wrapper(c09ec6ba,662f7366,735f7366,7370616e,2e746f68,...) at db_trace_self_wrapper+0x26
kdb_backtrace(c07099eb,c09efdfb,c5035308,c5039b58,c4fda440,...) at kdb_backtrace+0x2a
_witness_debugger(c09efdfb,c51a72dc,c0a10c04,c5039b58,c0a10ba2,...) at _witness_debugger+0x25
witness_checkorder(c51a72dc,9,c0a10ba2,332,c5a28a08,...) at witness_checkorder+0x839
__lockmgr_args(c51a72dc,80400,c5a28a08,0,0,...) at __lockmgr_args+0x804
ffs_lock(c4fda568,deb2434c,100000,80400,c5a28990,...) at ffs_lock+0x8a
VOP_LOCK1_APV(c0a7fb80,c4fda568,deb243a8,c0a8df20,c5a28990,...) at VOP_LOCK1_APV+0xb5
_vn_lock(c5a28990,80400,c0a10ba2,332,0,...) at _vn_lock+0x5e
ffs_snapshot(c54f9798,c52dda60,c0a13fb0,1a2,0,...) at ffs_snapshot+0x295e
ffs_mount(c54f9798,c59b0300,ff,394,3,...) at ffs_mount+0x1c13
vfs_donmount(c59b9b80,11100,c50c7c80,c50c7c80,c59ae580,...) at vfs_donmount+0x11e7
nmount(c59b9b80,c4fdacec,c4fdad28,c09ee6dd,0,...) at nmount+0x84
syscallenter(c59b9b80,c4fdace4,c4fdace4,0,c0ab5690,...) at syscallenter+0x263
syscall(c4fdad28) at syscall+0x34
Xint0x80_syscall() at Xint0x80_syscall+0x21
--- syscall (378, FreeBSD ELF32, nmount), eip = 0x280db52b, esp = 0xbfbfe59c, ebp = 0xbfbfed18 ---

After this the system is fully unresponsive and requires a hard reset.

Once rebooted, the snapshot file appears to exist, but is unusable.
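The reports above come from FreeBSD's WITNESS lock-order checker. As a rough illustration of what it flags, here is a toy Python model; the class `LockOrderChecker` and its methods are hypothetical and this is not the kernel's implementation. It remembers, per lock class, which class was held when another was acquired, and reports a reversal when the opposite order appears. Unlike the real WITNESS, it checks only direct pairs rather than the transitive lock-order graph, and it simply skips re-acquiring a class that is already held. Replaying the three acquisitions from the first report yields the bufwait/ufs reversal.

```python
"""Toy lock-order checker in the spirit of WITNESS (a sketch, not the
kernel code). Locks are tracked by class name, so two different "ufs"
vnode locks count as the same class."""

from dataclasses import dataclass, field


@dataclass
class LockOrderChecker:  # hypothetical helper, for illustration only
    # (a, b) means: class b was acquired while class a was held.
    order: set[tuple[str, str]] = field(default_factory=set)
    held: list[str] = field(default_factory=list)
    reversals: list[tuple[str, str]] = field(default_factory=list)

    def acquire(self, name: str) -> None:
        for prior in self.held:
            if prior == name:
                # Same class already held (e.g. a second ufs vnode lock);
                # this toy model records no self-edge for it.
                continue
            if (name, prior) in self.order:
                # The opposite order was seen earlier: a reversal.
                self.reversals.append((prior, name))
            else:
                self.order.add((prior, name))
        self.held.append(name)

    def release(self, name: str) -> None:
        self.held.remove(name)


if __name__ == "__main__":
    # Acquisition sequence taken from the first LOR report above.
    checker = LockOrderChecker()
    checker.acquire("ufs")      # ffs_snapshot.c:425
    checker.acquire("bufwait")  # vfs_bio.c:2658
    checker.acquire("ufs")      # ffs_snapshot.c:546
    print(checker.reversals)    # [('bufwait', 'ufs')]
```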

When reverting to plain soft updates, i.e. disabling journaling on /usr,
everything works, except that the same LORs still occur, though the
addresses differ.
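To check that two runs hit the same LOR even though the addresses differ, the reports can be compared on lock names and source locations only. Below is a minimal Python sketch; the helper `lor_signature` is hypothetical, and it assumes the console format shown above. WITNESS prints "@" before the file name, which the list archive rewrites as "_at_", so both spellings are accepted.

```python
import re

# Matches lines such as " 1st 0xc5a289e8 ufs (ufs) @ /usr/src/sys/...c:425".
# Both "@" and the archive's "_at_" are accepted before the file name.
LOR_LINE = re.compile(
    r"^\s*(?P<pos>1st|2nd|3rd)\s+0x[0-9a-f]+\s+(?P<name>\S+)\s+\((?P<type>[^)]+)\)"
    r"\s+(?:@|_at_)\s+(?P<file>\S+):(?P<line>\d+)\s*$"
)


def lor_signature(report: str) -> tuple[tuple[str, str, int], ...]:
    """Return (lock name, source file, line) for each lock, ignoring addresses."""
    sig = []
    for text in report.splitlines():
        m = LOR_LINE.match(text)
        if m:
            sig.append((m["name"], m["file"], int(m["line"])))
    return tuple(sig)


if __name__ == "__main__":
    # Second LOR report from the log, plus a copy with made-up
    # (hypothetical) addresses, as a later run might print it.
    run1 = "\n".join([
        " 1st 0xdeb3c078 bufwait (bufwait) _at_ /usr/src/sys/kern/vfs_bio.c:2658",
        " 2nd 0xc51a72dc snaplk (snaplk) _at_ /usr/src/sys/ufs/ffs/ffs_snapshot.c:818",
    ])
    run2 = run1.replace("0xdeb3c078", "0xdeadbeef").replace("0xc51a72dc", "0xc0ffee00")
    print(lor_signature(run1) == lor_signature(run2))  # True
```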

My amd64 9.0-CURRENT system, just updated to r225055, has the same issue,
but since its kernel config does not include WITNESS, there is no LOR
output on the console.

BTW, this issue also makes dump(8) hang the system when the -L option
is used.

Kind regards,

Hans Ottevanger
Received on Sun Aug 21 2011 - 08:36:17 UTC
