Deadlock in UFS/SU+J?

From: Lucius Windschuh <lwindschuh_at_googlemail.com>
Date: Thu, 12 Aug 2010 09:37:04 +0200
Dear list members,
I tried to reproduce another bug on my test machine (i386, CURRENT
r211175), but ran into the following deadlock:

lock order reversal:
 1st 0xc66677c8 ufs (ufs) _at_ /usr/src/sys/kern/vfs_subr.c:2101
 2nd 0xe5ba2460 bufwait (bufwait) _at_ /usr/src/sys/ufs/ffs/ffs_softdep.c:11291
 3rd 0xc8647af8 ufs (ufs) _at_ /usr/src/sys/kern/vfs_subr.c:2101
KDB: stack backtrace:
db_trace_self_wrapper(c08cfabe,f60ee2d4,c06441c5,c06339ab,c08d2b4d,...)
at db_trace_self_wrapper+0x26
kdb_backtrace(c06339ab,c08d2b4d,c5112100,c5115b18,f60ee330,...) at
kdb_backtrace+0x29
_witness_debugger(c08d2b4d,c8647af8,c08c4963,c5115b18,c08d9dc5,...) at
_witness_debugger+0x25
witness_checkorder(c8647af8,9,c08d9dc5,835,0,...) at witness_checkorder+0x839
__lockmgr_args(c8647af8,80100,c8647b18,0,0,...) at __lockmgr_args+0x7f4
ffs_lock(f60ee458,c0643f6b,c08d9267,80100,c8647aa0,...) at ffs_lock+0x8a
VOP_LOCK1_APV(c0945da0,f60ee458,c82ea364,c095e280,c8647aa0,...) at
VOP_LOCK1_APV+0xb5
_vn_lock(c8647aa0,80100,c08d9dc5,835,4,...) at _vn_lock+0x5e
vget(c8647aa0,80100,c82ea2c0,50,0,...) at vget+0xb9
vfs_hash_get(c5ea2a20,5c000e,80000,c82ea2c0,f60ee5a8,...) at vfs_hash_get+0xe6
ffs_vgetf(c5ea2a20,5c000e,80000,f60ee5a8,1,...) at ffs_vgetf+0x49
softdep_sync_metadata(c6667770,0,c08edf39,144,0,...) at
softdep_sync_metadata+0xc92
ffs_syncvnode(c6667770,1,f60ee664,246,c666781c,...) at ffs_syncvnode+0x3e2
ffs_truncate(c6667770,200,0,880,c5e4f780,...) at ffs_truncate+0x87e
ufs_direnter(c6667770,c8647aa0,f60ee91c,f60eeba4,0,...) at ufs_direnter+0x8d4
ufs_makeinode(f60eeba4,0,f60eeb00,f60eea64,c0879745,...) at ufs_makeinode+0x517
ufs_create(f60eeb00,f60eeb18,0,0,f60eeb78,...) at ufs_create+0x30
VOP_CREATE_APV(c0945da0,f60eeb00,2,c0643f6b,0,...) at VOP_CREATE_APV+0xa5
vn_open_cred(f60eeb78,f60eec2c,180,0,c5e4f780,...) at vn_open_cred+0x1d3
vn_open(f60eeb78,f60eec2c,180,c5f077e0,0,...) at vn_open+0x3b
kern_openat(c82ea2c0,ffffff9c,2848cc80,0,a03,...) at kern_openat+0xf5
kern_open(c82ea2c0,2848cc80,0,a02,180,...) at kern_open+0x35
open(c82ea2c0,f60eecec,f60eed28,c08d1426,0,...) at open+0x30
syscallenter(c82ea2c0,f60eece4,f60eece4,0,c097b8f0,...) at syscallenter+0x23f
syscall(f60eed28) at syscall+0x34
Xint0x80_syscall() at Xint0x80_syscall+0x21
--- syscall (5, FreeBSD ELF32, open), eip = 0x282a2623, esp =
0xbfbfb08c, ebp = 0xbfbfb538 ---

The test, upgrading lang/gcc44 in an endless loop, gut stuck sometime
in the night when /etc/periodic/daily/450.status-security was running
in parallel:

$ ps axl | grep ufs
    0 44219 44215   0  76  0  13624   7200 ufs    D     ??    0:03.19
find -sx / /usr /zfs/test /dev/null -type f ( -perm -u+x -or -perm
-g+x -or -perm -o+x ) ( -perm -u+s -or -perm -g+s ) -exec ls -liTd {}
+
    0 44713  2638   0  44  0   9752   1304 piperd S+     3    0:00.01 grep ufs
    0 43824 43823   0  76  0  10532   1368 ufs    D+     2    0:00.02
/usr/ports/lang/gcc44/work/build/./prev-gcc/collect2 -dynamic-linker
/libexec/ld-elf.so.1 -o cc1 /usr/lib/crt1.o /usr/lib/crti.o
/usr/ports/lang/gcc44/work/build/./prev-gcc/crtbegin.o
-L/usr/local/lib -L/usr/ports/lang/gcc44/work/build/./prev-gcc
-L/usr/local/i386-portbld-freebsd9.0/bin c-lang.o stub-objc.o
attribs.o c-errors.o c-lex.o c-pragma.o c-decl.o c-typeck.o
c-convert.o c-aux-info.o c-common.o c-opts.o c-format.o c-semantics.o
c-ppoutput.o c-cppbuiltin.o c-objc-common.o c-dump.o c-pch.o
c-parser.o i386-c.o c-gimplify.o tree-mudflap.o c-pretty-print.o
c-omp.o cc1-checksum.o main.o libbackend.a ../libcpp/libcpp.a
../libdecnumber/libdecnumber.a ../libcpp/libcpp.a
/usr/local/lib/libiconv.so -rpath /usr/local/lib
../libiberty/libiberty.a ../libdecnumber/libdecnumber.a -lmpfr -lgmp
-lgcc -lgcc_eh -lc -lgcc -lgcc_eh
/usr/ports/lang/gcc44/work/build/./prev-gcc/crtend.o /usr/lib/crtn.o
    0 43828 43780   0  76  0   9608   3580 ufs    D+     2    0:00.02
/usr/local/bin/as -I. -I. -I.././../gcc-4.4-20100803/gcc
-I.././../gcc-4.4-20100803/gcc/.
-I.././../gcc-4.4-20100803/gcc/../include
-I.././../gcc-4.4-20100803/gcc/../libcpp/include -I/usr/local/include
-I.././../gcc-4.4-20100803/gcc/../libdecnumber
-I.././../gcc-4.4-20100803/gcc/../libdecnumber/dpd -I../libdecnumber
-I/usr/local/include -o gcc.o /var/tmp//ccxkWkSs.s

Short backtraces of the affected processes:
# procstat -kk 44219
  PID    TID COMM             TDNAME           KSTACK
44219 100252 find             -                mi_switch+0x200
sleepq_switch+0x15f sleepq_wait+0x63 __lockmgr_args+0x4bf
ffs_lock+0x8a VOP_LOCK1_APV+0xb5 _vn_lock+0x5e vget+0xb9
cache_lookup+0x679 vfs_cache_lookup+0xad VOP_LOOKUP_APV+0xa5
lookup+0x5b3 namei+0x4af kern_statat_vnhook+0x72 kern_statat+0x3c
kern_lstat+0x36 lstat+0x2f syscallenter+0x23f
# procstat -kk 43824
  PID    TID COMM             TDNAME           KSTACK
43824 100082 collect2         -                mi_switch+0x200
sleepq_switch+0x15f sleepq_wait+0x63 __lockmgr_args+0x4bf
ffs_lock+0x8a VOP_LOCK1_APV+0xb5 _vn_lock+0x5e vget+0xb9
cache_lookup+0x679 vfs_cache_lookup+0xad VOP_LOOKUP_APV+0xa5
lookup+0x5b3 namei+0x4af vn_open_cred+0x2ed vn_open+0x3b
kern_openat+0xf5 kern_open+0x35 open+0x30
# procstat -kk 43828
  PID    TID COMM             TDNAME           KSTACK
43828 100276 as               -                mi_switch+0x200
sleepq_switch+0x15f sleepq_wait+0x63 __lockmgr_args+0x4bf
ffs_lock+0x8a VOP_LOCK1_APV+0xb5 _vn_lock+0x5e vget+0xb9
cache_lookup+0x679 vfs_cache_lookup+0xad VOP_LOOKUP_APV+0xa5
lookup+0x5b3 namei+0x4af vn_open_cred+0x2ed vn_open+0x3b
kern_openat+0xf5 kern_open+0x35 open+0x30

As you see, both gcc-related process hang in open(), the find process
hangs in lstat(). I don't know which one got stuck first, but I am
confident that I ran into the deadlock described by WITNESS above.

Is there anything I can help with? The machine is still hanging in
this state, entering the debugger is possible, generating a coredump
should work (did hang the last time, but since than, I updated the
kernel).

Kernel config: https://sites.google.com/site/lwfreebsd/Home/files/kernelconfig-CURRENT.txt.gz?attredirects=0&d=1

DDB output from a textdump simulation:
https://sites.google.com/site/lwfreebsd/Home/files/current-r211175-UFS-deadlock.txt.gz?attredirects=0&d=1

Regards

Lucius
Received on Thu Aug 12 2010 - 05:37:05 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:06 UTC