On Tue, Jul 22, 2008 at 06:54:04PM +0200, Attilio Rao wrote:
> 2008/7/22, Kostik Belousov <kostikbel_at_gmail.com>:
> > On Mon, Jul 21, 2008 at 05:03:14PM -0400, Andrew Gallatin wrote:
> > > I can panic today's -current reliably (or hang it with
> > > WITNESS/INVARIANTS disabled). When it crashes, I see
> > > the appended panic messages.
> > >
> > > It seems to be 100% reproducible on my box (AMD64 x2,
> > > 512MB ram, UFS2). If anybody savvy in this area would
> > > like to reproduce it, I've left the program at ~gallatin/ahunt.c
> > > on freefall. Compile it, and run it as:
> > > ./a.out -mmbfileinit -madvise=/var/tmp/zot -random -size=95536
> > > -touch=4096 -rewrite=2
> > >
> > >
> > > Cheers,
> > >
> > > Drew
> > >
> > > PS: Here is a serial console log from the panic:
> >
> > ...
> >
> > > login: shared lock of (lockmgr) ufs @ kern/vfs_subr.c:2044
> > > while exclusively locked from kern/vfs_vnops.c:593
> > > panic: share->excl
> > > cpuid = 1
> > > KDB: enter: panic
> > > [thread pid 1702 tid 100149 ]
> > > Stopped at kdb_enter+0x3d: movq $0,0x639958(%rip)
> > > db> tr
> > > Tracing pid 1702 tid 100149 td 0xffffff000d08f000
> > > kdb_enter() at kdb_enter+0x3d
> > > panic() at panic+0x176
> > > witness_checkorder() at witness_checkorder+0x137
> > > __lockmgr_args() at __lockmgr_args+0xc74
> > > ffs_lock() at ffs_lock+0x8c
> > > VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b
> > > _vn_lock() at _vn_lock+0x47
> > > vget() at vget+0x7b
> > > vnode_pager_lock() at vnode_pager_lock+0x146
> > > vm_fault() at vm_fault+0x1e2
> > > trap_pfault() at trap_pfault+0x128
> > > trap() at trap+0x395
> > > calltrap() at calltrap+0x8
> > > --- trap 0xc, rip = 0xffffffff8079f2bd, rsp = 0xfffffffe58c2f7b0, rbp =
> > > 0xfffffffe58c2f830 ---
> > > copyin() at copyin+0x3d
> > > ffs_write() at ffs_write+0x2f8
> > > VOP_WRITE_APV() at VOP_WRITE_APV+0x10b
> > > vn_write() at vn_write+0x23f
> > > dofilewrite() at dofilewrite+0x85
> > > --More--
> > >
> > > kern_writev() at kern_writev+0x60
> > > write() at write+0x54
> > > syscall() at syscall+0x1dd
> > > Xfast_syscall() at Xfast_syscall+0xab
> > > --- syscall (4, FreeBSD ELF64, write), rip = 0x8007296ec, rsp =
> > > 0x7fffffffe158, rbp = 0x7fffffffe210 ---
> > > db> show locks
> > > exclusive sleep mutex vnode interlock r = 0 (0xffffff000d0dc0c0) locked
> > > @ vm/vnode_pager.c:1199
> > > exclusive sx user map r = 0 (0xffffff000d054360) locked @ vm/vm_map.c:3115
> > > exclusive lockmgr bufwait r = 0 (0xfffffffe5047f278) locked @
> > > kern/vfs_bio.c:1783
> > > exclusive lockmgr ufs r = 0 (0xffffff000d0dc098) locked @
> > > kern/vfs_vnops.c:593
> > > db>
> > >
> >
> > Essentially, you tried to write part of a region mmapped from the
> > file back to the same file. VOP_WRITE() is called with the vnode
> > exclusively locked, while the fault handler tried to lock the vnode
> > in shared mode to page in.
> >
> > The following change fixed it for me.
> > Attilio, would it make sense to consider LK_CANRECURSE | LK_SHARED as
> > a request for the exclusive lock when the current thread already holds
> > the exclusive lock instead? I think this would be a proper solution.
>
> I don't like this kind of magic and encoding in lockmgr.
> I think that the better thing to do here is to recurse the exclusive
> lock as you pass to vget().

It could be argued that lockmgr is black magic as a whole. On the other
hand, I had to use VOP_ISLOCKED() and manually construct the lock request,
while all the needed information is at hand inside the lockmgr.

Moreover, I believe that doing an implicit shared->exclusive request
upgrade in this situation (excl locked by curthread, LK_CANRECURSE
present) is right.

>
> Also note that without WITNESS the code will return EDEADLK in this
> case while traditionally what would have happened is that the lockmgr
> would have to be downgraded silently, but as you can expect this is a
> very dangerous practice.

Fully agree.
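
To make the two positions above concrete, here is a small userland toy model in C. It is only a sketch and not the lockmgr code: the toy_* names, the flag values, and the EBUSY return (standing in for "would sleep") are all made up for illustration. toy_lock_acquire() shows the in-lockmgr policy (a shared request with the recurse flag, made by the thread that already owns the lock exclusively, becomes an exclusive recursion), and toy_pick_flags() shows the caller-side variant where the caller inspects the lock first and builds the request by hand.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag values for this toy model; NOT the kernel's LK_* constants. */
enum {
    TOY_SHARED     = 0x1,
    TOY_EXCLUSIVE  = 0x2,
    TOY_CANRECURSE = 0x4
};

/* Toy lock state; a real lockmgr lock is far more involved. */
struct toy_lock {
    int owner;      /* thread id of the exclusive owner, or -1 if none */
    int recursion;  /* number of exclusive acquisitions by the owner */
    int sharers;    /* number of shared holders */
};

static void
toy_lock_init(struct toy_lock *lk)
{
    assert(lk != NULL);
    lk->owner = -1;
    lk->recursion = 0;
    lk->sharers = 0;
}

/*
 * In-lock policy: returns 0 on success, EDEADLK when the calling thread
 * would deadlock against itself, EBUSY where a real lock would sleep,
 * and EINVAL for a malformed request.
 */
static int
toy_lock_acquire(struct toy_lock *lk, int tid, int flags)
{
    int mode = flags & (TOY_SHARED | TOY_EXCLUSIVE);

    assert(lk != NULL);
    if (mode != TOY_SHARED && mode != TOY_EXCLUSIVE)
        return (EINVAL);

    if (lk->owner == tid) {
        /* The caller already holds the lock exclusively. */
        if (flags & TOY_CANRECURSE) {
            /* Shared or exclusive: treat it as an exclusive recursion. */
            lk->recursion++;
            return (0);
        }
        return (EDEADLK);
    }
    if (lk->owner != -1)
        return (EBUSY);         /* held exclusively by another thread */
    if (mode == TOY_EXCLUSIVE) {
        if (lk->sharers > 0)
            return (EBUSY);     /* shared holders are present */
        lk->owner = tid;
        lk->recursion = 1;
        return (0);
    }
    lk->sharers++;
    return (0);
}

/*
 * Caller-side variant: look at the lock first and construct the request
 * by hand, asking for an exclusive recursion when we already own it.
 */
static int
toy_pick_flags(const struct toy_lock *lk, int tid)
{
    assert(lk != NULL);
    if (lk->owner == tid)
        return (TOY_EXCLUSIVE | TOY_CANRECURSE);
    return (TOY_SHARED);
}
```

In this model, a thread that holds the lock exclusively and then asks for TOY_SHARED alone gets EDEADLK, while the same request with TOY_CANRECURSE succeeds and bumps the recursion count. Both variants end in the same state; the difference is only where the decision lives, which is the point being argued in the thread.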