2009/8/27 Andrew Brampton <brampton+freebsd_at_gmail.com>: > 2009/8/26 Attilio Rao <attilio_at_freebsd.org>: >> 2009/8/27 Andrew Brampton <brampton+freebsd_at_gmail.com>: >>> Hi, >>> The following sequence of commands fails on line 4 with an assertion >>> that the lock is not currently held: >>> >>> 1: rw_wlock(&rw); >>> 2: if ( rw_try_wlock(&rw) ) >>> 3: rw_wunlock(&rw); >>> 4: rw_wunlock(&rw); >>> >>> This is because after line 3 is executed the rw lock is no longer >>> held. I tracked this bug down to _rw_try_wlock which correctly >>> increments rw_recurse, but does not set the RW_LOCK_RECURSED bit. >>> Without this bit the third line unlocks the lock and leaves it in a >>> unlocked state (when it should still be locked). Adding a line to set >>> this bit makes _rw_try_wlock match the code in _rw_wlock_hard. >> >> Sorry, but I really don't understand how that can be a bug. >> On STABLE_7, RW_LOCK_RECURSED is not used for checking if the lock is >> recursed or not. >> it just got set for improving debugging and eventually we decided to >> drop it for 8.0. >> >> However, for deciding if a lock is recursed or not in both STABLE_7 >> and HEAD we used just checking against the recursion count which is >> correctly handled by the function. >> >> What you describe can't be the problem. >> >> Attilio >> > > Ok, so I have had a better look at the code in CURRENT, and compared > it to the code in STABLE_7. I apologise but I think I mixed up my > sources somewhere and the problem does not appear in CURRENT. The > problem does however occur in STABLE_7, and I can explain that below. > > 1: rw_wlock(&rw); > 2: if ( rw_try_wlock(&rw) ) > 3: rw_wunlock(&rw); > 4: rw_wunlock(&rw); > > Line 1, _rw_wlock gets called which calls __rw_wlock. This in turn > calls _rw_write_lock which changes rw->rw_lock to tid > Line 2, _rw_try_wlock gets called, which check if the lock is already > held (which it is), if so it then rw->rw_recurse++ > Line 3, _rw_wunlock gets called which calls __rw_wunlock, which then > calls _rw_write_unlock. Now _rw_write_unlock trys to unlock the lock > by checking if rw->rw_lock is tid, if so it sets the lock to > RW_UNLOCKED. This is where the problem occurs, there is no check on > rx->rw_recurse. > > Now, if we used this code instead: > 1: rw_wlock(&rw); > 2: if ( rw_try_wlock(&rw) ) > 3: rw_wunlock(&rw); > 4: rw_wunlock(&rw); > > The order goes: > Line 1, _rw_wlock gets called which calls __rw_wlock. This in turn > calls _rw_write_lock which changes rw->rw_lock to tid > Line 2, _rw_wlock gets called which calls __rw_wlock. __rw_wlock > checks if rw->rw_lock is RW_UNLOCKED, otherwise it ends up calling > _rw_wlock_hard. Inside _rw_wlock_hard it checks if the lock is already > held, if so it increments rw_recurse, and changes rw->rw_lock to > (rw->rw_lock | RW_LOCK_RECURSED) by calling atomic_set_ptr. > Line 3, _rw_wunlock gets called which calls __rw_wunlock, which then > calls _rw_write_unlock. Now _rw_write_unlock trys to unlock the lock > by checking if rw->rw_lock is tid. This is where things differ from > the previous code, rw->rw_lock is NOT tid, it is now (tid | > RW_LOCK_RECURSED), and thus the code drops into _rw_wunlock_hard which > decrements rx->rw_recurse and removes the RW_LOCK_RECURSED flag if > rw_recurse equals zero. > > Hopefully you see how not setting the RW_LOCK_RECURSED flag causes a > problem in rw_wunlock. So this can be fixed by either back porting the > CURRENT code to STABLE_7, or using the one line fix in my patch. As I > already stated, I am using my patch and the sample code I have given > no longer panics. I see the problem now. I think the simple patch of your is good and I will commit it. Thanks, Attilio -- Peace can only be achieved by understanding - A. EinsteinReceived on Wed Aug 26 2009 - 22:34:12 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:54 UTC