Re: [patch] rw_try_wlock does not set recursive bit when recursing

From: Attilio Rao <attilio_at_freebsd.org> Date: Thu, 27 Aug 2009 02:34:10 +0200 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:54 UTC

2009/8/27 Andrew Brampton <brampton+freebsd_at_gmail.com>:
> 2009/8/26 Attilio Rao <attilio_at_freebsd.org>:
>> 2009/8/27 Andrew Brampton <brampton+freebsd_at_gmail.com>:
>>> Hi,
>>> The following sequence of commands fails on line 4 with an assertion
>>> that the lock is not currently held:
>>>
>>> 1: rw_wlock(&rw);
>>> 2: if ( rw_try_wlock(&rw) )
>>> 3:    rw_wunlock(&rw);
>>> 4: rw_wunlock(&rw);
>>>
>>> This is because after line 3 is executed the rw lock is no longer
>>> held. I tracked this bug down to _rw_try_wlock which correctly
>>> increments rw_recurse, but does not set the RW_LOCK_RECURSED bit.
>>> Without this bit the third line unlocks the lock and leaves it in a
>>> unlocked state (when it should still be locked). Adding a line to set
>>> this bit makes _rw_try_wlock match the code in _rw_wlock_hard.
>>
>> Sorry, but I really don't understand how that can be a bug.
>> On STABLE_7, RW_LOCK_RECURSED is not used for checking if the lock is
>> recursed or not.
>> it just got set for improving debugging and eventually we decided to
>> drop it for 8.0.
>>
>> However, for deciding if a lock is recursed or not in both STABLE_7
>> and HEAD we used just checking against the recursion count which is
>> correctly handled by the function.
>>
>> What you describe can't be the problem.
>>
>> Attilio
>>
>
> Ok, so I have had a better look at the code in CURRENT, and compared
> it to the code in STABLE_7. I apologise but I think I mixed up my
> sources somewhere and the problem does not appear in CURRENT. The
> problem does however occur in STABLE_7, and I can explain that below.
>
> 1: rw_wlock(&rw);
> 2: if ( rw_try_wlock(&rw) )
> 3:    rw_wunlock(&rw);
> 4: rw_wunlock(&rw);
>
> Line 1, _rw_wlock gets called which calls __rw_wlock. This in turn
> calls _rw_write_lock which changes rw->rw_lock to tid
> Line 2, _rw_try_wlock gets called, which check if the lock is already
> held (which it is), if so it then rw->rw_recurse++
> Line 3, _rw_wunlock gets called which calls __rw_wunlock, which then
> calls _rw_write_unlock. Now _rw_write_unlock trys to unlock the lock
> by checking if rw->rw_lock is tid, if so it sets the lock to
> RW_UNLOCKED. This is where the problem occurs, there is no check on
> rx->rw_recurse.
>
> Now, if we used this code instead:
> 1: rw_wlock(&rw);
> 2: if ( rw_try_wlock(&rw) )
> 3:    rw_wunlock(&rw);
> 4: rw_wunlock(&rw);
>
> The order goes:
> Line 1, _rw_wlock gets called which calls __rw_wlock. This in turn
> calls _rw_write_lock which changes rw->rw_lock to tid
> Line 2, _rw_wlock gets called which calls __rw_wlock. __rw_wlock
> checks if rw->rw_lock is RW_UNLOCKED, otherwise it ends up calling
> _rw_wlock_hard. Inside _rw_wlock_hard it checks if the lock is already
> held, if so it increments rw_recurse, and changes rw->rw_lock to
> (rw->rw_lock | RW_LOCK_RECURSED) by calling atomic_set_ptr.
> Line 3, _rw_wunlock gets called which calls __rw_wunlock, which then
> calls _rw_write_unlock. Now _rw_write_unlock trys to unlock the lock
> by checking if rw->rw_lock is tid. This is where things differ from
> the previous code, rw->rw_lock is NOT tid, it is now (tid |
> RW_LOCK_RECURSED), and thus the code drops into _rw_wunlock_hard which
> decrements rx->rw_recurse and removes the RW_LOCK_RECURSED flag if
> rw_recurse equals zero.
>
> Hopefully you see how not setting the RW_LOCK_RECURSED flag causes a
> problem in rw_wunlock. So this can be fixed by either back porting the
> CURRENT code to STABLE_7, or using the one line fix in my patch. As I
> already stated, I am using my patch and the sample code I have given
> no longer panics.

I see the problem now.
I think the simple patch of your is good and I will commit it.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein