Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

From: Svatopluk Kraus <onwahe_at_gmail.com>
Date: Wed, 20 Feb 2013 13:31:08 +0100
On Tue, Feb 19, 2013 at 7:51 PM, Konstantin Belousov
<kostikbel_at_gmail.com> wrote:
> On Mon, Feb 18, 2013 at 11:18:16PM +0100, Svatopluk Kraus wrote:
>> On Mon, Feb 18, 2013 at 9:36 PM, Konstantin Belousov
>> <kostikbel_at_gmail.com> wrote:
>> Well, I'm taking part in porting FreeBSD to ARM11mpcore. The UP case
>> was simple. The SMP case is more complex and rather new to me.
>> Recently I was debugging a problem with the PCPU stuff. For example,
>> PCPU_GET() is implemented by a single instruction on i386, so the need
>> for atomicity with respect to interrupts can be overlooked. On
>> load-store archs, an implementation that works in the SMP case is not
>> so simple, and what works in the UP case (a single PCPU) does not work
>> in the SMP case. Believe me, tracking down mysterious, sporadic
>> 'mutex not owned' assertions and other ones caused by a curthread mess
>> takes a while ...
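To illustrate what I mean by "not so simple": on a load-store arch a
naive PCPU_GET(curthread) decomposes into at least two memory accesses,
roughly like the sketch below (cpu_get_pcpu() and the layout are
illustrative only, not the actual MD code):

    static __inline struct thread *
    get_curthread_racy(void)
    {
            struct pcpu *pc;

            pc = cpu_get_pcpu();           /* per-CPU base of, say, CPU 0 */
            /* an interrupt here may migrate the thread to CPU 1 ...      */
            return (pc->pc_curthread);     /* ... yet this still returns  */
                                           /* CPU 0's notion of curthread */
    }
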
> Note that PCPU_GET() is not required to be atomic. The reason is that the
> code which uses its result would not be atomic anyway (single-instruction
> or whatever, see below). Thus either the preemption should not matter,
> since the action on the per-CPU data is advisory, as in the case of the
> i386 pmap you noted, or some external measure should be applied in advance
> to the containing region (which you proposed, by bracing with sched_pin()).

So it's advisory in the case of the i386 pmap... Well, if you can live
with that, I can too.
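
Just so we are talking about the same bracing, what I proposed has
roughly this shape (a sketch only, not the actual patch; the comment
stands in for the mapping-window setup the pmap code does):

        struct sysmaps *sysmaps;

        sched_pin();                    /* no migration from here on */
        sysmaps = &sysmaps_pcpu[PCPU_GET(cpuid)];
        mtx_lock(&sysmaps->lock);
        /* ... set up and use this CPU's mapping window ... */
        mtx_unlock(&sysmaps->lock);
        sched_unpin();

With the pin taken first, the entry indexed by PCPU_GET(cpuid) really
belongs to the CPU the thread runs on for the whole section.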

>
> Also, note that it is not interrupts but preemption which is the concern.

Yes and no. In theory, yes, preemption is the concern. In FreeBSD,
however, sched_pin(), critical_enter(), and their counterparts are
implemented with the help of curthread, and the curthread definition
falls back to PCPU_GET(curthread) if it is not defined in some other
way. So curthread must be atomic with respect to interrupts, and in
general PCPU_GET() should be too. Note that spinlock_enter()
definitions often (always) use curthread as well. Anyhow, it's defined
in MD code and can be defined for each arch separately.
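
For a load-store arch I see two obvious ways to keep that read safe:
keep the pointer in a register reserved for it, so the read is a single
instruction again, or brace the read with an interrupt disable. A sketch
of the latter, reusing the illustrative cpu_get_pcpu() helper from above
(intr_disable()/intr_restore() are the usual MD primitives):

    static __inline struct thread *
    get_curthread_safe(void)
    {
            struct thread *td;
            register_t s;

            s = intr_disable();            /* no migration while we read */
            td = cpu_get_pcpu()->pc_curthread;
            intr_restore(s);
            return (td);
    }

(Of course, intr_disable() itself must not depend on curthread.)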

>>
>> After this, I took a look at how the PCPU stuff is used in the whole
>> kernel and found the i386 pmap issue mentioned here. So my concerns
>> are the following:
>>
>> 1. to verify my newly gained ideas about how per-CPU data should be used,
>> 2. to decide how to change my implementation of pmap on ARM11mpcore,
>> as it's based on the i386 pmap,
>> 3. to make the FreeBSD code better.
>>
>> In the meantime, it looks like using data dedicated to one CPU on
>> another one is considered OK. However, I can't agree. At least without
>> comments it is misleading for anyone new to FreeBSD and makes the code
>> murky.
>>
>> > Both threads in your description make useful progress, and computation
>> > proceeds correctly.
>>
>> I thought there was only one thread in my example: one thread running
>> on CPU1, but holding the sysmaps dedicated to CPU0 instead of the
>> sysmaps dedicated to CPU1. So any thread running on CPU0 must wait,
>> because the thread running on CPU1 is a thief. Furthermore, the
>> assumption that a thread on CPU1 holds the data for CPU1 is no longer
>> valid. So either a comment is missing in the code saying that this is
>> OK, or the use of PCPU_GET(cpuid) is unneeded and some kind of free
>> sysmaps list could be used instead and would serve better.
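
To be concrete about the free-list idea, I mean something like the
sketch below (names invented for illustration, struct sysmaps would gain
an SLIST_ENTRY member, and I ignore here whether the pmap call sites may
actually sleep):

    /* needs <sys/queue.h>, <sys/lock.h>, <sys/mutex.h>;
       mtx_init() at pmap init time omitted */
    static SLIST_HEAD(, sysmaps) sysmaps_free_list =
        SLIST_HEAD_INITIALIZER(sysmaps_free_list);
    static struct mtx sysmaps_free_lock;

    static struct sysmaps *
    sysmaps_alloc(void)
    {
            struct sysmaps *sysmaps;

            mtx_lock(&sysmaps_free_lock);
            while ((sysmaps = SLIST_FIRST(&sysmaps_free_list)) == NULL)
                    msleep(&sysmaps_free_list, &sysmaps_free_lock, 0,
                        "sysmaps", 0);
            SLIST_REMOVE_HEAD(&sysmaps_free_list, sm_link);
            mtx_unlock(&sysmaps_free_lock);
            return (sysmaps);
    }

    static void
    sysmaps_free(struct sysmaps *sysmaps)
    {
            mtx_lock(&sysmaps_free_lock);
            SLIST_INSERT_HEAD(&sysmaps_free_list, sysmaps, sm_link);
            wakeup_one(&sysmaps_free_list);
            mtx_unlock(&sysmaps_free_lock);
    }

Any free entry is as good as any other, so nothing depends on which CPU
the thread happens to run on.
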
>
> What is wrong on almost all architectures except x86 is PCPU_ADD(), which
> is usually supposed by the MD code to be atomic with regard to _both_
> interrupts and preemption. I said almost all, but in fact I am not aware
> of any architecture except x86 where it is right.
>
> And RISCs with the load-linked/store-conditional (ll/sc) primitives
> are well capable of doing PCPU_ADD() correctly.
>
> You could look at the projects/counters branch, where a single-instruction
> increment is used on x86 for dynamic per-cpu counters, and critical_enter()
> for other architectures.  I might do some work for the other arches, but
> the counters are correct, just slower than they could be, on !x86.
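
For ARM, the ll/sc increment I have in mind would look roughly like the
sketch below (untested, names invented, not taken from the
projects/counters branch):

    static __inline void
    pcpu_add_32(volatile uint32_t *p, uint32_t val)
    {
            uint32_t tmp, res;

            __asm __volatile(
            "1:     ldrex   %0, [%2]\n"     /* load-linked               */
            "       add     %0, %0, %3\n"
            "       strex   %1, %0, [%2]\n" /* store-conditional         */
            "       cmp     %1, #0\n"
            "       bne     1b\n"           /* retry if the ll/sc failed */
                : "=&r" (tmp), "=&r" (res)
                : "r" (p), "r" (val)
                : "cc", "memory");
    }

Even if the thread migrates after the per-CPU address was computed, the
ll/sc pair guarantees the update is not lost; it merely lands in the
other CPU's slot, which is harmless for counters that are only summed.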

Thanks for helping me settle my thoughts.

Svata