Re: [patch] i386 pmap sysmaps_pcpu[] atomic access

From: Konstantin Belousov <kostikbel_at_gmail.com> Date: Tue, 19 Feb 2013 20:51:29 +0200 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:35 UTC

On Mon, Feb 18, 2013 at 11:18:16PM +0100, Svatopluk Kraus wrote:
> On Mon, Feb 18, 2013 at 9:36 PM, Konstantin Belousov
> <kostikbel_at_gmail.com> wrote:
> Well, I'm taking a part on porting FreeBSD to ARM11mpcore. UP case was
> simple. SMP case is more complex and rather new for me. Recently, I
> was solving a problem with PCPU stuff. For example, PCPU_GET is
> implemented by one instruction on i386 arch. So, a need of atomicity
> with respect to interrupts can be overlooked. On load-store archs, the
> implementation which works in SMP case is not so simple. And what
> works in UP case (single PCPU), not works in SMP case. Believe me,
> mysterious and sporadic 'mutex not owned' assertions and others ones
> caused by curthreads mess, it takes a while ...
Note that PCPU_GET() is not needed to be atomic. The reason is that the code
which uses its result would not be atomic (single-instruction or whatever, see
below). Thus, either the preemption should not matter since the action with
the per-cpu data is advisory, as is in the case of i386 pmap you noted.
or some external measures should be applied in advance to the containing
region (which you proposed, by bracing with sched_pin()).

Also, note that it is not interrupts but preemption which is concern.

> 
> After this, I took a look at how PCPU stuff is used in whole kernel
> and found out mentioned here i386 pmap issue. So, my concern is
> following:
> 
> 1. to verify my newly gained ideas how per CPU data should be used,
> 2. to decide how to change my implementation of pmap on ARM11mpcore,
> as it's based on i386 pmap,
> 3. to make FreeBSD code better.
> 
> In the meanwhile, it looks that using data dedicated to one CPU on
> another one is OK. However, I can't agree. At least, without comments,
> it is misleading for anyone new in FreeBSD and makes code misty.
> 
> > Both threads in your description make useful progress, and computation
> > proceeds correctly.
> 
> I thought, there is only one thread in my example. One thread running
> on CPU1, but holding sysmaps dedicated to CPU0 instead of holding
> sysmaps dedicated to CPU1. So, any thread running on CPU0 must wait
> because the thread running on CPU1 is a thief. Futhermore, the idea
> that a thread on CPU1 should hold data for CPU1 is not valid. So,
> either some comment is missing in the code that it's OK or the using
> of PCPU_GET(cpuid) is unneeded and some kind of free sysmaps list can
> be used and it will serve better.

What is wrong on almost all architectures except x86 is PCPU_ADD(), which
is usually supposed by the MD code to be atomic in regard to _both_
interrupts and preemption. I said almost all, but in fact I am not aware
of any architecture except x86 where it is right.

And, RISCs with the load-link and store-conditional (ll/sc) primitives
are well capable of doing the PCPU_ADD() correctly.

You could look at the projects/counters branch, where single-instruction
increment is used on x86 for dynamic per-cpu counters, and critical_enter()
for other architectures.  I might do some work for other arches, but the
counters are correct but slower that it could be, on !x86.