Re: smp_rendezvous_action: Are atomics correctly used ?

From: Alexandre Martins <alexandre.martins_at_stormshield.eu> Date: Fri, 10 Mar 2017 16:23:02 +0100 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:10 UTC

Le vendredi 10 mars 2017, 16:46:26 Konstantin Belousov a écrit :
> On Fri, Mar 10, 2017 at 03:30:21PM +0100, Alexandre Martins wrote:
> > Le vendredi 10 mars 2017, 15:57:16 Konstantin Belousov a ?crit :
> > > On Fri, Mar 10, 2017 at 02:24:52PM +0100, Alexandre Martins wrote:
> > > > Le jeudi 9 mars 2017, 16:25:17 Konstantin Belousov a ?crit :
> > > > > On Thu, Mar 09, 2017 at 02:52:09PM +0100, Alexandre Martins wrote:
> > > > > > Le jeudi 9 mars 2017, 15:07:54 Konstantin Belousov a ?crit :
> > > > > > > On Thu, Mar 09, 2017 at 10:59:27AM +0100, Alexandre Martins 
wrote:
> > > > > > > > I have the save question for the cpu_ipi_pending here:
> > > > > > > > 
> > > > > > > > https://svnweb.freebsd.org/base/head/sys/x86/x86/mp_x86.c?view
> > > > > > > > =ann
> > > > > > > > otat
> > > > > > > > e#l1
> > > > > > > > 080>
> > > > > > > > 
> > > > > > > > Le jeudi 9 mars 2017, 10:43:14 Alexandre Martins a ?crit :
> > > > > > > > > Hello,
> > > > > > > > > 
> > > > > > > > > I'm curently reading the code of the function
> > > > > > > > > smp_rendezvous_action,
> > > > > > > > > in
> > > > > > > > > kern/subr_smp.c file. In that function, i see that the
> > > > > > > > > variable
> > > > > > > > > smp_rv_waiters is read in some while() loop in a non-atomic
> > > > > > > > > way.
> > > > > > > > > 
> > > > > > > > > https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?vie
> > > > > > > > > w=an
> > > > > > > > > nota
> > > > > > > > > te#l
> > > > > > > > > 412
> > > > > > > > > https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?vie
> > > > > > > > > w=an
> > > > > > > > > nota
> > > > > > > > > te#l
> > > > > > > > > 458
> > > > > > > > > https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?vie
> > > > > > > > > w=an
> > > > > > > > > nota
> > > > > > > > > te#l
> > > > > > > > > 472
> > > > > > > > > 
> > > > > > > > > I suspect one of my freeze to be due by that.
> > > > > > > 
> > > > > > > You should provide either evidence or, at least, some reasoning
> > > > > > > supporting
> > > > > > > your claims.
> > > > > > 
> > > > > > I curently have a software watchdog that triger and does a
> > > > > > coredump.
> > > > > > In
> > > > > > the
> > > > > > coredumps, I always see a CPU trying to write-lock a "rm lock".
> > > > > > Every
> > > > > > time,
> > > > > > that CPU is spinning into the smp_rendezvous_action, in the first
> > > > > > while
> > > > > > loop) while the others are into the idle threads.
> > > > > > 
> > > > > > The fact is that freeze is not clear and I start to search
> > > > > > "exotic"
> > > > > > causes
> > > > > > to explain it.
> > > > > 
> > > > > This sounds as the 'usual' deadlock, where some other thread owns
> > > > > rmlock
> > > > > in
> > > > > read mode.  I recommend you to follow the
> > > > > https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handboo
> > > > > k/ke
> > > > > rnel debug-deadlocks.html
> > > > 
> > > > Just a last question, for my personnal knowledge.
> > > > 
> > > > In ARM >= 6, for atomic acces, the code should (?) use LDREX and STREX
> > > > for, I quote : "Use LDREX and STREX to implement interprocess
> > > > communication in multiple-processor and shared-memory systems." (see
> > > > here
> > > > 
> > > > http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489e/C
> > > > ihbg
> > > > hef.html
> > > 
> > > In my previous response to you, I explicitely defined what 'atomic'
> > > means when adjected to the term 'load'. The *EX instructions are used on
> > > ll/sc architectures to implement read/modify/write atomic operations,
> > > which are different from load (read) operations.
> > 
> > Ok ! Because we just want to read the value, there is no need to use the
> > *EX version. *EX is intended to be use when a modification will be done
> > thereafter.> 
> > > > But, in that while loop, it's a standard "LDR" that is used. Is it
> > > > correct
> > > > too, and why ?
> > > 
> > > Which 'that while loop' ?
> > > 
> > > 	while (atomic_load_acq_int(&smp_rv_waiters[3]) < ncpus)
> > > 	
> > > 		cpu_spinwait();
> > > 
> > > This one ?
> > 
> > No, I point the one at line 412, 458 and 472:
> > 
> > 412: while (smp_rv_waiters[0] < smp_rv_ncpus)
> > 
> >          cpu_spinwait();
> > 
> > 458: while (smp_rv_waiters[1] < smp_rv_ncpus)
> > 
> >          cpu_spinwait();
> > 
> > 472: while (smp_rv_waiters[2] < smp_rv_ncpus)
> > 
> >          cpu_spinwait();
> > > 
> > > Because the semantic of the normal load + DMB barrier provides the
> > > expected
> > > semantic of atomic_load_acq(), as explained in atomic(9) and utilized by
> > > the author of the code.
> > 
> > So, the writer must use LDREX/STREX to modify the value and use dmb to
> > make
> > visible to other CPU the write.
> 
> No, this is false statement on all/many counts.  ll/sc is only needed for
> atomic modification, not for a write.  If you need to assign a given value
> to the variable, STR instruction does just that.  LDREX/STREX provide a
> way to ensure that a modification done atomically.  E.g., if your intent
> is to add 1 to the word in memory, you need to ensure that the memory
> is not modified, when writing out the modified read value.
> 
> Next, DMB does not 'make visible' the modification. DMB separates
> externally visible effects of executed instructions before and after it.
> From the whole guarantees provided by this separation, atomic_load_acq()
> only needs the effect of not allowing later memory accesses to occur
> earlier than the DMB instruction was executed (the acquire semantic).
> ARMv8 provides loads and stores with the reduced barriers to implement
> _acq/_rel without excess overhead of full barrier.
> 
> DMB does not make any store instruction more effective than it already is.

OK, that why I didn't understand well the use of atomics.

It's related to the function "atomic_load_xxx/atomic_store_xxx" that made me 
think that it's THIS store or THIS load is atomic, but no. The loads and 
stores are already atomic. Thoses functions just do a barrier (if needed) 
before for "acq" and after for the "rel". The barrier does not "flush" anything 
in memory but prevent loads and stores reordering.

I realy need to practice more the use of atomic _correctly_ (^_^)

> 
> > The readers can read simply the value without the barrier because cache
> > coherancy protcol will update the value automaticaly.
> 
> Same is true for stores.  This is why plain loads and stores are atomic.
> 
> > I think I finally got it !
> > Thank you so much !
> > 
> > Best regards,

-- 
Alexandre Martins
STORMSHIELD