Re: smp_rendezvous_action: Are atomics correctly used ?

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Fri, 10 Mar 2017 17:32:55 +0200
On Fri, Mar 10, 2017 at 04:23:02PM +0100, Alexandre Martins wrote:
> Le vendredi 10 mars 2017, 16:46:26 Konstantin Belousov a ?crit :
> > On Fri, Mar 10, 2017 at 03:30:21PM +0100, Alexandre Martins wrote:
> > > Le vendredi 10 mars 2017, 15:57:16 Konstantin Belousov a ?crit :
> > > > On Fri, Mar 10, 2017 at 02:24:52PM +0100, Alexandre Martins wrote:
> > > > > Le jeudi 9 mars 2017, 16:25:17 Konstantin Belousov a ?crit :
> > > > > > On Thu, Mar 09, 2017 at 02:52:09PM +0100, Alexandre Martins wrote:
> > > > > > > Le jeudi 9 mars 2017, 15:07:54 Konstantin Belousov a ?crit :
> > > > > > > > On Thu, Mar 09, 2017 at 10:59:27AM +0100, Alexandre Martins 
> wrote:
> > > > > > > > > I have the save question for the cpu_ipi_pending here:
> > > > > > > > > 
> > > > > > > > > https://svnweb.freebsd.org/base/head/sys/x86/x86/mp_x86.c?view
> > > > > > > > > =ann
> > > > > > > > > otat
> > > > > > > > > e#l1
> > > > > > > > > 080>
> > > > > > > > > 
> > > > > > > > > Le jeudi 9 mars 2017, 10:43:14 Alexandre Martins a ?crit :
> > > > > > > > > > Hello,
> > > > > > > > > > 
> > > > > > > > > > I'm curently reading the code of the function
> > > > > > > > > > smp_rendezvous_action,
> > > > > > > > > > in
> > > > > > > > > > kern/subr_smp.c file. In that function, i see that the
> > > > > > > > > > variable
> > > > > > > > > > smp_rv_waiters is read in some while() loop in a non-atomic
> > > > > > > > > > way.
> > > > > > > > > > 
> > > > > > > > > > https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?vie
> > > > > > > > > > w=an
> > > > > > > > > > nota
> > > > > > > > > > te#l
> > > > > > > > > > 412
> > > > > > > > > > https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?vie
> > > > > > > > > > w=an
> > > > > > > > > > nota
> > > > > > > > > > te#l
> > > > > > > > > > 458
> > > > > > > > > > https://svnweb.freebsd.org/base/head/sys/kern/subr_smp.c?vie
> > > > > > > > > > w=an
> > > > > > > > > > nota
> > > > > > > > > > te#l
> > > > > > > > > > 472
> > > > > > > > > > 
> > > > > > > > > > I suspect one of my freeze to be due by that.
> > > > > > > > 
> > > > > > > > You should provide either evidence or, at least, some reasoning
> > > > > > > > supporting
> > > > > > > > your claims.
> > > > > > > 
> > > > > > > I curently have a software watchdog that triger and does a
> > > > > > > coredump.
> > > > > > > In
> > > > > > > the
> > > > > > > coredumps, I always see a CPU trying to write-lock a "rm lock".
> > > > > > > Every
> > > > > > > time,
> > > > > > > that CPU is spinning into the smp_rendezvous_action, in the first
> > > > > > > while
> > > > > > > loop) while the others are into the idle threads.
> > > > > > > 
> > > > > > > The fact is that freeze is not clear and I start to search
> > > > > > > "exotic"
> > > > > > > causes
> > > > > > > to explain it.
> > > > > > 
> > > > > > This sounds as the 'usual' deadlock, where some other thread owns
> > > > > > rmlock
> > > > > > in
> > > > > > read mode.  I recommend you to follow the
> > > > > > https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handboo
> > > > > > k/ke
> > > > > > rnel debug-deadlocks.html
> > > > > 
> > > > > Just a last question, for my personnal knowledge.
> > > > > 
> > > > > In ARM >= 6, for atomic acces, the code should (?) use LDREX and STREX
> > > > > for, I quote : "Use LDREX and STREX to implement interprocess
> > > > > communication in multiple-processor and shared-memory systems." (see
> > > > > here
> > > > > 
> > > > > http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489e/C
> > > > > ihbg
> > > > > hef.html
> > > > 
> > > > In my previous response to you, I explicitely defined what 'atomic'
> > > > means when adjected to the term 'load'. The *EX instructions are used on
> > > > ll/sc architectures to implement read/modify/write atomic operations,
> > > > which are different from load (read) operations.
> > > 
> > > Ok ! Because we just want to read the value, there is no need to use the
> > > *EX version. *EX is intended to be use when a modification will be done
> > > thereafter.> 
> > > > > But, in that while loop, it's a standard "LDR" that is used. Is it
> > > > > correct
> > > > > too, and why ?
> > > > 
> > > > Which 'that while loop' ?
> > > > 
> > > > 	while (atomic_load_acq_int(&smp_rv_waiters[3]) < ncpus)
> > > > 	
> > > > 		cpu_spinwait();
> > > > 
> > > > This one ?
> > > 
> > > No, I point the one at line 412, 458 and 472:
> > > 
> > > 412: while (smp_rv_waiters[0] < smp_rv_ncpus)
> > > 
> > >          cpu_spinwait();
> > > 
> > > 458: while (smp_rv_waiters[1] < smp_rv_ncpus)
> > > 
> > >          cpu_spinwait();
> > > 
> > > 472: while (smp_rv_waiters[2] < smp_rv_ncpus)
> > > 
> > >          cpu_spinwait();
> > > > 
> > > > Because the semantic of the normal load + DMB barrier provides the
> > > > expected
> > > > semantic of atomic_load_acq(), as explained in atomic(9) and utilized by
> > > > the author of the code.
> > > 
> > > So, the writer must use LDREX/STREX to modify the value and use dmb to
> > > make
> > > visible to other CPU the write.
> > 
> > No, this is false statement on all/many counts.  ll/sc is only needed for
> > atomic modification, not for a write.  If you need to assign a given value
> > to the variable, STR instruction does just that.  LDREX/STREX provide a
> > way to ensure that a modification done atomically.  E.g., if your intent
> > is to add 1 to the word in memory, you need to ensure that the memory
> > is not modified, when writing out the modified read value.
> > 
> > Next, DMB does not 'make visible' the modification. DMB separates
> > externally visible effects of executed instructions before and after it.
> > From the whole guarantees provided by this separation, atomic_load_acq()
> > only needs the effect of not allowing later memory accesses to occur
> > earlier than the DMB instruction was executed (the acquire semantic).
> > ARMv8 provides loads and stores with the reduced barriers to implement
> > _acq/_rel without excess overhead of full barrier.
> > 
> > DMB does not make any store instruction more effective than it already is.
> 
> OK, that why I didn't understand well the use of atomics.
> 
> It's related to the function "atomic_load_xxx/atomic_store_xxx" that made me 
> think that it's THIS store or THIS load is atomic, but no. The loads and 
> stores are already atomic. Thoses functions just do a barrier (if needed) 
> before for "acq" and after for the "rel". The barrier does not "flush" anything 
> in memory but prevent loads and stores reordering.

Yes, this is true for the FreeBSD' kernel and C runtime environment.

Our model is very close to the C 2011 / C++ 2014 standard's models,
but of course it is not formalized even on the level of the mentioned
standards. One important difference is that the standards only require
explicitely atomic accesses to be atomic in the above sense, which
is why e.g. atomic_load_explicit(memory_order_relaxed) exists in the
standard, but is not duplicated by our atomic.h, and which might be the
source of your confusion.

> 
> I realy need to practice more the use of atomic _correctly_ (^_^)
> 
> > 
> > > The readers can read simply the value without the barrier because cache
> > > coherancy protcol will update the value automaticaly.
> > 
> > Same is true for stores.  This is why plain loads and stores are atomic.
> > 
> > > I think I finally got it !
> > > Thank you so much !
> > > 
> > > Best regards,
> 
> -- 
> Alexandre Martins
> STORMSHIELD
> 
Received on Fri Mar 10 2017 - 14:33:02 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:10 UTC