Re: kqueue LOR

From: Attilio Rao <attilio@freebsd.org>
Date: Fri, 15 Dec 2006 18:37:45 +0100
2006/12/13, John Baldwin <jhb@freebsd.org>:
> On Tuesday 12 December 2006 21:48, Bruce Evans wrote:
> > > Memory barriers just specify ordering, they don't ensure a cache flush so
> > > another CPU reads up to date values.  You can use memory barriers in
> > > conjunction with atomic operations on a variable to ensure that you can
> > > safely read other variables (which is what locks do).  For example, in this
> >
> > I thought that the acquire/release variants of atomic ops guarantee
> > this.  They seem to be documented to do this, while mutexes don't seem
> > to be documented to do this.  The MI (?) implementation of mutexes
> > depends on atomic_cmpset_{acq,rel}_ptr() doing this.
>
> The acq/rel just specify ordering.  As Attilio mentioned, we assume that the
> atomic_cmpset() that sets the contested flag will fail while racing with
> another CPU (even if the CPU can't see the new value, as long as it fails and
> keeps spinning, mutexes will still work).  The 'rel' barrier on CPU A when
> releasing a lock forces all the other writes to be posted (and eventually
> become "visible") to other CPUs before the write that releases the lock.
> The 'acq' barrier on CPU B when acquiring the lock forces the CPU to not
> reorder any reads before it acquires the lock, so this makes you not read any
> data until you have the lock.  Thus, once CPU B has waited long enough
> to "see" the write from A to release the lock, we know that 1) it can
> also "see" all the other writes from that CPU that the lock protected, and 2)
> B hasn't tried to read any of them yet so it shouldn't have any stale values
> in registers.  None of this requires the OS to do a cache flush.  (If you
> have an SMP system where the cache can still hold stale values after another
> CPU updates values in memory where it is "visible" to the CPU acquiring the
> lock, then _acq might need to flush the cache, but that would be a property
> of that architecture.  However, even that would not require cache flushes so
> long as the stale values were evicted from the cache such that they honor the
> memory barrier and you don't see the new value of the lock until you see the
> new values of the earlier data.)
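
To make the acq/rel pairing above concrete, here is a minimal
spin-lock sketch built on the atomic(9) primitives (the 'xlock' type
and function names are made up for illustration; this is not the real
sys/mutex.h code):

struct xlock {
        volatile uintptr_t l_owner;     /* 0 == unlocked */
};

static void
xlock_acquire(struct xlock *l, uintptr_t tid)
{
        /*
         * acq: no later read may be satisfied before the cmpset
         * that takes the lock has succeeded.
         */
        while (atomic_cmpset_acq_ptr(&l->l_owner, 0, tid) == 0)
                cpu_spinwait();         /* spin until released */
}

static void
xlock_release(struct xlock *l)
{
        /*
         * rel: all earlier writes are posted before the write
         * that makes the lock visible as free to other CPUs.
         */
        atomic_store_rel_ptr(&l->l_owner, 0);
}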

Just a note: you can get a practical overview of this from the rwlock
implementation (and from the difference between the exclusive case
and the shared case).

In the 'write' case we just apply the mutex semantics (with only a
few implementation details changed), so we have to use an acquire
memory barrier for locking and a release memory barrier for
unlocking.
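
As a sketch (the xrw_* names and the direct use of the rw_lock field
are illustrative assumptions, not the actual sys/rwlock.h code), the
exclusive path looks just like the mutex above:

#define RW_UNLOCKED     0               /* assumed encoding */

static void
xrw_wlock(struct rwlock *rw, uintptr_t tid)
{
        /* Exclusive lock: acq, exactly as for a mutex. */
        while (atomic_cmpset_acq_ptr(&rw->rw_lock, RW_UNLOCKED,
            tid) == 0)
                cpu_spinwait();         /* real code would block */
}

static void
xrw_wunlock(struct rwlock *rw)
{
        /* Exclusive unlock: rel, so protected writes post first. */
        atomic_store_rel_ptr(&rw->rw_lock, RW_UNLOCKED);
}
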
Things are very different when acquiring a rwlock in the 'read' case.
While it is easy to see that acquire semantics are still needed for
locking (we have to force the CPU not to read any stale value from
before the lock was taken), it is just as easy to see that, since the
semantics of a read lock mean no protected value is updated, there is
no real need for a release (or any other) memory barrier on unlock:
that ordering is assured by the next thread's acq barrier anyway.
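
Here is the matching sketch of the shared path (again illustrative:
the RW_WRITER bit and RW_ONE_READER increment are assumed encodings,
and real code would block rather than spin):

#define RW_WRITER       0x01            /* assumed: write-held bit */
#define RW_ONE_READER   0x02            /* assumed: reader count unit */

static void
xrw_rlock(struct rwlock *rw)
{
        uintptr_t x;

        /*
         * Shared lock: acq is still required, so the reader cannot
         * see stale values of the data the lock protects.
         */
        for (;;) {
                x = rw->rw_lock;
                if ((x & RW_WRITER) != 0) {
                        cpu_spinwait();
                        continue;
                }
                if (atomic_cmpset_acq_ptr(&rw->rw_lock, x,
                    x + RW_ONE_READER))
                        break;
        }
}

static void
xrw_runlock(struct rwlock *rw)
{
        uintptr_t x;

        /*
         * Shared unlock: the reader wrote nothing under the lock,
         * so a plain atomic cmpset is enough; ordering for the next
         * owner comes from that owner's own acq barrier.
         */
        do {
                x = rw->rw_lock;
        } while (atomic_cmpset_ptr(&rw->rw_lock, x,
            x - RW_ONE_READER) == 0);
}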

In more practical terms, a rwlock can be used this way:

struct foo_softc {
        uint32_t sc_flags;
        uint32_t sc_gooo;
        struct rwlock sc_lock;
};

...
struct foo_softc *sc;
uint32_t fs;

rw_rlock(&sc->sc_lock);         /* shared lock: acq barrier only */
fs = sc->sc_flags;              /* read-only access to protected data */
rw_runlock(&sc->sc_lock);       /* no release barrier needed */

Since no write to the members of struct foo_softc is foreseen here,
it is safe for the read unlock to use just a plain atomic instruction
rather than one carrying a memory barrier.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein