On Sat, Feb 18, 2017 at 01:58:49PM -0800, Mark Millard wrote:
> On 2017-Feb-18, at 12:58 PM, Mateusz Guzik <mjguzik at gmail.com> wrote:
> > Well either the primitive itself is buggy or the somewhat (now) unusual
> > condition of not providing the failed value (but possibly a stale one)
> > is not handled correctly in locking code.
> >
> > That said, I would start with putting barriers "on both sides" of
> > powerpc's fcmpset for debugging purposes and if the problem persists I
> > can add some debugs to locking primitives.
> >
>
> I currently have the only powerpc64 that I have access
> to for now doing a test that will likely finish tonight
> sometime (if it has no problems).
>
> Also I'm not so familiar with powerpc64 details as to be
> able to insert proper barriers and the like off the top of
> my head: it is more of a research subject for me.
>

This was a suggestion to jhibbits_at_. Looking at the code it is not hard
to slap them in for testing purposes, or maybe there is an "obvious now
that I look at it" braino in there, or maybe he has a better idea.

Now that I wrote it, I can get myself access to powerpc boxes. While I
won't be able to run bsd on them, I can hack around in userspace and
see. That's unless jhibbits_at_ steps in. I have no clue about ppc.

> It looks like contexts like __rw_wlock_hard(c,v,tid,file,line)
> now need the caller to do an equivalent of:
>
> __rw_wlock_hard(c,RW_READ_VALUE(rwlock2rw(c)),file,line)
>
> in order for the code behavior to match the old behavior
> that was based on the original local-v's initialization
> before v was used:
>
> rw = rwlock2rw(c);
> v = RW_READ_VALUE(rw); /* this line no longer exists */
>
> This means that checking for equivalence is no longer
> local to the routine but involves checking all the
> usage of the routine.
>

Not reading the argument locally was the entire point of introducing
fcmpset. Otherwise the 'v' argument would be a waste of time.
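[For readers of the archive: the fcmpset contract described above - return
the observed lock value on failure instead of forcing a re-read - is the
same contract C11's atomic_compare_exchange_* has. A minimal userspace
sketch; the values are made up for illustration:]

	#include <stdatomic.h>
	#include <stdio.h>

	int
	main(void)
	{
		_Atomic unsigned lock = 5;	/* some other cpu already changed it */
		unsigned expected = 0;		/* our stale idea of the value */

		/*
		 * On failure, the observed value (5) is written back into
		 * 'expected' for free - no separate load of the cacheline,
		 * which is exactly what fcmpset gives the locking code.
		 */
		if (!atomic_compare_exchange_strong(&lock, &expected, 1))
			printf("fcmpset failed, observed %u\n", expected);
		return (0);
	}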
Some primitives can attempt grabbing the lock and, if they fail, we have
the lock value to work with (e.g. check who owns the lock and see if
they are running). In particular amd64 will give us the value it found.
An explicit read requires whoever owns the cacheline to lose the
exclusive ownership, and if the lock is contended (multiple cpus doing
fcmpset), this makes the cacheline ping-pong between cores. This
destroys performance, especially on systems with many cores and
especially so with multiple numa nodes.

Other primitives don't have inline variants. This concerns read-write
locks which try to:

retry:
	r = lock_value(lock);
	if (!locked(r)) {
		if (!cmpset(lock, r, r + ONE_READER))
			goto retry;
	}

That is, if multiple cpus try to get the lock for reading, one will fail
and will be forced to compute the new value to be set. The longer the
time between attempts, the more likely it is another core showed up
trying to do the same thing with the same value, causing another failed
attempt. So here there are no inlines so that the time is shorter, and
fcmpset allows NOT reading the lock value explicitly - it was already
provided by the hardware.

Note this is still significantly slower than it has to be in principle -
the lock can 'blindly increment by ONE_READER and see what happens', but
that requires several changes and is a subject for another e-mail. I'm
working on it though.

-- 
Mateusz Guzik <mjguzik gmail.com>

Received on Sat Feb 18 2017 - 21:59:46 UTC
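[For readers of the archive: the read-lock fast path sketched above, with
fcmpset semantics, can be shown compilable using C11 atomics. This is a
hedged illustration, not the kernel code - rw_try_rlock and the
RW_LOCK_WRITE bit are made-up names for the example:]

	#include <stdatomic.h>
	#include <stdio.h>

	#define ONE_READER	4u		/* hypothetical reader increment */
	#define RW_LOCK_WRITE	1u		/* hypothetical write-owned bit */

	static int
	rw_try_rlock(_Atomic unsigned *lock)
	{
		unsigned v = atomic_load(lock);

		for (;;) {
			if (v & RW_LOCK_WRITE)
				return (0);	/* write-owned; take the slow path */
			/*
			 * The failed compare-exchange writes the value it
			 * observed back into 'v', so the next attempt needs
			 * no explicit re-read of the lock word.
			 */
			if (atomic_compare_exchange_weak(lock, &v, v + ONE_READER))
				return (1);
		}
	}

	int
	main(void)
	{
		_Atomic unsigned lock = 0;

		if (rw_try_rlock(&lock) && rw_try_rlock(&lock))
			printf("lock word: %u\n", atomic_load(&lock));
		return (0);
	}

[The 'blind increment' variant mentioned at the end would replace the
loop with an unconditional fetch-add and back the increment out if the
lock turned out to be write-owned; that trades the CAS retry loop for
occasional rollback work.]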