On 2017-Feb-18, at 12:58 PM, Mateusz Guzik <mjguzik at gmail.com> wrote:

> On Sat, Feb 18, 2017 at 12:49:29PM -0800, Mark Millard wrote:
>> On 2017-Feb-18, at 4:18 AM, Mark Millard <markmi at dsl-only.net> wrote:
>>
>>> [Note: I experiment with clang based powerpc64 builds,
>>> reporting problems that I find. Justin is familiar
>>> with this, as is Nathan.]
>>>
>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>> that I have access to from head -r312761 to -r313864 and
>>> ended up with random panics and hang ups in fairly short
>>> order after booting.
>>>
>>> Some approximate bisecting for the kernel lead to:
>>> (sometimes getting part way into a buildkernel attempt
>>> for a different version before a failure happens)
>>>
>>> -r313266: works (just before use of atomic_fcmpset)
>>> vs.
>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>>
>>> (I did not try -r313268 through -r313270 as the use was
>>> gradually added.)
>>>
>>> So I'm currently running a -r313864 world with a -r313266
>>> kernel.
>>>
>>> No kernel that I tried that was from before -r313266 had the
>>> problems.
>>>
>>> Any kernel that I tried that was from after -r313271 had the
>>> problems.
>>>
>>> Of course I did not try them all in other direction. :)
>>
>> [Of course: "either direction".]
>>
>> I'll note that the -r313864 buildworld was without
>> MALLOC_PRODUCTION being defined. (Unusual for me but
>> I'm testing if a jemalloc assert problem on arm64
>> also happens on powerpc64.)
>>
>> By contrast the buildkernels were production style
>> (as is normal for me unless I'm trying to track
>> something down that I think might be exposed by
>> the extra checks).
>>
>
> Well either the primitive itself is buggy or the somewhat (now) unusual
> condition of not providing the failed value (but possibly a stale one)
> is not handled correctly in locking code.
>
> That said, I would start with putting barriers "on both sides" of
> powerpc's fcmpset for debugging purposes and if the problem persists I
> can add some debugs to locking primitives.
>
> --
> Mateusz Guzik <mjguzik gmail.com>

The only powerpc64 that I have access to is currently
running a test that will likely finish sometime tonight
(if it has no problems).

Also I'm not familiar enough with powerpc64 details to
be able to insert proper barriers and the like off the top
of my head: It is more of a research subject for me.

Side note:

It looks like contexts like

__rw_wlock_hard(c,v,tid,file,line)

now need the caller to do an equivalent of:

__rw_wlock_hard(c,RW_READ_VALUE(rwlock2rw(c)),tid,file,line)

in order for the code behavior to match the old behavior
that was based on the original local-v's initialization
before v was used:

        rw = rwlock2rw(c);
        v = RW_READ_VALUE(rw); /* this line no longer exists */

This means that checking for equivalence is no longer
local to the routine but involves checking all the usage
of the routine. I've not done such a check, so for all I
know such usage is always in place: This is not a claim
of a problem.

The other routines in kern_rwlock.c still have
local variables and the original initializations.
I just thought that this was interesting.

I've not looked at other files yet.

===
Mark Millard
markmi at dsl-only.net

Received on Sat Feb 18 2017 - 20:58:58 UTC
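
For readers following the fcmpset discussion above, here is a
minimal user-space sketch of the pattern in question. It is not
the FreeBSD kernel code: it assumes C11 <stdatomic.h> as a
stand-in for atomic(9), and the names model_fcmpset,
model_wlock_hard and LOCK_UNLOCKED are hypothetical, chosen only
for illustration. It shows two things: on failure the primitive
hands back a value through the "expect" pointer (which the loop
then trusts instead of re-reading the lock), and the caller is
the one supplying the initial snapshot v.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an "unlocked" lock word. */
#define LOCK_UNLOCKED ((uintptr_t)0)

/*
 * Hypothetical model of an fcmpset-style primitive, built on the C11
 * weak compare-exchange: on failure, *expect is overwritten with the
 * value the primitive observed. The weak form may also fail spuriously,
 * as an LL/SC based implementation can.
 */
static bool
model_fcmpset(_Atomic uintptr_t *dst, uintptr_t *expect, uintptr_t newval)
{
	return atomic_compare_exchange_weak_explicit(dst, expect, newval,
	    memory_order_acquire, memory_order_relaxed);
}

/*
 * Sketch of a "hard path" acquire where the caller passes in the
 * initial snapshot v. Returns the number of loop iterations used.
 */
static unsigned
model_wlock_hard(_Atomic uintptr_t *lock, uintptr_t v, uintptr_t tid)
{
	unsigned attempts = 0;

	assert(tid != LOCK_UNLOCKED);
	for (;;) {
		attempts++;
		if (v != LOCK_UNLOCKED) {
			/* Real code would spin or sleep; the sketch only re-reads. */
			v = atomic_load_explicit(lock, memory_order_relaxed);
			continue;
		}
		if (model_fcmpset(lock, &v, tid))
			return attempts;
		/* On failure v now holds whatever the primitive reported. */
	}
}
```

A caller in this sketch would first take the snapshot itself, for
example v = atomic_load(&lock), and then call
model_wlock_hard(&lock, v, tid). If the primitive reported a wrong
or stale value on failure, the loop above would act on it, which
is the kind of condition being asked about.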