Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

From: Mark Millard <markmi_at_dsl-only.net>
Date: Sat, 18 Feb 2017 13:58:49 -0800
On 2017-Feb-18, at 12:58 PM, Mateusz Guzik <mjguzik at gmail.com> wrote:

> On Sat, Feb 18, 2017 at 12:49:29PM -0800, Mark Millard wrote:
>> On 2017-Feb-18, at 4:18 AM, Mark Millard <markmi at dsl-only.net> wrote:
>> 
>>> [Note: I experiment with clang based powerpc64 builds,
>>> reporting problems that I find. Justin is familiar
>>> with this, as is Nathan.]
>>> 
>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>> that I have access to from head -r312761 to -r313864 and
>>> ended up with random panics and hangs in fairly short
>>> order after booting.
>>> 
>>> Some approximate bisecting for the kernel led to:
>>> (sometimes getting part way into a buildkernel attempt
>>> for a different version before a failure happens)
>>> 
>>> -r313266: works (just before use of atomic_fcmpset)
>>> vs.
>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>> 
>>> (I did not try -r313268 through -r313270 as the use was
>>> gradually added.)
>>> 
>>> So I'm currently running a -r313864 world with a -r313266
>>> kernel.
>>> 
>>> No kernel that I tried that was from before -r313266 had the
>>> problems.
>>> 
>>> Any kernel that I tried that was from after -r313271 had the
>>> problems.
>>> 
>>> Of course I did not try them all in other direction. :)
>> 
>> [Of course: "either direction".]
>> 
>> I'll note that the -r313864 buildworld was without
>> MALLOC_PRODUCTION being defined. (Unusual for me but
>> I'm testing if a jemalloc assert problem on arm64
>> also happens on powerpc64.)
>> 
>> By contrast the buildkernels were production style
>> (as is normal for me unless I'm trying to track
>> something down that I think might be exposed by
>> the extra checks).
>> 
> 
> Well either the primitive itself is buggy or the somewhat (now) unusual
> condition of not providing the failed value (but possibly a stale one)
> is not handled correctly in locking code.
> 
> That said, I would start with putting barriers "on both sides" of
> powerpc's fcmpset for debugging purposes and if the problem persists I
> can add some debugging to the locking primitives.
> 
> -- 
> Mateusz Guzik <mjguzik gmail.com>

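The following is a minimal sketch of what "barriers on both sides" of an fcmpset could look like. It uses C11 `<stdatomic.h>` rather than FreeBSD's powerpc atomic.h, so it is not the kernel's code, and `debug_fcmpset`, `toy_acquire` and `UNOWNED` are hypothetical names. A relaxed weak compare-exchange models an fcmpset with no ordering of its own: on failure it writes the observed value back into `*old`, and, as an LL/SC (lwarx/stwcx.) sequence can, it may fail spuriously. The two seq_cst fences stand in for the debugging barriers. The acquire loop shows the rule this imposes on callers: after a failure, `v` holds whatever the primitive handed back, and that may equal the value just tried.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define	UNOWNED	((uintptr_t)0)	/* hypothetical "not owned" encoding */

/*
 * Hypothetical debugging wrapper with fcmpset-style semantics: if *p
 * equals *old, store new and return true; otherwise write the value
 * observed in *p back into *old and return false.  The relaxed weak
 * compare-exchange carries no ordering of its own and may fail
 * spuriously (leaving the same value in *old); the two full fences
 * are the "barriers on both sides" added only for debugging.
 */
static bool
debug_fcmpset(_Atomic uintptr_t *p, uintptr_t *old, uintptr_t new)
{
	bool ok;

	atomic_thread_fence(memory_order_seq_cst);	/* barrier before */
	ok = atomic_compare_exchange_weak_explicit(p, old, new,
	    memory_order_relaxed, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* barrier after */
	return (ok);
}

/*
 * Spin until *p is taken for tid.  The loop never loads *p explicitly:
 * it continues with whatever debug_fcmpset wrote back into v, and it
 * must tolerate a failure where v still equals the value just tried.
 */
static void
toy_acquire(_Atomic uintptr_t *p, uintptr_t tid)
{
	uintptr_t v = UNOWNED;

	while (!debug_fcmpset(p, &v, tid)) {
		if (v != UNOWNED)
			v = UNOWNED;	/* owned by another thread: expect free, retry */
		/* else: spurious failure, retry with the same expected value */
	}
}
```

For example, on a word already owned by tid 42, `debug_fcmpset(&word, &v, 7)` with `v == UNOWNED` fails and leaves 42 in `v`. In fcmpset-based code, that written-back value, not a fresh read, is what the caller continues with.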
I currently have the only powerpc64 that I have access
to for now doing a test that will likely finish tonight
sometime (if it has no problems).

Also I'm not so familiar with powerpc64 details as to be
able to insert proper barriers and the like off the top of
my head: it is more of a research subject for me.


Side note:

It looks like contexts like __rw_wlock_hard(c,v,tid,file,line)
now need the caller to do the equivalent of:

__rw_wlock_hard(c,RW_READ_VALUE(rwlock2rw(c)),tid,file,line)

in order for the code behavior to match the old behavior,
which relied on the routine's own initialization of its
local v before v was used:

rw = rwlock2rw(c);
v = RW_READ_VALUE(rw); /* this line no longer exists */

This means that checking for equivalence is no longer
local to the routine but involves checking all the
usage of the routine.

I have not done such a check, so for all I know such usage
is always in place: this is not a claim of a problem. The
other routines in kern_rwlock.c still have local variables
and the original initializations. I just thought that this
was interesting. I have not looked at other files yet.

===
Mark Millard
markmi at dsl-only.net
Received on Sat Feb 18 2017 - 20:58:58 UTC
