Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

From: Mark Millard <markmi_at_dsl-only.net>
Date: Sat, 25 Feb 2017 05:49:40 -0800
On 2017-Feb-25, at 1:05 AM, Mark Millard <markmi_at_dsl-only.net> wrote:

> On 2017-Feb-24, at 11:46 PM, Mark Millard <markmi at dsl-only.net> wrote:
> 
>> On 2017-Feb-24, at 8:25 PM, Mark Millard <markmi at dsl-only.net> wrote:
>> 
>>> On 2017-Feb-24, at 4:23 PM, Mateusz Guzik <mjguzik at gmail.com> wrote:
>>>> 
>>>> On Tue, Feb 21, 2017 at 01:37:25AM -0800, Mark Millard wrote:
>>>>> [Back to the powerpc64 context.]
>>>>> 
>>>>> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik <mjguzik at gmail.com> wrote:
>>>>> 
>>>>>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>>>>>>> [Note: I experiment with clang based powerpc64 builds,
>>>>>>> reporting problems that I find. Justin is familiar
>>>>>>> with this, as is Nathan.]
>>>>>>> 
>>>>>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>>>>>> that I have access to from head -r312761 to -r313864 and
>>>>>>> ended up with random panics and hang ups in fairly short
>>>>>>> order after booting.
>>>>>>> 
>>>>>>> Some approximate bisecting for the kernel led to:
>>>>>>> (sometimes getting part way into a buildkernel attempt
>>>>>>> for a different version before a failure happens)
>>>>>>> 
>>>>>>> -r313266: works (just before use of atomic_fcmpset)
>>>>>>> vs.
>>>>>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>>>>>> 
>>>>>>> (I did not try -r313268 through -r313270 as the use was
>>>>>>> gradually added.)
>>>>>>> 
>>>>>>> So I'm currently running a -r313864 world with a -r313266
>>>>>>> kernel.
>>>>>>> 
>>>>>>> No kernel that I tried that was from before -r313266 had the
>>>>>>> problems.
>>>>>>> 
>>>>>>> Any kernel that I tried that was from after -r313271 had the
>>>>>>> problems.
>>>>>>> 
>>>>>>> Of course I did not try them all in either direction. :)
>>>>>>> 
>>>>>> 
>>>>>> I found that spin mutexes were not properly handling this, fixed in
>>>>>> r313996.
>>>>>> 
>>>>>> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
>>>>>> fcmpset to simulate failures. Everything works, while it would easily
>>>>>> fail without the patch.
>>>>>> 
>>>>>> That said, I hope this concludes the 'missing check for not-reread value
>>>>>> of failed fcmpset' saga.
>>>>>> 
>>>>>> -- 
>>>>>> Mateusz Guzik <mjguzik gmail.com>
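
A minimal sketch of the kind of check being discussed, assuming C11
atomics in place of the kernel's atomic(9) primitives. All sk_* names
are hypothetical and this is not the code from r313996. The fcmpset
stand-in fails on every other call without re-reading the old value,
imitating the "if (cpu_tick() % 2) return (0);" experiment, so the
caller must not treat a failure by itself as proof that the lock is
owned:

```c
#include <stdatomic.h>
#include <stdint.h>

#define SK_UNOWNED ((uintptr_t)0)	/* hypothetical "lock is free" value */

static unsigned sk_calls;		/* drives the simulated failures */

/*
 * Hypothetical fcmpset stand-in. Every other call (starting with the
 * first) fails on purpose and leaves *old untouched; the remaining calls
 * do a real compare-and-swap that updates *old on failure.
 */
static int
sk_fcmpset(_Atomic uintptr_t *dst, uintptr_t *old, uintptr_t newval)
{
	if (sk_calls++ % 2 == 0)
		return (0);
	return (atomic_compare_exchange_strong(dst, old, newval));
}

/*
 * Try-acquire: returns 1 once the lock is taken, 0 if another owner
 * holds it. After a failure, 'v' may still be SK_UNOWNED (the value was
 * not re-read), in which case the attempt is retried.
 */
static int
sk_try_acquire(_Atomic uintptr_t *lock, uintptr_t tid)
{
	uintptr_t v = SK_UNOWNED;

	for (;;) {
		if (sk_fcmpset(lock, &v, tid))
			return (1);
		if (v != SK_UNOWNED)
			return (0);	/* genuinely owned by someone else */
		/* v unchanged: failure told us nothing, try again */
	}
}
```

Dropping the "v != SK_UNOWNED" test and returning 0 on any failure
would make this sketch report a free lock as owned half of the time.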
>>>>> 
>>>>> -r313999 is an improvement for powerpc64: it boots and I can
>>>>> log in on the old PowerMac G5 so-called "Quad Core".
>>>>> 
>>>>> But, e.g., buildworld buildkernel eventually hangs and later
>>>>> the powerpc64 panics for "spin lock held too long".
>>>>> 
>>>> 
>>>> All right, play time is over.
>>>> 
>>>> Can you please:
>>>> 1. verify r313254 is stable for you
>>>> 2. apply https://people.freebsd.org/~mjg/patches/complete-locks.diff and
>>>> https://people.freebsd.org/~mjg/.junk/ppc.diff on top of it and retry
>>>> the test?
>>>> 
>>>> This is a workaround which effectively disables the powerpc-specific
>>>> primitive and makes it use a cmpset wrapper instead. I don't have the
>>>> hardware to test right now and my attempts to boot in qemu also failed.
>>>> 
>>>> That said, it does not look like there are general fcmpset bugs left and
>>>> the remaining issue seems powerpc-specific.
>>>> 
>>>> If this works, I'll commit the workaround for the time being as in a few
>>>> weeks I'd like to start merging the work back to stable/11.
>>>> 
>>>> -- 
>>>> Mateusz Guzik <mjguzik gmail.com>
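
The idea of having fcmpset "use a cmpset wrapper instead" can be
sketched as follows, again assuming C11 atomics and hypothetical sk_*
names. This is an illustration of the general technique only, not the
contents of ppc.diff: the wrapper calls a plain compare-and-set and, on
failure, re-reads the target by hand so the caller still receives the
value that was found there.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical cmpset stand-in: returns nonzero on success and reports
 * nothing about the value that was seen on failure.
 */
static int
sk_cmpset(_Atomic uintptr_t *dst, uintptr_t expect, uintptr_t newval)
{
	return (atomic_compare_exchange_strong(dst, &expect, newval));
}

/*
 * fcmpset built on top of cmpset: on failure, load the current value of
 * *dst into *old explicitly before returning 0.
 */
static int
sk_fcmpset_wrapper(_Atomic uintptr_t *dst, uintptr_t *old, uintptr_t newval)
{
	if (sk_cmpset(dst, *old, newval))
		return (1);
	*old = atomic_load(dst);
	return (0);
}
```

The separate load means the value stored in *old may already be newer
than the one the failed compare saw, which is acceptable for callers
that simply retry with whatever value they were given.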
>>> 
>>> I've started a self-hosted powerpc64 -r313254 build
>>> based on running the -r313266 kernel. (The context I
>>> sometimes do cross builds in is tied up with other
>>> things. -r313266 is what my prior bisection came up
>>> with as the last apparently-working kernel at the
>>> time.)
>>> 
>>> So it will be a while before I have a -r313254 in
>>> place to try: the self-hosted build takes longer
>>> and so will not be installed for a while.
>>> 
>>> To judge stability I'll probably have -r313254 build
>>> the patched update that you want me to test, initially
>>> doing a cleanworld. So that too will take a while.
>>> 
>>> (The above wording presumes all goes well.)
>>> 
>>> I'll let you know as I go along if I run into anything
>>> interesting.
>>> 
>>> 
>>> My builds are rebuilding both world and kernel since
>>> what turns into /usr/include/sys/* has changes in your
>>> patch.
>>> 
>>> The builds are without MALLOC_PRODUCTION but are
>>> otherwise not debug builds.
>>> 
>>> 
>>> I've not seen anything indicating that anyone has
>>> been trying TARGET_ARCH=powerpc. I've been trying
>>> TARGET_ARCH=powerpc64 .
>>> 
>>> While I do not have access to a true
>>> TARGET_ARCH=powerpc machine currently, such a build
>>> can be used on a PowerMac G5 so-called "Quad Core".
>>> So I could eventually build and try such on the one
>>> powerpc family machine that I currently have access
>>> to.
>>> 
>>> clang 3.9.1 has a significant code generation problem
>>> for TARGET_ARCH=powerpc and so I'd have to use
>>> a gcc 4.2.1 based build for that sort of experiment.
>>> (There is no xtoolchain for 32-bit powerpc.)
>>> 
>>> I use clang 3.9.1 or xtoolchain for
>>> TARGET_ARCH=powerpc64 and have been using clang 3.9.1
>>> in recent times. My primary powerpc family use has
>>> been to experiment with building based on the
>>> modern libc++ and reporting issues discovered in the
>>> attempts. This explains the clang/xtoolchain context.
>>> 
>>> clang 3.9.1 has major problems for C++ exception
>>> handling for both powerpc64 and powerpc but a
>>> lot of FreeBSD is independent of throwing C++
>>> exceptions. By contrast, xtoolchain-based builds work
>>> for C++ exception handling, but lib32 fails
>>> to operate when built by an xtoolchain build.
>> 
>> -r313254 had no trouble booting or building
>> the patched version or anything else involved
>> in getting there or installing.
>> 
>> But the patched version failed quickly just
>> attempting cleanworld's recursive remove. (So
>> it did boot and let me log in.) The panic
>> description was:
>> 
>> panic: vn_finished_secondary_write: neg cnt
>> 
>> 
>> The sources that are different from svn's -r313254
>> are (some tied to arm64 experiments, most everything
>> else tied to powerpc64 and/or powerpc, those not
>> from your patches are long standing from my
>> investigations or from Justin H.):
>> 
>> # svnlite status /usr/src | sort
>> . . . (ignoring the ? lines) . . .
>> M       /usr/src/bin/sh/jobs.c
>> M       /usr/src/bin/sh/miscbltin.c
>> M       /usr/src/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.td
>> M       /usr/src/contrib/llvm/tools/lld/ELF/Target.cpp
>> M       /usr/src/lib/csu/powerpc64/Makefile
>> M       /usr/src/libexec/rtld-elf/Makefile
>> M       /usr/src/sys/arm/arm/gic.c
>> M       /usr/src/sys/boot/ofw/Makefile.inc
>> M       /usr/src/sys/boot/powerpc/Makefile.inc
>> M       /usr/src/sys/boot/powerpc/kboot/Makefile
>> M       /usr/src/sys/boot/uboot/Makefile.inc
>> M       /usr/src/sys/conf/kmod.mk
>> M       /usr/src/sys/ddb/db_main.c
>> M       /usr/src/sys/ddb/db_script.c
>> M       /usr/src/sys/kern/init_main.c
>> M       /usr/src/sys/kern/kern_condvar.c
>> M       /usr/src/sys/kern/kern_lock.c
>> M       /usr/src/sys/kern/kern_lockstat.c
>> M       /usr/src/sys/kern/kern_mutex.c
>> M       /usr/src/sys/kern/kern_rwlock.c
>> M       /usr/src/sys/kern/kern_sx.c
>> M       /usr/src/sys/kern/kern_synch.c
>> M       /usr/src/sys/kern/kern_thread.c
>> M       /usr/src/sys/kern/subr_lock.c
>> M       /usr/src/sys/kern/vfs_default.c
>> M       /usr/src/sys/kern/vfs_subr.c
>> M       /usr/src/sys/powerpc/include/atomic.h
>> M       /usr/src/sys/powerpc/ofw/ofw_machdep.c
>> M       /usr/src/sys/sys/lock.h
>> M       /usr/src/sys/sys/lockmgr.h
>> M       /usr/src/sys/sys/lockstat.h
>> M       /usr/src/sys/sys/mutex.h
>> M       /usr/src/sys/sys/rwlock.h
>> M       /usr/src/sys/sys/sdt.h
>> M       /usr/src/sys/sys/sx.h
>> M       /usr/src/sys/sys/systm.h
> 
> To recover from the problem and again have a buildworld
> buildkernel present I've booted based on:
> 
> A) The -r313254 kernel without your patches (kernel.old).
> B) The -r313254 world (which had your patches in its
>   build).
> 
> I've reverted /usr/src/ to not have your patches
> (but it does have my own ones from prior activity).
> 
> I repeated the cleanworld to let it finish after its
> prior failure (which happened during SSD trim activity).
> 
> I've started buildworld buildkernel (with -j 4 as is
> normal for my context).
> 
> So far this combination seems to be working fine. This
> suggests that the sys/sys/*.h files that ended up in
> /usr/include/sys/ and the sys/powerpc/include/atomic.h
> that ended up in /usr/include/machine/ were not problems
> as used in the world code --since those uses are still in
> place in the binaries being used. Only the kernel
> binaries seem to be a problem (not necessarily all of
> them).

Unfortunately it eventually got a panic for a Data Storage
Interrupt.

I may not be able to do a self hosted build to get things
back to normal.

===
Mark Millard
markmi at dsl-only.net
Received on Sat Feb 25 2017 - 12:49:44 UTC
