On 2017-Feb-25, at 1:05 AM, Mark Millard <markmi_at_dsl-only.net> wrote:

> On 2017-Feb-24, at 11:46 PM, Mark Millard <markmi at dsl-only.net> wrote:
>
>> On 2017-Feb-24, at 8:25 PM, Mark Millard <markmi at dsl-only.net> wrote:
>>
>>> On 2017-Feb-24, at 4:23 PM, Mateusz Guzik <mjguzik at gmail.com> wrote:
>>>>
>>>> On Tue, Feb 21, 2017 at 01:37:25AM -0800, Mark Millard wrote:
>>>>> [Back to the powerpc64 context.]
>>>>>
>>>>> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik <mjguzik at gmail.com> wrote:
>>>>>
>>>>>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>>>>>>> [Note: I experiment with clang based powerpc64 builds,
>>>>>>> reporting problems that I find. Justin is familiar
>>>>>>> with this, as is Nathan.]
>>>>>>>
>>>>>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>>>>>> that I have access to from head -r312761 to -r313864 and
>>>>>>> ended up with random panics and hang ups in fairly short
>>>>>>> order after booting.
>>>>>>>
>>>>>>> Some approximate bisecting for the kernel led to:
>>>>>>> (sometimes getting part way into a buildkernel attempt
>>>>>>> for a different version before a failure happens)
>>>>>>>
>>>>>>> -r313266: works (just before use of atomic_fcmpset)
>>>>>>> vs.
>>>>>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>>>>>>
>>>>>>> (I did not try -r313268 through -r313270 as the use was
>>>>>>> gradually added.)
>>>>>>>
>>>>>>> So I'm currently running a -r313864 world with a -r313266
>>>>>>> kernel.
>>>>>>>
>>>>>>> No kernel that I tried that was from before -r313266 had the
>>>>>>> problems.
>>>>>>>
>>>>>>> Any kernel that I tried that was from after -r313271 had the
>>>>>>> problems.
>>>>>>>
>>>>>>> Of course I did not try them all in either direction. :)
>>>>>>>
>>>>>>
>>>>>> I found that spin mutexes were not properly handling this, fixed in
>>>>>> r313996.
>>>>>>
>>>>>> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
>>>>>> fcmpset to simulate failures.
>>>>>> Everything works, while it would easily
>>>>>> fail without the patch.
>>>>>>
>>>>>> That said, I hope this concludes the 'missing check for not-reread value
>>>>>> of failed fcmpset' saga.
>>>>>>
>>>>>> --
>>>>>> Mateusz Guzik <mjguzik gmail.com>
>>>>>
>>>>> -r313999 is an improvement for powerpc64: it boots and I can
>>>>> log in on the old PowerMac G5 so-called "Quad Core".
>>>>>
>>>>> But, e.g., buildworld buildkernel eventually hangs and later
>>>>> the powerpc64 panics for "spin lock held too long".
>>>>>
>>>>
>>>> All right, play time is over.
>>>>
>>>> Can you please:
>>>> 1. verify r313254 is stable for you
>>>> 2. apply https://people.freebsd.org/~mjg/patches/complete-locks.diff and
>>>> https://people.freebsd.org/~mjg/.junk/ppc.diff on top of it and retry
>>>> the test?
>>>>
>>>> This is a workaround which effectively disables the powerpc-specific
>>>> primitive and makes it use a cmpset wrapper instead. I don't have the
>>>> hardware to test right now and my attempts to boot in qemu also failed.
>>>>
>>>> That said, it does not look like there are general fcmpset bugs left and
>>>> the remaining issue seems powerpc-specific.
>>>>
>>>> If this works, I'll commit the workaround for the time being as in a few
>>>> weeks I'd like to start merging the work back to stable/11.
>>>>
>>>> --
>>>> Mateusz Guzik <mjguzik gmail.com>
>>>
>>> I've started a self-hosted powerpc64 -r313254 build
>>> based on running the -r313266 kernel. (The context
>>> I sometimes do cross builds in is tied up with other
>>> things. -r313266 is what my prior bisection came up
>>> with as the last apparently-working kernel at the
>>> time.)
>>>
>>> So it will be a while before I have a -r313254 in
>>> place to try: the self-hosted build takes longer
>>> and so will not be installed for a while.
>>>
>>> To judge stability I'll probably have -r313254 build
>>> the patched update that you want me to test, initially
>>> doing a cleanworld. So that too will take a while.
>>>
>>> (The above wording presumes all goes well.)
>>>
>>> I'll let you know as I go along if I run into anything
>>> interesting.
>>>
>>>
>>> My builds are rebuilding both world and kernel since
>>> what turns into /usr/include/sys/* has changes in your
>>> patch.
>>>
>>> The builds are without MALLOC_PRODUCTION but are
>>> otherwise not debug builds.
>>>
>>>
>>> I've not seen anything indicating that anyone has
>>> been trying TARGET_ARCH=powerpc. I've been trying
>>> TARGET_ARCH=powerpc64 .
>>>
>>> While I do not have access to a true
>>> TARGET_ARCH=powerpc machine currently, such a build
>>> can be used on a PowerMac G5 so-called "Quad Core".
>>> So I could eventually build and try such on the one
>>> powerpc family machine that I currently have access
>>> to.
>>>
>>> clang 3.9.1 has a significant code generation problem
>>> for TARGET_ARCH=powerpc and so I'd have to use
>>> a gcc 4.2.1 based build for that sort of experiment.
>>> (There is no xtoolchain for 32-bit powerpc.)
>>>
>>> I use clang 3.9.1 or xtoolchain for
>>> TARGET_ARCH=powerpc64 and have been using clang 3.9.1
>>> in recent times. My primary powerpc family use has
>>> been to experiment with building based on the
>>> modern libc++ and reporting issues discovered in the
>>> attempts. This explains the clang/xtoolchain context.
>>>
>>> clang 3.9.1 has major problems for C++ exception
>>> handling for both powerpc64 and powerpc but a
>>> lot of FreeBSD is independent of throwing C++
>>> exceptions. By contrast an xtoolchain-based build works
>>> for C++ exception handling but lib32 fails
>>> to operate when built by a xtoolchain build.
>>
>> -r313254 had no trouble booting or building
>> the patched version or anything else involved
>> in getting there or installing.
>>
>> But the patched version failed quickly just
>> attempting cleanworld's recursive remove. (So
>> it did boot and let me log in.)
>> The panic description was:
>>
>> panic: vn_finished_secondary_write: neg cnt
>>
>>
>> The sources that are different from svn's -r313254
>> are (some tied to arm64 experiments, most everything
>> else tied to powerpc64 and/or powerpc, those not
>> from your patches are long standing from my
>> investigations or from Justin H.):
>>
>> # svnlite status /usr/src | sort
>> . . . (ignoring the ? lines) . . .
>> M /usr/src/bin/sh/jobs.c
>> M /usr/src/bin/sh/miscbltin.c
>> M /usr/src/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.td
>> M /usr/src/contrib/llvm/tools/lld/ELF/Target.cpp
>> M /usr/src/lib/csu/powerpc64/Makefile
>> M /usr/src/libexec/rtld-elf/Makefile
>> M /usr/src/sys/arm/arm/gic.c
>> M /usr/src/sys/boot/ofw/Makefile.inc
>> M /usr/src/sys/boot/powerpc/Makefile.inc
>> M /usr/src/sys/boot/powerpc/kboot/Makefile
>> M /usr/src/sys/boot/uboot/Makefile.inc
>> M /usr/src/sys/conf/kmod.mk
>> M /usr/src/sys/ddb/db_main.c
>> M /usr/src/sys/ddb/db_script.c
>> M /usr/src/sys/kern/init_main.c
>> M /usr/src/sys/kern/kern_condvar.c
>> M /usr/src/sys/kern/kern_lock.c
>> M /usr/src/sys/kern/kern_lockstat.c
>> M /usr/src/sys/kern/kern_mutex.c
>> M /usr/src/sys/kern/kern_rwlock.c
>> M /usr/src/sys/kern/kern_sx.c
>> M /usr/src/sys/kern/kern_synch.c
>> M /usr/src/sys/kern/kern_thread.c
>> M /usr/src/sys/kern/subr_lock.c
>> M /usr/src/sys/kern/vfs_default.c
>> M /usr/src/sys/kern/vfs_subr.c
>> M /usr/src/sys/powerpc/include/atomic.h
>> M /usr/src/sys/powerpc/ofw/ofw_machdep.c
>> M /usr/src/sys/sys/lock.h
>> M /usr/src/sys/sys/lockmgr.h
>> M /usr/src/sys/sys/lockstat.h
>> M /usr/src/sys/sys/mutex.h
>> M /usr/src/sys/sys/rwlock.h
>> M /usr/src/sys/sys/sdt.h
>> M /usr/src/sys/sys/sx.h
>> M /usr/src/sys/sys/systm.h
>
> To recover from the problem and again have a buildworld
> buildkernel present I've booted based on:
>
> A) The -r313254 kernel without your patches (kernel.old).
> B) The -r313254 world (which had your patches in its
> build).
>
> I've reverted the /usr/src/ to not have your patches
> (but it does have my prior ones from prior activity).
>
> I repeated the cleanworld to let it finish after its
> prior failure (that failed during a SSD trim activity).
>
> I've started buildworld buildkernel (with -j 4 as is
> normal for my context).
>
> So far this combination seems to be working fine. This
> suggests that the sys/sys/*.h files that ended up in
> /usr/include/sys/ and the sys/powerpc/include/atomic.h
> that ended up in /usr/include/machine/ were not problems
> as used in the world code --since those uses are still in
> place in the binaries being used. Only the kernel
> binaries seem to be a problem (not necessarily all of
> them).

Unfortunately it eventually got a panic for a
Data Storage Interrupt. I may not be able to do
a self hosted build to get things back to normal.

===
Mark Millard
markmi at dsl-only.net

Received on Sat Feb 25 2017 - 12:49:44 UTC
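
[Illustration only: for readers following the cmpset vs. fcmpset
discussion quoted above, here is a minimal user-space sketch of the
idea of an fcmpset built as a wrapper around cmpset, with injected
failures similar in spirit to the test hack Mateusz describes. It
uses C11 <stdatomic.h>, not the kernel's machine/atomic.h. The names
demo_cmpset, demo_fcmpset, demo_increment and fail_counter are
hypothetical, and this is not the content of ppc.diff.]

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the FreeBSD primitives. */

/* Counts calls so that every other demo_fcmpset call fails on purpose. */
static unsigned fail_counter;

/* cmpset style: nonzero on success; the caller's expected value is untouched. */
int
demo_cmpset(_Atomic uintptr_t *p, uintptr_t expected, uintptr_t desired)
{
	return (atomic_compare_exchange_strong(p, &expected, desired));
}

/*
 * fcmpset style, written as a wrapper around cmpset. On a real failure the
 * current value is re-read into *expected. On an injected failure *expected
 * is left alone, so callers must not assume that failure means the value
 * changed.
 */
int
demo_fcmpset(_Atomic uintptr_t *p, uintptr_t *expected, uintptr_t desired)
{
	if (fail_counter++ % 2 == 0)
		return (0);		/* injected spurious failure */
	if (demo_cmpset(p, *expected, desired))
		return (1);
	*expected = atomic_load(p);	/* report what is there now */
	return (0);
}

/* A caller that tolerates both kinds of failure: recompute and retry. */
uintptr_t
demo_increment(_Atomic uintptr_t *p)
{
	uintptr_t old = atomic_load(p);

	while (!demo_fcmpset(p, &old, old + 1))
		continue;
	return (old + 1);
}
```

[The point of the retry loop is that the new value is recomputed from
whatever "old" holds after each failed attempt, whether or not it was
updated. Calling demo_increment() repeatedly from a single thread
still advances the counter by one per call despite the injected
failures.]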