Re: head -r338804 boots threadripper 1950X fine; head -r338810+ do not; -r338807 seems implicated

From: Warner Losh <imp_at_bsdimp.com>
Date: Sun, 21 Oct 2018 21:28:51 -0600
On Sun, Oct 21, 2018 at 8:57 PM Mark Millard via freebsd-stable <
freebsd-stable_at_freebsd.org> wrote:

> [I built based on WITHOUT_ZFS= for other reasons. But,
> after installing the build, Hyper-V based boots are
> working.]
>
> On 2018-Oct-20, at 2:09 AM, Mark Millard <marklmi at yahoo.com> wrote:
>
> > On 2018-Oct-20, at 1:39 AM, Mark Millard <marklmi at yahoo.com> wrote:
> >
> >> I attempted to jump from head -r334014 to -r339076
> >> on a threadripper 1950X board and the boot fails.
> >> This is both native booting and under Hyper-V,
> >> same machine and root file system in both cases.
> >
> > I did my investigation under Hyper-V after seeing
> > a boot failure native.
> >
> > Looks like the native failure is even earlier,
> > before db> is even possible, possibly during
> > early loader activity.
> >
> > So this report is really for running under
> > Hyper-V: -r338804 boots and -r338810 does
> > not. By contrast -r334804 does not boot native.
> > (But I've little information for that context.)
> >
> > Sorry for the confusion. I rushed the report
> > in hopes of getting to sleep. It was not to be.
> >
> >> It fails just after the FreeBSD/SMP lines,
> >> reporting "kernel trap 9 with interrupts disabled".
> >>
> >> It fails in pmap_force_invalidate_cache_range at
> >> a clflush (%rax) instruction that produces a
> >> "Fatal trap 9: general protection fault while
> >> in kernel mode". cpuid = 0; apic id = 00
> >>
> >> I used kernel.txz files from:
> >>
> >> https://artifact.ci.freebsd.org/snapshot/head/r*/amd64/amd64/
> >>
> >> to narrow the range of kernel builds for working -> failing
> >> and got:
> >>
> >> -r338804 boots fine
> >> (no amd64 kernel builds between to try)
> >> -r338810+ fails (any that I tried, anyway)
> >>
> >> In that range is -r338807 :
> >>
> >> QUOTE
> >> Author: kib
> >> Date: Wed Sep 19 19:35:02 2018
> >> New Revision: 338807
> >> URL:
> >> https://svnweb.freebsd.org/changeset/base/338807
> >>
> >>
> >> Log:
> >> Convert x86 cache invalidation functions to ifuncs.
> >>
> >> This simplifies the runtime logic and reduces the number of
> >> runtime-constant branches.
> >>
> >> Reviewed by: alc, markj
> >> Sponsored by:        The FreeBSD Foundation
> >> Approved by: re (gjb)
> >> Differential revision:
> >> https://reviews.freebsd.org/D16736
> >>
> >> Modified:
> >> head/sys/amd64/amd64/pmap.c
> >> head/sys/amd64/include/pmap.h
> >> head/sys/dev/drm2/drm_os_freebsd.c
> >> head/sys/dev/drm2/i915/intel_ringbuffer.c
> >> head/sys/i386/i386/pmap.c
> >> head/sys/i386/i386/vm_machdep.c
> >> head/sys/i386/include/pmap.h
> >> head/sys/x86/iommu/intel_utils.c
> >> END QUOTE
> >>
> >> There do seem to be changes associated with
> >> clflush(...) use. Looking at:
> >>
> >>
> >> https://svnweb.freebsd.org/base/head/sys/amd64/amd64/pmap.c?annotate=339432
> >>
> >> it appears that pmap_force_invalidate_cache_range has not
> >> changed since -r338807.
> >>
> >> It seems that -r338806 and -r338810 would be unlikely
> >> contributors.
> >
>
> I went after my native-boot loader problem first because I
> could switch kernels via the loader for booting FreeBSD under
> Hyper-V. Switching loaders is more of a problem.
>
> In order to avoid the loader-time crash I switched to building
> and installing based on WITHOUT_ZFS= . I've had no active use of
> ZFS in years. (The old official-build loaders that worked were
> non-ZFS ones.)
>
> This took care of the native-boot loader-crash --and, to my
> surprise, also the Hyper-V-boot kernel-time crash.
>
> My private builds now boot the 1950X in both contexts just
> fine.
>
> During my early investigation I did pick up specific changes
> from after -r339076 that seemed to be tied to Ryzen and such.
> (They made no difference to the boot problems at the time
> but I saw no reason to remove them.)
>
> # uname -apKU
> FreeBSD FBSDFSSD 12.0-ALPHA8 FreeBSD 12.0-ALPHA8 #5 r339076:339432M: Sun Oct 21 16:44:25 PDT 2018     markmi_at_FBSDFSSD:/usr/obj/amd64_clang/amd64.amd64/usr/src/amd64.amd64/sys/GENERIC-NODBG  amd64 amd64 1200084 1200084
>

The phrase "no active use" bothers me. What does that mean? Are there any

Warner
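The -r338807 log quoted above says the x86 cache invalidation functions were converted to ifuncs to reduce "runtime-constant branches". As a rough user-space illustration of that idea only (this is not the FreeBSD kernel code, and it uses a plain function pointer rather than real ifunc relocations), the sketch below has a resolver pick one implementation once, so later calls carry no per-call CPU feature test. All function names are hypothetical, and the CPU features are passed in as booleans (an assumption) instead of being read from CPUID; the variants only record which one ran rather than issuing cache flushes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Records which variant ran, so the one-time dispatch is observable. */
static const char *last_variant = "none";

/*
 * Hypothetical per-CPU variants. Real code would flush each cache line
 * in [start, start + len); these stubs only record their own name.
 */
static void
invalidate_range_clflushopt(void *start, size_t len)
{
	(void)start;
	(void)len;
	last_variant = "clflushopt";
}

static void
invalidate_range_clflush(void *start, size_t len)
{
	(void)start;
	(void)len;
	last_variant = "clflush";
}

static void
invalidate_range_nop(void *start, size_t len)
{
	(void)start;
	(void)len;
	last_variant = "nop";
}

typedef void (*invalidate_fn)(void *start, size_t len);

/*
 * Resolver: consulted once. Feature flags are parameters here (assumption)
 * instead of being read from CPUID.
 */
static invalidate_fn
resolve_invalidate(bool has_clflushopt, bool has_clflush)
{
	if (has_clflushopt)
		return (invalidate_range_clflushopt);
	if (has_clflush)
		return (invalidate_range_clflush);
	return (invalidate_range_nop);
}

/* Every caller goes through this pointer; it is bound exactly once. */
static invalidate_fn invalidate_cache_range;

static void
install_invalidate(bool has_clflushopt, bool has_clflush)
{
	invalidate_cache_range = resolve_invalidate(has_clflushopt,
	    has_clflush);
	assert(invalidate_cache_range != NULL);
}

/* Helper for checking which variant the last call dispatched to. */
static bool
variant_is(const char *name)
{
	return (strcmp(last_variant, name) == 0);
}
```

A caller would run install_invalidate() once at startup and then call invalidate_cache_range(buf, len) wherever a range must be flushed, with no feature checks on that path. A genuine ifunc moves the same one-time choice into symbol binding done by the (kernel) linker instead of by application code.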
Received on Mon Oct 22 2018 - 01:29:03 UTC
