Re: 12-Current panics on boot (didn't a week ago.)

From: Andrew Reilly <areilly_at_bigpond.net.au>
Date: Sat, 31 Mar 2018 11:27:46 +1100

Hi Jonathan, all,

I've just compiled and booted a kernel derived from current-GENERIC
but with nooptions TCP_BLACKBOX, and much to my surprise it boots.
One possible link to network-related activity: the next line of boot
output, the one that was not displayed before the crash, is:

[ath_hal] loaded

That's vaguely network-shaped: could it be an issue?
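
As a rough illustration of how that "next line" can be picked out of a
good boot log, here is a minimal sketch. The helper name line_after and
the inline sample are hypothetical; the last line visible before the
panic is assumed (not confirmed) to be the "random: entropy device"
line that precedes [ath_hal] in the dmesg further down.

```python
from typing import Optional


def line_after(log_text: str, last_seen: str) -> Optional[str]:
    """Return the first non-empty line that follows `last_seen`, or None."""
    lines = [ln.rstrip() for ln in log_text.splitlines()]
    for i, ln in enumerate(lines):
        if ln.strip() == last_seen.strip():
            # Skip blank lines and report the next real line of output.
            for nxt in lines[i + 1:]:
                if nxt.strip():
                    return nxt
            return None
    return None


# Excerpt copied from the working boot's dmesg in this message.
sample = """\
Timecounter "TSC-low" frequency 1497224985 Hz quality 1000
random: entropy device external interface
[ath_hal] loaded
module_register_init: MOD_LOAD (vesa, 0xffffffff8109f600, 0) error 19
"""

# Assumption: this was the last line shown on the panicking boot.
print(line_after(sample, "random: entropy device external interface"))
```

In practice the text would come from the saved boot log of the working
kernel (usually /var/run/dmesg.boot) rather than an inline string.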

Please let me know if there's anything else that I could test or
poke, in order to find the real culprit.
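
For reference, the revision search described in the quoted messages
below amounts to a binary search between a known-good and a known-bad
revision. A minimal sketch follows, assuming the breakage is monotonic
(every revision after the first bad one also fails). The function name
first_bad_revision and the boots() stand-in are hypothetical; on real
hardware "boots" means building, installing and booting that revision
by hand.

```python
from typing import Callable


def first_bad_revision(good: int, bad: int, boots: Callable[[int], bool]) -> int:
    """Return the earliest revision in (good, bad] that fails to boot.

    Assumes `good` boots, `bad` does not, and that every revision after
    the first failing one fails as well.
    """
    if good >= bad:
        raise ValueError("good must be lower than bad")
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots(mid):
            good = mid   # breakage was introduced after mid
        else:
            bad = mid    # breakage is at mid or earlier
    return bad


# Stand-in for the manual build-and-boot test, using the result reported
# in this thread (r331346 works, r331347 panics).
def boots(rev: int) -> bool:
    return rev < 331347


# r331064 booted and r331464 crashed, per the quoted messages.
print(first_bad_revision(331064, 331464, boots))  # prints 331347
```

With 400 revisions between the two endpoints this takes at most nine
build-and-boot cycles.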

My make.conf says:

KERNCONF=ZEN
WRKDIRPREFIX=/usr/obj/ports
MALLOC_PRODUCTION=yes

My /usr/src/sys/amd64/conf/ZEN says:

include GENERIC
nooptions TCP_BLACKBOX

uname -a says:
FreeBSD Zen.ac-r.nu 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r331768M: Sat Mar 31 10:47:52 AEDT 2018     root_at_Zen:/usr/obj/usr/src/amd64.amd64/sys/ZEN  amd64

Cheers,

Andrew


Here's the top part of the new dmesg.boot, FYI:
Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 12.0-CURRENT #0 r331768M: Sat Mar 31 10:47:52 AEDT 2018
    root_at_Zen:/usr/obj/usr/src/amd64.amd64/sys/ZEN amd64
FreeBSD clang version 6.0.0 (tags/RELEASE_600/final 326565) (based on LLVM 6.0.0)
WARNING: WITNESS option enabled, expect reduced performance.
VT(vga): resolution 640x480
CPU: AMD Ryzen 7 1700 Eight-Core Processor           (2994.45-MHz K8-class CPU)
  Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1  Stepping=1
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x7ed8320b<SSE3,PCLMULQDQ,MON,SSSE3,FMA,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
  AMD Features2=0x35c233ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX>
  Structured Extended Features=0x209c01a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,RDSEED,ADX,SMAP,CLFLUSHOPT,SHA>
  XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
  AMD Extended Feature Extensions ID EBX=0x7<CLZERO,IRPerf,XSaveErPtr>
  SVM: (disabled in BIOS) NP,NRIP,VClean,AFlush,DAssist,NAsids=32768
  TSC: P-state invariant, performance statistics
real memory  = 34359738368 (32768 MB)
avail memory = 33271214080 (31729 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <ALASKA A M I >
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 2 cache groups x 4 core(s)
random: unblocking device.
Firmware Warning (ACPI): Optional FADT field Pm2ControlBlock has valid Length but zero Address: 0x0000000000000000/0x1 (20180313/tbfadt-796)
ioapic0 <Version 2.1> irqs 0-23 on motherboard
ioapic1 <Version 2.1> irqs 24-55 on motherboard
SMP: AP CPU #7 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #1 Launched!
Timecounter "TSC-low" frequency 1497224985 Hz quality 1000
random: entropy device external interface
[ath_hal] loaded
module_register_init: MOD_LOAD (vesa, 0xffffffff8109f600, 0) error 19
random: registering fast source Intel Secure Key RNG
random: fast provider: "Intel Secure Key RNG"
kbd1 at kbdmux0
netmap: loaded module
nexus0
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
aesni0: <AES-CBC,AES-XTS,AES-GCM,AES-ICM,SHA1,SHA256> on motherboard
acpi0: <ALASKA A M I > on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
cpu4: <ACPI CPU> on acpi0
cpu5: <ACPI CPU> on acpi0
cpu6: <ACPI CPU> on acpi0
cpu7: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.000000s
Event timer "RTC" frequency 32768 Hz quality 0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff irq 0,8 on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 350
Event timer "HPET1" frequency 14318180 Hz quality 350
Event timer "HPET2" frequency 14318180 Hz quality 350
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
amdsmn0: <AMD Family 17h System Management Network> on hostb0
amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb0


On Sun, Mar 25, 2018 at 04:35:31AM +0000, Jonathan Looney wrote:
> For now, you can update through r331485 and then take TCP_BLACKBOX out of
> your kernel config file. That won’t really “fix” anything, but should at
> least get you a booting system (assuming the new code from r331347 is
> really triggering a problem).
> 
> 
> I’ll take another look to see if I missed something in the commit. But, at
> the moment, I’m hard-pressed to see how r331347 would cause the problem you
> describe.
> 
> 
> Jonathan
> 
> On Sat, Mar 24, 2018 at 9:17 PM Andrew Reilly <areilly_at_bigpond.net.au>
> wrote:
> 
> > OK, I've completed the search: r331346 works, r331347 panics
> > somewhere in the initialization of random.
> >
> > In the 331347 change (Add the "TCP Blackbox Recorder") I can't see
> > anything obvious to tweak, unfortunately.  It's a fair chunk of new
> > code but it's all network-stack related, and my kernel is panicking
> > long before any network activity happens.
> >
> > Any suggestions?
> >
> > Cheers,
> >
> > Andrew
> >
> > On Sat, Mar 24, 2018 at 05:23:18PM -0600, Warner Losh wrote:
> > > Thanks Andrew... I can't recreate this on my VM nor my real hardware.
> > >
> > > Warner
> > >
> > > On Sat, Mar 24, 2018 at 5:22 PM, Andrew Reilly <areilly_at_bigpond.net.au>
> > > wrote:
> > >
> > > > So, r331464 crashes in the same place, on my system.  r331064 still
> > > > boots OK.  I'll keep searching.
> > > >
> > > > One week ago there was a change to randomdev to poll for signals every
> > > > so often, as a defence against very large reads.  That wouldn't have
> > > > introduced a race somewhere, or left things in an unexpected state,
> > > > perhaps?  That change (r331070) by cem_at_ is just a few revisions
> > > > after the one that is working for me.  I'll start looking there...
> > > >
> > > > Cheers,
> > > >
> > > > Andrew
> > > >
> > > > On Sun, Mar 25, 2018 at 07:49:17AM +1100, Andrew Reilly wrote:
> > > > > Hi Warner,
> > > > >
> > > > > > The breakage was in 331470, and at least one version earlier, that I
> > > > > > updated past when it panicked.
> > > > > >
> > > > > > I'm guessing that kdb's inability to dump would be down to it not
> > > > > > having found any disk devices yet, right?  So yes, bisecting to narrow
> > > > > > down the issue is probably the best bet.  I'll try your r331464: if
> > > > > > that works that leaves only four or five revisions.  Of course the
> > > > > > breakage could be hardware specific.
> > > > >
> > > > > Cheers,
> > > > > --
> > > > > Andrew
> > > >
> >
Received on Fri Mar 30 2018 - 22:28:11 UTC
