TARGET_ARCH=powerpc head -r317820 production-style kernel: periodic panics always in pid=11 (the Idle threads)

From: Mark Millard <markmi_at_dsl-only.net>
Date: Tue, 9 May 2017 14:00:53 -0700
kgdb is not working for powerpc, neither the base
system one nor the one from ports. I've used "strings"
to extract the failure information shown below.
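
With kgdb unavailable, a strings(1)-style scan is the fallback.
Below is a minimal Python sketch of that kind of extraction. It
assumes the panic text sits in the vmcore as plain ASCII lines.
The function names and the script name are hypothetical, and the
scan only approximates what strings does.

```python
import mmap
import re
import sys


def printable_strings(data, min_len=4):
    """Yield runs of at least min_len printable ASCII bytes (tab included),
    roughly like strings(1); newlines and other control bytes end a run."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


def trap_reports(data, marker="fatal kernel trap:", context=12):
    """For each line containing marker, return that line plus the
    following `context` lines (hypothetical helper, not a FreeBSD tool)."""
    lines = list(printable_strings(data))
    return [lines[i:i + 1 + context]
            for i, line in enumerate(lines) if marker in line]


def main(path):
    # Map the dump read-only instead of reading it all into memory.
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as core:
        for report in trap_reports(core):
            print("\n".join(report))
            print("---")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical): python3 trapscan.py /var/crash/vmcore.0
    main(sys.argv[1])
```

Each report is printed as the marker line plus a fixed number of
following lines. That is crude, but the trap header lines are
short and contiguous.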

The time to failure varies widely, from minutes
to hours.

I've never seen the failures below with a debug kernel,
only with production-style ones. I have not seen any
such problems for powerpc64, aarch64 (with
-mcpu=cortex-a53), armv6 (with -mcpu=cortex-a7), or
amd64: just powerpc. For example, the powerpc and
powerpc64 hardware is the same old PowerMac G5
so-called "Quad Core", used with two different boot
SSDs.

Note: This reproduces for me with pure gcc 4.2.1
      based builds. My usual clang-targeting-powerpc
      experiments are not involved here.

I had not updated for a long time before this, because
the status of the clang compiler had not changed and
its powerpc stack code-generation problems are
difficult to work around.

My kernels are unusual in having both sc and vt in
the build and ps3 disabled. I happen to be using sc
because it works with the 2560x1440 display that is
currently connected; with vt, booting fails at that
display size.

Of 7 example vmcore.* files (note that all are
failures in pid 11 Idle-process threads):

3 contain:

fatal kernel trap:
   exception       = 0x903a64e (unknown)
   srr0            = 0x7ff760
   srr1            = 0xc1007c
   lr              = 0x907f
   curthread       = 0x147d6c0
          pid = 11, comm = idle: cpu0
[ thread pid 11 tid 100003 ]
Stopped at      ffs_truncate+0x1080:    stw     r11, 0xf8(r31)

1 contains (cpu1 instead of cpu0, so different tid):

fatal kernel trap:
   exception       = 0x903a64e (unknown)
   srr0            = 0x7ff760
   srr1            = 0xc1007c
   lr              = 0x907f
   curthread       = 0x147d360
          pid = 11, comm = idle: cpu1
[ thread pid 11 tid 100004 ]
Stopped at      ffs_truncate+0x1080:    stw     r11, 0xf8(r31)

1 contains:

fatal kernel trap:
   exception       = 0x21000000 (unknown)
   srr0            = 0x7c0903
   srr1            = 0xa64e8004
   lr              = 0x807fc9e7
   curthread       = 0x147d000
          pid = 11, comm = idle: cpu2
[ thread pid 11 tid 100005 ]
Stopped at      audit_commit+0x24f:     illegal instruction 4915f00

1 contains:

fatal kernel trap:
   exception       = 0x300 (data storage interrupt)
   virtual address = 0x7ff76000
   dsisr           = 0x40000000
   srr0            = 0x8e3cf8
   srr1            = 0x1032
   lr              = 0x8e3ce8
   curthread       = 0x147d6c0
          pid = 11, comm = idle: cpu0
panic: data storage interrupt trap
cpuid = 0
time = 1494057319
KDB: stack backtrace:
     0xdf5e52c0: at kdb_backtrace+0x5c
0xdf5e5330: at vpanic+0x1ec
0xdf5e53a0: at panic+0x54
0xdf5e53f0: at trap_fatal+0x1cc
0xdf5e5420: at trap+0x122c
0xdf5e55c0: at powerpc_interrupt+0x180
0xdf5e55f0: kernel DSI read trap @ 0x7ff76000 by db_disasm+0x30: srr1=0x1032
            r1=0xdf5e56b0 cr=0x24009022 xer=0 ctr=0x1852cc sr=0x40000000
0xdf5e56b0: at 0x1007460
0xdf5e56d0: at db_print_loc_and_inst+0x60
0xdf5e5700: at db_trap+0x104
0xdf5e5790: at kdb_trap+0x1bc
0xdf5e5810: at trap_fatal+0x1b0
0xdf5e5840: at trap+0x1184
0xdf5e5870: kernel DECR trap by cpu_idle_60x+0x88: srr1=0x9032
            r1=0xdf5e5930 cr=0x40000042 xer=0x20000000 ctr=0x8e3bd8
saved LR(0xfffffffe) is invalid

And 1 contains:

fatal kernel trap:
   exception       = 0x0 (unknown)
   srr0            = 0x903a64e
   srr1            = 0x80042100
   lr              = 0xc9e7c800
   curthread       = 0x147d360
          pid = 11, comm = idle: cpu1
[ thread pid 11 tid 100004 ]
Stopped at      0x903a64e:
fatal kernel trap:
   exception       = 0x300 (data storage interrupt)
   virtual address = 0x903a64e
   dsisr           = 0x40000000
   srr0            = 0x8e3cf8
   srr1            = 0x1032
   lr              = 0x8e3ce8
   curthread       = 0x147d360
          pid = 11, comm = idle: cpu1
panic: data storage interrupt trap
cpuid = 1
time = 1494132014
KDB: stack backtrace:
      0xdf5ea2c0: at kdb_backtrace+0x5c
0xdf5ea330: at vpanic+0x1ec
0xdf5ea3a0: at panic+0x54
0xdf5ea3f0: at trap_fatal+0x1cc
0xdf5ea420: at trap+0x122c
0xdf5ea5c0: at powerpc_interrupt+0x180
0xdf5ea5f0: kernel DSI read trap @ 0x903a64e by db_disasm+0x30: srr1=0x1032
            r1=0xdf5ea6b0 cr=0x24009022 xer=0 ctr=0x1852cc sr=0x40000000
0xdf5ea6b0: at 0x1007460
0xdf5ea6d0: at db_print_loc_and_inst+0x60
0xdf5ea700: at db_trap+0x104
0xdf5ea790: at kdb_trap+0x1bc
0xdf5ea810: at trap_fatal+0x1b0
0xdf5ea840: at trap+0x122c
0xdf5ea870: kernel EXI trap by cpu_idle_60x+0x88: srr1=0x9032
            r1=0xdf5ea930 cr=0x40000042 xer=0x20000000 ctr=0x8e3bd8
saved LR(0x5) is invalid

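The "time =" values in the two full panic reports are Unix epoch
seconds. Here is a minimal Python sketch that converts them to UTC.
The labels are my own.

```python
from datetime import datetime, timezone

# "time =" values copied from the two panic reports above (epoch seconds).
PANIC_TIMES = {"cpu0 DSI panic": 1494057319, "cpu1 DSI panic": 1494132014}


def to_utc(epoch_seconds):
    """Format Unix epoch seconds as a UTC timestamp string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


if __name__ == "__main__":
    for label, seconds in PANIC_TIMES.items():
        print(f"{label}: {to_utc(seconds)}")
```

Run as a script, it prints 2017-05-06 07:55:19 UTC for the cpu0
panic and 2017-05-07 04:40:14 UTC for the cpu1 panic.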

Most (but not all) of the above happened while the
old PowerMac was sitting unused.

That every failure is in a pid 11 Idle thread suggests
to me some sort of interrupt oddity that goes wrong
when an idle thread is put to use to handle an interrupt.

The /usr/src/sys/powerpc/conf/* files in use
are (-NODBG for production style and -DBG for
debug style):


# more /usr/src/sys/powerpc/conf/GENERIC64vtsc-NODBG
#
# GENERIC -- Custom configuration for the powerpc/powerpc64
#

include "GENERIC64"

ident   GENERIC64vtsc-NODBG

makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols

nooptions       PS3                     # Sony Playstation 3               HACK!!! to allow sc

options         KDB                     # Enable kernel debugger support

options         ALT_BREAK_TO_DEBUGGER
options         BREAK_TO_DEBUGGER

# For minimum debugger support (stable branch) use:
options         KDB_TRACE               # Print a stack trace for a panic
options         DDB                     # Enable the kernel debugger
options         GDB                     # HACK!!! ...

# Extra stuff:
#options        VERBOSE_SYSINIT         # Enable verbose sysinit messages
#options        BOOTVERBOSE=1
#options        BOOTHOWTO=RB_VERBOSE
#options        KTR
#options        KTR_MASK=KTR_TRAP
##options       KTR_CPUMASK=0xF
#options        KTR_VERBOSE

# HACK!!! to allow sc for 2560x1440 display on Radeon X1950 that vt historically mishandled during booting
device          sc
#device                 kbdmux          # HACK: already listed by vt
options         SC_OFWFB        # OFW frame buffer
options         SC_DFLT_FONT    # compile font in
makeoptions     SC_DFLT_FONT=cp437


# Disable any extra checking for. . .
nooptions       DEADLKRES               # Enable the deadlock resolver
nooptions       INVARIANTS              # Enable calls of extra sanity checking
nooptions       INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
nooptions       WITNESS                 # Enable checks to detect deadlocks and cycles
nooptions       WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
nooptions       DIAGNOSTIC
nooptions       MALLOC_DEBUG_MAXZONES   # Separate malloc(9) zones


For powerpc (32-bit), I show both my production (NODBG)
and debug (DBG) configurations:

# more /usr/src/sys/powerpc/conf/GENERICvtsc-NODBG
#
# GENERIC -- Custom configuration for the powerpc/powerpc
#

include "GENERIC"

ident   GENERICvtsc-NODBG

makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols

nooptions       PS3                     # Sony Playstation 3               HACK!!! to allow sc

options         KDB                     # Enable kernel debugger support

options         ALT_BREAK_TO_DEBUGGER
options         BREAK_TO_DEBUGGER

# For minimum debugger support (stable branch) use:
options         KDB_TRACE               # Print a stack trace for a panic
options         DDB                     # Enable the kernel debugger
options         GDB                     # HACK!!! ...

# Extra stuff:
#options        VERBOSE_SYSINIT         # Enable verbose sysinit messages
#options        BOOTVERBOSE=1
#options        BOOTHOWTO=RB_VERBOSE
#options        KTR
#options        KTR_MASK=KTR_TRAP
##options       KTR_CPUMASK=0xF
#options        KTR_VERBOSE

# HACK!!! to allow sc for 2560x1440 display on Radeon X1950 that vt historically mishandled during booting
device          sc
#device                 kbdmux          # HACK: already listed by vt
options         SC_OFWFB        # OFW frame buffer
options         SC_DFLT_FONT    # compile font in
makeoptions     SC_DFLT_FONT=cp437


# Disable any extra checking for. . .
nooptions       DEADLKRES               # Enable the deadlock resolver
nooptions       INVARIANTS              # Enable calls of extra sanity checking
nooptions       INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
nooptions       WITNESS                 # Enable checks to detect deadlocks and cycles
nooptions       WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
nooptions       DIAGNOSTIC
nooptions       MALLOC_DEBUG_MAXZONES   # Separate malloc(9) zones


# more /usr/src/sys/powerpc/conf/GENERICvtsc-DBG
#
# GENERIC -- Custom configuration for the powerpc/powerpc
#

include "GENERIC"

ident   GENERICvtsc-DBG

makeoptions     DEBUG=-g                # Build kernel with gdb(1) debug symbols

nooptions       PS3                     # Sony Playstation 3               HACK!!! to allow sc

options         KDB                     # Enable kernel debugger support

options         ALT_BREAK_TO_DEBUGGER
options         BREAK_TO_DEBUGGER

# For minimum debugger support (stable branch) use:
options         KDB_TRACE               # Print a stack trace for a panic
options         DDB                     # Enable the kernel debugger
options         GDB                     # HACK!!! ...

# Extra stuff:
options         VERBOSE_SYSINIT         # Enable verbose sysinit messages
options         BOOTVERBOSE=1
options         BOOTHOWTO=RB_VERBOSE
#options        KTR
#options        KTR_MASK=KTR_TRAP|KTR_PROC
##options       KTR_CPUMASK=0xF
#options        KTR_VERBOSE

# HACK!!! to allow sc for 2560x1440 display on Radeon X1950 that vt historically mishandled during booting
device          sc
#device                 kbdmux          # HACK: already listed by vt
options         SC_OFWFB        # OFW frame buffer
options         SC_DFLT_FONT    # compile font in
makeoptions     SC_DFLT_FONT=cp437


# Enable any extra checking for. . .
options         DEADLKRES               # Enable the deadlock resolver
options         INVARIANTS              # Enable calls of extra sanity checking
options         INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
options         WITNESS                 # Enable checks to detect deadlocks and cycles
options         WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
options         DIAGNOSTIC
options         MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones



For both -NODBG and -DBG, the lines:

options         ALT_BREAK_TO_DEBUGGER
options         BREAK_TO_DEBUGGER

are recent additions made because of this problem.
They explicitly give me the option of breaking into
the debugger if I decide to.
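
ALT_BREAK_TO_DEBUGGER makes the console treat the character
sequence CR ~ Ctrl-B as a request to enter the debugger.
BREAK_TO_DEBUGGER does the same for a BREAK on a serial console.
Below is a minimal Python model of recognizing CR ~ Ctrl-B. It is
a simplified, hypothetical state machine for illustration, not the
kernel's own code.

```python
CR, TILDE, CTRL_B = "\r", "~", "\x02"


class AltBreakDetector:
    """Simplified model of the CR ~ Ctrl-B alternate break sequence.

    Hypothetical illustration only: state counts how many characters of
    the sequence have matched so far (0, 1 after CR, 2 after CR ~).
    """

    def __init__(self):
        self.state = 0

    def feed(self, ch):
        """Consume one console character; return True exactly when it
        completes CR ~ Ctrl-B."""
        if self.state == 2 and ch == CTRL_B:
            self.state = 0
            return True
        if self.state == 1 and ch == TILDE:
            self.state = 2
        else:
            # A CR always (re)starts the sequence; anything else resets it.
            self.state = 1 if ch == CR else 0
        return False
```

For example, feeding "\r~\x02" one character at a time returns
False, False, True. A "~" that does not follow a CR never starts
the sequence, so ordinary typing is unlikely to trigger it.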

===
Mark Millard
markmi at dsl-only.net
Received on Tue May 09 2017 - 19:07:37 UTC