various VM panics (arm64) / hangs after init (arm64/x86)

From: Bjoern A. Zeeb <bzeeb-lists_at_lists.zabbadoz.net>
Date: Fri, 29 Nov 2019 01:07:32 +0000
Hi,

Over the last few days I have had various strange “hangs” and panics
on both x86 and arm64.  I’ll try to find svn revisions which do
compile and see if I can bisect a bit (the first two attempts did not
compile; I should try the CI system).  Given these are all new setups
from the last week, I have no previous behaviour to compare against
(or only that of 12.x).
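For reference, a bisect that has to step over revisions which do not
compile could be sketched roughly as below.  This is only an
illustration: probe() is a hypothetical callback (build, boot, report),
r355000 is a made-up "known good" revision, and the sketch assumes a
single good-to-bad transition, which may not hold if several
regressions overlap.

```python
from typing import Callable, Optional, Tuple


def bisect_revisions(good: int, bad: int,
                     probe: Callable[[int], Optional[bool]]) -> Tuple[int, int]:
    """Narrow the window (good, bad) as far as the probes allow.

    probe(rev) is a hypothetical callback returning True if the revision
    boots cleanly, False if it hangs or panics, and None if it cannot be
    tested at all (for example because it does not compile).
    Returns (last known good, first known bad).
    """
    skipped = set()
    while True:
        candidates = [r for r in range(good + 1, bad) if r not in skipped]
        if not candidates:
            # Everything left in between is untestable; stop here.
            return good, bad
        mid = (good + bad) // 2
        # Pick the testable revision closest to the midpoint.
        rev = min(candidates, key=lambda r: abs(r - mid))
        result = probe(rev)
        if result is None:
            skipped.add(rev)
        elif result:
            good = rev
        else:
            bad = rev


def fake_probe(rev: int) -> Optional[bool]:
    """Stand-in for a real build-and-boot test (purely illustrative)."""
    if rev % 10 == 7:
        return None          # pretend these revisions do not compile
    return rev < 355100      # pretend the regression arrived in r355100


print(bisect_revisions(355000, 355193, fake_probe))  # (355099, 355100)
```

If the returned window is wider than one revision, every revision in
between was untestable and would need a manual look.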


(a) On x86 and arm64 I have “hangs” after starting init, around the
re-mount.

Initially I thought it was related to USB, as not compiling that in
made the problem go away on x86 (where the console was frozen in a
1 VCPU VM, so I couldn’t debug).
On arm64 I can now press ^T and see that it actually is waiting hard
(removing USB from the kernel did move the problem slightly but did
not make it go away).  One of the later “hangs” on arm64 was:

(notice real time)
load: 0.45  cmd: sysctl 27 [objtrm] 1711.61r 0.00u 0.00s 0% 0k
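To make the fields of that ^T status line explicit, here is a small
parsing sketch.  The field order (load average, command, pid, wait
channel in brackets, real/user/system time, %CPU, resident size in k)
is my reading of the line above; the regular expression and the names
SigInfo and parse_status are illustrative assumptions only.

```python
import re
from typing import NamedTuple, Optional


class SigInfo(NamedTuple):
    load: float
    cmd: str
    pid: int
    state: str      # wait channel or run state shown in brackets
    real_s: float
    user_s: float
    sys_s: float
    cpu_pct: int
    rss_kb: int


_STATUS_RE = re.compile(
    r"load:\s+(?P<load>[\d.]+)\s+cmd:\s+(?P<cmd>\S+)\s+(?P<pid>\d+)\s+"
    r"\[(?P<state>[^\]]*)\]\s+(?P<real>[\d.]+)r\s+(?P<user>[\d.]+)u\s+"
    r"(?P<sys>[\d.]+)s\s+(?P<cpu>\d+)%\s+(?P<rss>\d+)k"
)


def parse_status(line: str) -> Optional[SigInfo]:
    """Split one ^T status line into its fields, or return None."""
    m = _STATUS_RE.search(line)
    if m is None:
        return None
    return SigInfo(float(m["load"]), m["cmd"], int(m["pid"]), m["state"],
                   float(m["real"]), float(m["user"]), float(m["sys"]),
                   int(m["cpu"]), int(m["rss"]))


info = parse_status(
    "load: 0.45  cmd: sysctl 27 [objtrm] 1711.61r 0.00u 0.00s 0% 0k")
print(f"{info.cmd} (pid {info.pid}) in state {info.state!r}: "
      f"{info.real_s / 60:.1f} min real, "
      f"{info.user_s + info.sys_s:.2f}s CPU")
```

Read that way, sysctl (pid 27) has been sitting in "objtrm" for about
28.5 minutes of real time without using any CPU.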



(b) The quickest one to hit that I have found so far was on arm64/head
from an hour ago.  It happens with and without an md_image or other
modules loaded, on GENERIC rather than GENERIC-NODEBUG (reproduced in
two consecutive boots);  it sometimes seems to matter whether I boot
with hw.ncpu=4 or hw.ncpu=1:

---<<BOOT>>---
KDB: debugger backends: ddb
KDB: current backend: ddb
                    Type     Physical      Virtual   #Pages Attr
      ConventionalMemory 000000200000       200000 00007eea WB
     RuntimeServicesData 0000080ea000      80ea000 0000002c WB RUNTIME
      ConventionalMemory 000008116000      8116000 000e46e9 WB
              LoaderData 0000ec7ff000     ec7ff000 00008696 WB
              LoaderCode 0000f4e95000     f4e95000 0000008d WB
     RuntimeServicesData 0000f4f22000     f4f22000 00000001 WB RUNTIME
                Reserved 0000f4f23000     f4f15000 00000006 WB
     RuntimeServicesData 0000f4f29000     f4f29000 00000001 WB RUNTIME
                Reserved 0000f4f2a000     f4f2a000 00000002 WB
              LoaderData 0000f4f2c000     f4f2c000 00003014 WB
     RuntimeServicesCode 0000f7f40000     f7f40000 00000010 WB RUNTIME
              LoaderData 0000f7f50000     f4f2c000 000000b0 WB
Physical memory chunk(s):
   0x00200000 - 0xf4f22fff,  3917 MB (1002787 pages)
   0xf4f29000 - 0xf4f29fff,     0 MB (      1 pages)
   0xf4f2c000 - 0xf7f3ffff,    48 MB (  12308 pages)
   0xf7f50000 - 0xf7ffffff,     0 MB (    176 pages)
Excluded memory regions:
   0x080ea000 - 0x08115fff,     0 MB (     44 pages) NoAlloc
   0xec800000 - 0xf0f38fff,    71 MB (  18233 pages) NoAlloc
   0xf4f22000 - 0xf4f2bfff,     0 MB (     10 pages) NoAlloc
   0xf7f40000 - 0xf7f4ffff,     0 MB (     16 pages) NoAlloc
Found 6 CPUs in the device tree
Copyright (c) 1992-2019 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
         The Regents of the University of California. All rights 
reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.0-arm64-base13-nanopct4 #1 r355193M 
11d01e051405-c509697(master): Fri Nov 29 00:31:57 UTC 2019
..
FreeBSD clang version 9.0.0 (tags/RELEASE_900/final 372316) (based on 
LLVM 9.0.0)
WARNING: WITNESS option enabled, expect reduced performance.
VT: init without driver.
Preloaded elf kernel "/boot/kernel/kernel" at 0xffff00000450c000.
Preloaded md_image "/base13-nanopct4-r355193" at 0xffff000004514e80.
Preloaded elf module "/boot/kernel/nullfs.ko" at 0xffff000004514ee0.
Preloaded elf module "/boot/kernel/geom_eli.ko" at 0xffff000004515638.
module firmware already present!
panic: page 0xfffffd00ec54c170 has object
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
          pc = 0xffff0000006dee8c  lr = 0xffff0000001066c8
          sp = 0xffff0000000103f0  fp = 0xffff000000010600

db_trace_self_wrapper() at vpanic+0x18c
          pc = 0xffff0000001066c8  lr = 0xffff0000003b563c
          sp = 0xffff000000010610  fp = 0xffff0000000106c0

vpanic() at panic+0x44
          pc = 0xffff0000003b563c  lr = 0xffff0000003b53ec
          sp = 0xffff0000000106d0  fp = 0xffff000000010750

panic() at vm_page_alloc_check+0x78
          pc = 0xffff0000003b53ec  lr = 0xffff000000692b2c
          sp = 0xffff000000010760  fp = 0xffff000000010760

vm_page_alloc_check() at vm_page_alloc_domain_after+0x2a4
          pc = 0xffff000000692b2c  lr = 0xffff000000692298
          sp = 0xffff000000010770  fp = 0xffff0000000107f0

vm_page_alloc_domain_after() at kmem_back_domain+0xec
          pc = 0xffff000000692298  lr = 0xffff00000067de6c
          sp = 0xffff000000010800  fp = 0xffff000000010850

kmem_back_domain() at kmem_malloc_domainset+0xc0
          pc = 0xffff00000067de6c  lr = 0xffff00000067dd48
          sp = 0xffff000000010860  fp = 0xffff0000000108d0

kmem_malloc_domainset() at vm_ksubmap_init+0x54
          pc = 0xffff00000067dd48  lr = 0xffff00000067d1d8
          sp = 0xffff0000000108e0  fp = 0xffff000000010920

vm_ksubmap_init() at cpu_startup+0x1c
          pc = 0xffff00000067d1d8  lr = 0xffff0000006ea77c
          sp = 0xffff000000010930  fp = 0xffff000000010930

cpu_startup() at mi_startup+0x12c
          pc = 0xffff0000006ea77c  lr = 0xffff00000034e2d4
          sp = 0xffff000000010940  fp = 0xffff0000000109a0

mi_startup() at virtdone+0x58
          pc = 0xffff00000034e2d4  lr = 0xffff0000000010c8
          sp = 0xffff0000000109b0  fp = 0x0000000000000000

KDB: enter: panic
[ thread pid 0 tid 0 ]
Stopped at      0




FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.0-arm64-base13-nanopct4 #1 r355193M 
11d01e051405-c509697(master): Fri Nov 29 00:31:57 UTC 2019
..
FreeBSD clang version 9.0.0 (tags/RELEASE_900/final 372316) (based on 
LLVM 9.0.0)
WARNING: WITNESS option enabled, expect reduced performance.
VT: init without driver.
Preloaded elf kernel "/boot/kernel/kernel" at 0xffff0000014b2000.
module firmware already present!
panic: page 0xfffffd00ec551f68 has object
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
          pc = 0xffff0000006dee8c  lr = 0xffff0000001066c8
          sp = 0xffff0000000103f0  fp = 0xffff000000010600

db_trace_self_wrapper() at vpanic+0x18c
          pc = 0xffff0000001066c8  lr = 0xffff0000003b563c
          sp = 0xffff000000010610  fp = 0xffff0000000106c0

vpanic() at panic+0x44
          pc = 0xffff0000003b563c  lr = 0xffff0000003b53ec
          sp = 0xffff0000000106d0  fp = 0xffff000000010750

panic() at vm_page_alloc_check+0x78
          pc = 0xffff0000003b53ec  lr = 0xffff000000692b2c
          sp = 0xffff000000010760  fp = 0xffff000000010760

vm_page_alloc_check() at vm_page_alloc_domain_after+0x2a4
          pc = 0xffff000000692b2c  lr = 0xffff000000692298
          sp = 0xffff000000010770  fp = 0xffff0000000107f0

vm_page_alloc_domain_after() at kmem_back_domain+0xec
          pc = 0xffff000000692298  lr = 0xffff00000067de6c
          sp = 0xffff000000010800  fp = 0xffff000000010850

kmem_back_domain() at kmem_malloc_domainset+0xc0
          pc = 0xffff00000067de6c  lr = 0xffff00000067dd48
          sp = 0xffff000000010860  fp = 0xffff0000000108d0

kmem_malloc_domainset() at vm_ksubmap_init+0x54
          pc = 0xffff00000067dd48  lr = 0xffff00000067d1d8
          sp = 0xffff0000000108e0  fp = 0xffff000000010920

vm_ksubmap_init() at cpu_startup+0x1c
          pc = 0xffff00000067d1d8  lr = 0xffff0000006ea77c
          sp = 0xffff000000010930  fp = 0xffff000000010930

cpu_startup() at mi_startup+0x12c
          pc = 0xffff0000006ea77c  lr = 0xffff00000034e2d4
          sp = 0xffff000000010940  fp = 0xffff0000000109a0

mi_startup() at virtdone+0x58
          pc = 0xffff00000034e2d4  lr = 0xffff0000000010c8
          sp = 0xffff0000000109b0  fp = 0x0000000000000000

KDB: enter: panic
[ thread pid 0 tid 0 ]
Stopped at      0
db>
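
As a sanity check on the memory map printed in the first of these
boots, the page counts in the "Physical memory chunk(s)" and "Excluded
memory regions" tables can be recomputed from the address ranges.  This
is only a sketch: it assumes 4 KiB pages, and the ranges are copied by
hand from the log above.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages

# (start, inclusive end, page count as printed in the boot log)
PHYS_CHUNKS = [
    (0x00200000, 0xf4f22fff, 1002787),
    (0xf4f29000, 0xf4f29fff, 1),
    (0xf4f2c000, 0xf7f3ffff, 12308),
    (0xf7f50000, 0xf7ffffff, 176),
]
EXCLUDED = [
    (0x080ea000, 0x08115fff, 44),
    (0xec800000, 0xf0f38fff, 18233),
    (0xf4f22000, 0xf4f2bfff, 10),
    (0xf7f40000, 0xf7f4ffff, 16),
]


def page_count(start: int, end: int) -> int:
    """Number of whole pages in the inclusive range [start, end]."""
    return (end - start + 1) // PAGE_SIZE


for start, end, reported in PHYS_CHUNKS + EXCLUDED:
    computed = page_count(start, end)
    status = "ok" if computed == reported else "MISMATCH"
    print(f"0x{start:08x} - 0x{end:08x}: {computed:8d} pages ({status})")
```

All eight ranges agree with the printed counts under that assumption,
so the tables themselves look internally consistent.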


(c) An earlier one, on a GENERIC-NODEBUG kernel with hw.ncpu=4 and an
md_image loaded, was along these lines:

..
Release APs...gic0: Start searching for Re-Distributor
gic0: Start searching for Re-Distributor
gic0: Start searching for Re-Distributor
gic0: CPU1 Re-Distributor has been found
gic0: CPU3 Re-Distributor has been found
gic0: CPU2 Re-Distributor has been found
gic0: CPU1 Re-Distributor woke up
gic0: CPU2 Re-Distributor woke up
gic0: CPU1 enabled CPU interface via system registers
gic0: CPU2 enabled CPU interface via system registers
gic0: CPU3 Re-Distributor woke up
gic0: CPU3 enabled CPU interface via system registers
done
Fatal data abort:
CPU  0: ARM Cortex-A53 r0p4 affinity:  0
   x0: fffffd00015f2fb8
  Instruction Set Attributes 0 = <CRC32,SHA2,SHA1,AES+PMULL>
   x1: ffff00005abed508
  Instruction Set Attributes 1 = <>
   x2:                0
   x3: ffff00005abed50c
          Processor Features 0 = <GIC,AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 
32>
   x4:                2
          Processor Features 1 = <>
   x5: ffff000000c7a2a0
   x6: ffff000000c7a2a8
       Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit 
ASID,1TB PA>
   x7: ffff000000c7a2b0
       Memory Model Features 1 = <8bit VMID>
   Memory Model Features 2 = <32bit CCIDX,48bit VA>
   x9:               3d
              Debug Features 0 = <2 CTX BKPTs,4 Watchpoints,6 
Breakpoints,PMUv3,Debugv8>
  x10: fffffd00015f2e00
              Debug Features 1 = <>
  x11:                2
          Auxiliary Features 0 = <>
  x12: fffffd0000802b40
       Auxiliary Features 1 = <>
  x13:              100
CPU  1: ARM Cortex-A53 r0p4 affinity:  1
  x14:                1
CPU  2: ARM Cortex-A53 r0p4 affinity:  2
  x15:                0
CPU  3: ARM Cortex-A53 r0p4 affinity:  3
  x16:                1
  x17: ffff000000c7a2a0
regulator: shutting down vbus_typec
  x18: ffff00005abed4e0
  x19: fffffd0000802a80
  x20:         ffffffff
  x21:               18
  x22:               3d
  x23: fffffd0000819000
  x24: fffffd000214d218
  x25:                a
  x26:               3d
  x27: ffff000000a6f000
  x28:                2
  x29: ffff00005abed580
   sp: ffff00005abed4e
   lr: ffff0000006a42a0
  elr: ffff0000006a4384
spsr:         80000145
  far:                a
  esr:         96000044
panic: vm_fault failed: ffff0000006a4384
cpuid = 3
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
          pc = 0xffff00000070cb6c  lr = 0xffff00000010695c
                                   fp = 0xffff00005abed0e0

db_trace_self_wrapper() at vpanic+0x18c
          pc = 0xffff00000010695c  lr = 0xffff0000003ca864
          sp = 0xffff00005abed0f0  fp = 0xffff00005abed1a0

vpanic() at panic+0x44
          pc = 0xffff0000003ca864  lr = 0xffff0000003ca6d4
          sp = 0xffff00005abed1b0  fp = 0xffff00005abed230
panic() at data_abort+0x1e0
          pc = 0xffff0000003ca6d4  lr = 0xffff000000728c34
          sp = 0xffff00005abed240  fp = 0xffff00005abed2f0

data_abort() at do_el1h_sync+0x144
          pc = 0xffff000000728c34  lr = 0xffff000000727e50
          sp = 0xffff00005abed300  fp = 0xffff00005abed330

do_el1h_sync() at handle_el1h_sync+0x78
          pc = 0xffff000000727e50  lr = 0xffff00000070f078
          sp = 0xffff00005abed340  fp = 0xffff00005abed450

handle_el1h_sync() at zone_import+0x264
          pc = 0xffff00000070f078  lr = 0xffff0000006a429c
          sp = 0xffff00005abed460  fp = 0xffff00005abed580

zone_import() at cache_alloc+0x480
          pc = 0xffff0000006a429c  lr = 0xffff00000069f748
          sp = 0xffff00005abed590  fp = 0xffff00005abed5e0

cache_alloc() at uma_zalloc_arg+0x60
          pc = 0xffff00000069f748  lr = 0xffff00000069efb8
          sp = 0xffff00005abed5f0  fp = 0xffff00005abed620

uma_zalloc_arg() at malloc+0x70
          pc = 0xffff00000069efb8  lr = 0xffff0000003a4320
          sp = 0xffff00005abed630  fp = 0xffff00005abed660

malloc() at crdup+0x20
          pc = 0xffff0000003a4320  lr = 0xffff0000003ba0e0
          sp = 0xffff00005abed670  fp = 0xffff00005abed680

crdup() at vfs_mount_alloc+0xe4
          pc = 0xffff0000003ba0e0  lr = 0xffff00000049002c
          sp = 0xffff00005abed690  fp = 0xffff00005abed6c0

vfs_mount_alloc() at vfs_mountroot+0x220
          pc = 0xffff00000049002c  lr = 0xffff000000494204
          sp = 0xffff00005abed6d0  fp = 0xffff00005abed880

vfs_mountroot() at start_init+0x"4
          pc = 0xffff000000494204  lr = 0xffff0000003604b0
          sp = 0xffff00005abed890  fp = 0xffff00005abed940

start_init() at fork_exit+0x90
          pc = 0xffff0000003604b0  lr = 0xffff000000389598
          sp = 0xffff00005abed950  fp = 0xffff00005abed980

fork_exit() at fork_trampoline+0x10
          pc = 0xffff000000389598  lr = 0xffff000000727b8c
          sp = 0xffff00005abed990  fp = 0x0000000000000000

KDB: enter: panic
[ thread pid 1 tid 100002 ]
Stopped at      zone_import+0x34c:      undefined       f900056c
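
The esr, far and faulting instruction word in that dump can be decoded
by hand; a rough sketch follows.  The field layout (EC in bits 31:26,
IL in bit 25, WnR in bit 6 and DFSC in bits 5:0 of the ISS for data
aborts) is the standard ESR_ELx layout as I understand it.  Treating
the "f900056c" printed after "undefined" as the raw instruction word,
and taking x11 = 2 from the interleaved register dump, are assumptions.

```python
from typing import Optional


def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value into the fields relevant to a data abort."""
    ec = (esr >> 26) & 0x3f          # exception class
    info = {"ec": ec, "il": (esr >> 25) & 1, "iss": esr & 0x1ffffff}
    if ec in (0x24, 0x25):           # data abort, lower EL / same EL
        info["wnr"] = (esr >> 6) & 1  # 1 = write, 0 = read
        info["dfsc"] = esr & 0x3f     # data fault status code
    return info


def decode_str_x_imm(insn: int) -> Optional[dict]:
    """Decode 'STR Xt, [Xn, #imm]' (64-bit, unsigned offset), else None."""
    if (insn >> 22) != 0x3e4:        # fixed bits: 1111 1001 00
        return None
    return {
        "rt": insn & 0x1f,
        "rn": (insn >> 5) & 0x1f,
        "offset": ((insn >> 10) & 0xfff) * 8,   # imm12 scaled by 8 bytes
    }


esr = decode_esr(0x96000044)
store = decode_str_x_imm(0xf900056c)
x11 = 0x2                            # from the register dump (assumed intact)
print(f"EC=0x{esr['ec']:02x} WnR={esr['wnr']} DFSC=0x{esr['dfsc']:02x}")
print(f"str x{store['rt']}, [x{store['rn']}, #{store['offset']}] "
      f"-> address 0x{x11 + store['offset']:x}")
```

Under those assumptions this is a write (WnR=1) that took a level 0
translation fault (DFSC 0x04) from the kernel itself (EC 0x25), and the
store "str x12, [x11, #8]" with x11 = 2 targets address 0xa, which
matches far.  That would point at zone_import following a bogus
near-NULL pointer rather than at a random fault.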



I do have more of these kinds of things, and I can probably produce
more by rebuilding the kernel and playing with a few parameters.

Any hints or suggestions welcome.

/bz
Received on Fri Nov 29 2019 - 00:07:48 UTC
