Re: UMA initialization failure with 48 core ARM64

From: Michał Stanek <mst_at_semihalf.com>
Date: Sun, 17 May 2015 13:10:02 +0200
On 2015-05-16 00:42, Stanislav Sedov wrote:
>
>> On May 15, 2015, at 11:30 AM, Michał Stanek<mst_at_semihalf.com>  wrote:
>>
>> Hi,
>>
>> I am experiencing an early failure of UMA on an ARM64 platform with 48
>> cores enabled. I get a kernel panic during initialization of VM. Here is
>> the boot log (lines with 'MST:' are my own debug printfs).
>>
>> Copyright (c) 1992-2015 The FreeBSD Project.
>> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>>     The Regents of the University of California. All rights reserved.
>> FreeBSD is a registered trademark of The FreeBSD Foundation.
>> FreeBSD 11.0-CURRENT #333 52fd91e(smp_48)-dirty: Fri May 15 18:26:56 CEST 2015
>>     mst@arm64-prime:/usr/home/mst/freebsd_v8/obj_kernel/arm64.aarch64/usr/home/mst/freebsd_v8/kernel/sys/THUNDER-88XX arm64
>> FreeBSD clang version 3.6.0 (tags/RELEASE_360/final 230434) 20150225
>> MST: in vm_mem_init()
>> MST: in vmem_init() with param *vm == kernel_arena
>> MST: in vmem_xalloc() with param *vm == kernel_arena
>> MST: in vmem_xalloc() with param *vm == kmem_arena
>> panic: mtx_lock() of spin mutex (null) @ /usr/home/mst/freebsd_v8/kernel/sys/kern/subr_vmem.c:1165
>> cpuid = 0
>> KDB: enter: panic
>> [ thread pid 0 tid 0 ]
>> Stopped at      0xffffff80001f4f80:
>>
>> The kernel boots fine when MAXCPU is set to 30 or lower, but the error
>> above always appears when it is set to a higher value.
>>
>> The panic is triggered by a KASSERT in __mtx_lock_flags() which is called
>> with the macro VMEM_LOCK(vm) in vmem_xalloc(). This is line 1143 in
>> subr_vmem.c (log shows different line number due to added printfs).
>> It looks like the lock belongs to 'kmem_arena' which is uninitialized at
>> this point (kmeminit() has not been called yet).
>>
>> While debugging, I tried modifying the VM code as a quick workaround. I
>> replaced the number of cores with 1 wherever mp_ncpus, mp_maxid or MAXCPU
>> (and others) are read. This, I believe, limits UMA per-cpu caches to just
>> one, while the rest of the OS (scheduler, etc.) sees all 48 cores.
>> In addition, I changed UMA_BOOT_PAGES in sys/vm/uma_int.h to 512 (default
>> was 64).
>> With these tweaks, I got a successful (but not really stable) boot with 48
>> cores. Of course these are dirty hacks and a proper solution is needed.
>>
>> I am a bit surprised that the kernel fails with MAXCPU==48 as the amd64
>> arch has this value set to '256' and I have read posts that other platforms
>> with even more cores have worked fine. Perhaps I need to tweak some other
>> VM parameters, apart from UMA_BOOT_PAGES (AKA vm.boot_pages), but I am not
>> sure how.
>>
>> I included a full stacktrace and a more verbose log (with UMA_DEBUG macros
>> enabled) in the attachment. There is also a diff of the hacks I used while
>> debugging.
>>
>>
>
> Hi, Michal!
>
> It looks like the log attachment didn’t make it through the mailing list.
> Can you please resend it?
>
> The panic suggests that a mutex was left uninitialized...
>
> --
> ST4096-RIPE
>
>
>
Yes, you're right: kmem_arena's mutex is used before it is initialized. I
do not know why increasing MAXCPU causes this behavior.

Here is the stacktrace at the point of the panic:

db_stack_trace
db_command
db_command_loop
db_trap
kdb_trap
handle_el1h_sync
vpanic
kassert_panic
__mtx_lock_flags
vmem_xalloc
vmem_bt_alloc
keg_alloc_slab
keg_fetch_slab
zone_fetch_slab
zone_import
zone_alloc_item
bt_fill
vmem_xalloc
vmem_alloc
kmem_init_zero_region
vm_mem_init
mi_startup
virtdone
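
Read bottom-up, the trace shows an allocation from kernel_arena running out of boundary tags in bt_fill(), UMA then asking vmem_bt_alloc() for a new slab, and that call entering vmem_xalloc() on kmem_arena, whose lock has not been set up yet. The following is only a user-space toy model of that chain, not kernel code: every type and function name in it (toy_arena, toy_tag_zone, toy_slab_alloc, toy_arena_alloc) is hypothetical, and it returns an error code where the kernel hits the KASSERT.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; none of these types or functions exist in the kernel. */
enum alloc_result { ALLOC_OK, ALLOC_UNINIT_LOCK };

struct toy_arena {
	bool lock_initialized;	/* models mtx_init() having run for the arena */
};

struct toy_tag_zone {
	int free_items;			/* boundary tags currently available */
	int items_per_slab;		/* tags gained per new slab */
	struct toy_arena *backing;	/* arena that supplies new slabs */
};

/* Models vmem_bt_alloc(): a new slab needs the backing arena's lock. */
static enum alloc_result
toy_slab_alloc(struct toy_tag_zone *z)
{
	if (!z->backing->lock_initialized)
		return (ALLOC_UNINIT_LOCK);	/* the real kernel panics here */
	z->free_items += z->items_per_slab;
	return (ALLOC_OK);
}

/* Models bt_fill(): take 'ntags' tags, growing the zone when it runs dry. */
static enum alloc_result
toy_arena_alloc(struct toy_arena *a, struct toy_tag_zone *z, int ntags)
{
	assert(ntags >= 0);
	if (!a->lock_initialized)
		return (ALLOC_UNINIT_LOCK);
	while (ntags-- > 0) {
		if (z->free_items == 0 && toy_slab_alloc(z) != ALLOC_OK)
			return (ALLOC_UNINIT_LOCK);
		z->free_items--;
	}
	return (ALLOC_OK);
}
```

In this model the early allocation succeeds as long as the tag zone already holds enough free items; it only reaches the uninitialized backing arena once the zone is empty. Whether that is exactly what changes with a larger MAXCPU is not established here.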


Diff of the hacks in UMA:

diff --git a/sys/kern/kern_malloc.c b/sys/kern/kern_malloc.c
index aef1e4e..be225fb 100644
--- a/sys/kern/kern_malloc.c
+++ b/sys/kern/kern_malloc.c
@@ -874,7 +874,7 @@ malloc_uninit(void *data)
       * Look for memory leaks.
       */
      temp_allocs = temp_bytes = 0;
-    for (i = 0; i < MAXCPU; i++) {
+    for (i = 0; i < 1; i++) {
          mtsp = &mtip->mti_stats[i];
          temp_allocs += mtsp->mts_numallocs;
          temp_allocs -= mtsp->mts_numfrees;
diff --git a/sys/kern/subr_vmem.c b/sys/kern/subr_vmem.c
index 80940be..89d62ed 100644
--- a/sys/kern/subr_vmem.c
+++ b/sys/kern/subr_vmem.c
@@ -665,7 +665,8 @@ vmem_startup(void)
       * CPUs to attempt to allocate new tags concurrently to limit
       * false restarts in UMA.
       */
-    uma_zone_reserve(vmem_bt_zone, BT_MAXALLOC * (mp_ncpus + 1) / 2);
+    //mst look here
+    uma_zone_reserve(vmem_bt_zone, BT_MAXALLOC * (1 + 1) / 2);
      uma_zone_set_allocf(vmem_bt_zone, vmem_bt_alloc);
  #endif
  }
diff --git a/sys/vm/uma_core.c b/sys/vm/uma_core.c
index b96c421..6382437 100644
--- a/sys/vm/uma_core.c
+++ b/sys/vm/uma_core.c
@@ -98,6 +98,14 @@ __FBSDID("$FreeBSD$");
  #include <vm/memguard.h>
  #endif

+//mst: override some defines
+#undef curcpu
+#define    curcpu    0
+#undef    CPU_FOREACH
+#define    CPU_FOREACH(i)                            \
+    for ((i) = 0; (i) <= 0; (i)++)                \
+        if (!CPU_ABSENT((i)))
+
  /*
   * This is the zone and keg from which all zones are spawned.  The idea is that
   * even the zone & keg heads are allocated from the allocator, so we use the
@@ -1228,6 +1236,7 @@ keg_small_init(uma_keg_t keg)

      if (keg->uk_flags & UMA_ZONE_PCPU) {
          u_int ncpus = mp_ncpus ? mp_ncpus : MAXCPU;
+        ncpus = 1;

          keg->uk_slabsize = sizeof(struct pcpu);
          keg->uk_ppera = howmany(ncpus * sizeof(struct pcpu),
@@ -1822,7 +1831,7 @@ uma_startup(void *bootmem, int boot_pages)
  #endif
      args.name = "UMA Zones";
      args.size = sizeof(struct uma_zone) +
-        (sizeof(struct uma_cache) * (mp_maxid + 1));
+        (sizeof(struct uma_cache) * (0 + 1));
      args.ctor = zone_ctor;
      args.dtor = zone_dtor;
      args.uminit = zero_init;
@@ -3301,7 +3310,7 @@ uma_zero_item(void *item, uma_zone_t zone)
  {

      if (zone->uz_flags & UMA_ZONE_PCPU) {
-        for (int i = 0; i < mp_ncpus; i++)
+        for (int i = 0; i < 1; i++)
              bzero(zpcpu_get_cpu(item, i), zone->uz_size);
      } else
          bzero(item, zone->uz_size);
@@ -3465,7 +3474,7 @@ sysctl_vm_zone_stats(SYSCTL_HANDLER_ARGS)
       */
      bzero(&ush, sizeof(ush));
      ush.ush_version = UMA_STREAM_VERSION;
-    ush.ush_maxcpus = (mp_maxid + 1);
+    ush.ush_maxcpus = (0 + 1);
      ush.ush_count = count;
      (void)sbuf_bcat(&sbuf, &ush, sizeof(ush));

@@ -3509,7 +3518,7 @@ sysctl_vm_zone_stats(SYSCTL_HANDLER_ARGS)
               * accept the possible race associated with bucket
               * exchange during monitoring.
               */
-            for (i = 0; i < (mp_maxid + 1); i++) {
+            for (i = 0; i < (0 + 1); i++) {
                  bzero(&ups, sizeof(ups));
                  if (kz->uk_flags & UMA_ZFLAG_INTERNAL)
                      goto skip;
diff --git a/sys/vm/uma_int.h b/sys/vm/uma_int.h
index 11ab24f..b5b5a05 100644
--- a/sys/vm/uma_int.h
+++ b/sys/vm/uma_int.h
@@ -107,7 +107,7 @@
  #define UMA_SLAB_MASK    (PAGE_SIZE - 1)    /* Mask to get back to the page */
  #define UMA_SLAB_SHIFT    PAGE_SHIFT    /* Number of bits PAGE_MASK */

-#define UMA_BOOT_PAGES        64    /* Pages allocated for startup */
+#define UMA_BOOT_PAGES        512    /* Pages allocated for startup */

  /* Max waste percentage before going to off page slab management */
  #define UMA_MAX_WASTE    10
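
For reference, the vmem_startup() hunk above changes the size of the boundary-tag reserve, BT_MAXALLOC * (mp_ncpus + 1) / 2. A rough sketch of that arithmetic follows. It rests on two assumptions the thread does not confirm: that BT_MAXALLOC is 4 (check sys/kern/subr_vmem.c), and that mp_ncpus already equals the configured CPU count when vmem_startup() runs. The helper names are hypothetical, and the 71 items per slab is the ipers value the verbose log in this message reports for the "vmem btag" zone.

```c
#include <assert.h>

/* Assumption: BT_MAXALLOC is taken to be 4; verify against subr_vmem.c. */
#define TOY_BT_MAXALLOC	4

/* Reserve requested by uma_zone_reserve() in vmem_startup(), per the diff. */
static int
toy_bt_reserve(int ncpus)
{
	assert(ncpus > 0);
	return (TOY_BT_MAXALLOC * (ncpus + 1) / 2);
}

/* Number of slabs needed to hold 'items' at 'items_per_slab' each. */
static int
toy_slabs_needed(int items, int items_per_slab)
{
	assert(items >= 0 && items_per_slab > 0);
	return ((items + items_per_slab - 1) / items_per_slab);
}
```

Under those assumptions, toy_bt_reserve(48) is 98 tags, which toy_slabs_needed(98, 71) puts at two slabs, while the hacked value toy_bt_reserve(1) is 4 tags and fits in one. This only illustrates the numbers involved; it does not show that the reserve size is the cause of the panic.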


And lastly, the more verbose log:

Copyright (c) 1992-2015 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
     The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.0-CURRENT #336 52fd91e(smp_48)-dirty: Fri May 15 18:57:05 CEST 2015
mst@arm64-prime:/usr/home/mst/freebsd_v8/obj_kernel/arm64.aarch64/usr/home/mst/freebsd_v8/kernel/sys/THUNDER-88XX arm64
FreeBSD clang version 3.6.0 (tags/RELEASE_360/final 230434) 20150225
MST: in vm_mem_init()
Creating uma keg headers zone and keg.
UMA: UMA Kegs(0xffffff8000d1b140) size 256(256) flags 0x20000000 ipers 15 ppera 1 out 0 free 0
Filling boot free list.
Creating uma zone headers zone and keg.
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
alloc_slab:  Allocating a new slab for UMA Kegs
UMA: UMA Zones(0xffffff8000d1b000) size 1856(1856) flags 0x20000000 ipers 2 ppera 1 out 0 free 0
Creating slab and hash zones.
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
alloc_slab:  Allocating a new slab for UMA Zones
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
UMA: UMA Slabs(0xffffffc0789fe000) size 112(112) flags 0x20000000 ipers 35 ppera 1 out 0 free 0
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
UMA: UMA RCntSlabs(0xffffffc0789fe740) size 120(120) flags 0x20000000 ipers 33 ppera 1 out 0 free 0
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
alloc_slab:  Allocating a new slab for UMA Zones
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
UMA: UMA Hash(0xffffffc0789fd000) size 256(256) flags 0x20000000 ipers 15 ppera 1 out 0 free 0
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
UMA: 4 Bucket(0xffffffc0789fd740) size 32(32) flags 0x10000040 ipers 124 ppera 1 out 0 free 0
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
alloc_slab:  Allocating a new slab for UMA Zones
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
UMA: 6 Bucket(0xffffffc0789fc000) size 48(48) flags 0x10000040 ipers 83 ppera 1 out 0 free 0
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
UMA: 8 Bucket(0xffffffc0789fc740) size 64(64) flags 0x10000040 ipers 62 ppera 1 out 0 free 0
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
alloc_slab:  Allocating a new slab for UMA Zones
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
UMA: 12 Bucket(0xffffffc0789fb000) size 96(96) flags 0x10000040 ipers 41 ppera 1 out 0 free 0
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
UMA: 16 Bucket(0xffffffc0789fb740) size 128(128) flags 0x10000040 ipers 31 ppera 1 out 0 free 0
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
alloc_slab:  Allocating a new slab for UMA Zones
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
UMA: 32 Bucket(0xffffffc0789fa000) size 256(256) flags 0x10000040 ipers 15 ppera 1 out 0 free 0
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
UMA: 64 Bucket(0xffffffc0789fa740) size 512(512) flags 0x10000040 ipers 7 ppera 1 out 0 free 0
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
alloc_slab:  Allocating a new slab for UMA Zones
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
UMA decided we need offpage slab headers for keg: 128 Bucket, calculated wastedspace = 912, maximum wasted space allowed = 409, calculated ipers = 4, new wasted space = 0
INTERNAL: Allocating one item from UMA Hash(0xffffffc0789fd000)
alloc_slab:  Allocating a new slab for UMA Hash
UMA: 128 Bucket(0xffffffc0789f9000) size 1024(1024) flags 0x10000148 ipers 4 ppera 1 out 0 free 0
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
UMA decided we need offpage slab headers for keg: 256 Bucket, calculated wastedspace = 1936, maximum wasted space allowed = 409, calculated ipers = 2, new wasted space = 0
INTERNAL: Allocating one item from UMA Hash(0xffffffc0789fd000)
UMA: 256 Bucket(0xffffffc0789f9740) size 2048(2048) flags 0x10000148 ipers 2 ppera 1 out 0 free 0
UMA startup complete.
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
alloc_slab:  Allocating a new slab for UMA Zones
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
UMA: vmem btag(0xffffffc0789f7000) size 56(56) flags 0x80000080 ipers 71 ppera 1 out 0 free 0
alloc_slab:  Allocating a new slab for vmem btag
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
UMA: VM OBJECT(0xffffffc0789f7740) size 256(256) flags 0x20 ipers 15 ppera 1 out 0 free 0
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
alloc_slab:  Allocating a new slab for UMA Zones
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
alloc_slab:  Allocating a new slab for UMA Kegs
UMA: RADIX NODE(0xffffffc0789f5000) size 144(144) flags 0x80000080 ipers 27 ppera 1 out 0 free 0
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
UMA: MAP(0xffffffc0789f5740) size 240(240) flags 0x20 ipers 16 ppera 1 out 0 free 0
alloc_slab:  Allocating a new slab for MAP
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
alloc_slab:  Allocating a new slab for UMA Zones
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
UMA: KMAP ENTRY(0xffffffc0789f2000) size 128(128) flags 0x800000c0 ipers 31 ppera 1 out 0 free 0
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
UMA: MAP ENTRY(0xffffffc0789f2740) size 128(128) flags 0 ipers 31 ppera 1 out 0 free 0
INTERNAL: Allocating one item from UMA Zones(0xffffff8000d1b000)
alloc_slab:  Allocating a new slab for UMA Zones
INTERNAL: Allocating one item from UMA Kegs(0xffffff8000d1b140)
UMA: VMSPACE(0xffffffc0789f1000) size 384(384) flags 0x20 ipers 10 ppera 1 out 0 free 0
Allocating one item from MAP(0xffffffc0789f5740)
INTERNAL: Allocating one item from MAP(0xffffffc0789f5740)
Allocating one item from KMAP ENTRY(0xffffffc0789f2000)
INTERNAL: Allocating one item from KMAP ENTRY(0xffffffc0789f2000)
alloc_slab:  Allocating a new slab for KMAP ENTRY
MST: in vmem_init() with param *vm == kernel_arena
MST: in vmem_xalloc() with param *vm == kernel_arena
Allocating one item from vmem btag(0xffffffc0789f7000)
INTERNAL: Allocating one item from vmem btag(0xffffffc0789f7000)
Allocating one item from vmem btag(0xffffffc0789f7000)
INTERNAL: Allocating one item from vmem btag(0xffffffc0789f7000)
alloc_slab:  Allocating a new slab for vmem btag
MST: in vmem_xalloc() with param *vm == kmem_arena
panic: mtx_lock() of spin mutex (null) @ /usr/home/mst/freebsd_v8/kernel/sys/kern/subr_vmem.c:1165
cpuid = 0
KDB: enter: panic
[ thread pid 0 tid 0 ]
Stopped at      0xffffff80001f4f80:
db>

Best regards,
Michal Stanek
Received on Sun May 17 2015 - 09:10:05 UTC
