[The bit argument of bitmap_unset seems to be way too large.]

On 2020-May-3, at 11:08, Mark Millard <marklmi at yahoo.com> wrote:

> [At around 4 AM local time dhclient got a signal 11,
> despite the jemalloc revert. The other examples
> have not happened.]
> 
> On 2020-May-2, at 18:46, Mark Millard <marklmi at yahoo.com> wrote:
> 
>> [I'm only claiming the new jemalloc is involved and that
>> reverting avoids the problem.]
>> 
>> I've been reporting to some lists problems with:
>> 
>> dhclient
>> sendmail
>> rpcbind
>> mountd
>> nfsd
>> 
>> getting SIGSEGV (signal 11) crashes and some core
>> dumps on the old 2-socket (1 core per socket) 32-bit
>> PowerMac G4 running head -r360311.
>> 
>> Mikaël Urankar sent a note suggesting that I try
>> testing reverting head -r360233 for my head -r360311
>> context. He got it right . . .
>> 
>> 
>> Context:
>> 
>> The problem was noticed by an inability to have
>> other machines do a:
>> 
>> mount -onoatime,soft OLDPOWERMAC-LOCAL-IP:/... /mnt
>> 
>> sort of operation and have it succeed. By contrast, on
>> the old PowerMac G4 I could initiate mounts against
>> other machines just fine.
>> 
>> I do not see any such problems on any of (all based
>> on head -r360311):
>> 
>> powerpc64 (old PowerMac G5 2-sockets with 2 cores each)
>> armv7 (OrangePi+ 2ed)
>> aarch64 (Rock64, RPi4, RPi3,
>>          OverDrive 1000,
>>          Macchiatobin Double Shot)
>> amd64 (ThreadRipper 1950X)
>> 
>> So I expect something 32-bit powerpc specific
>> is somehow involved, even if jemalloc is only
>> using whatever it is.
>> 
>> (A kyua run with a debug kernel did not find other
>> unexpected signal 11 sources on the 32-bit PowerMac
>> compared to past kyua runs, at least that I noticed.
>> There were a few lock order reversals that I do not
>> know if they are expected or known-safe or not.
>> I've reported those reversals to the lists as well.)
>> 
>> 
>> Recent experiments based on the suggestion:
>> 
>> Doing the buildworld, buildkernel and installing just
>> the new kernel and rebooting made no difference.
>> 
>> But then installing the new world and rebooting did
>> make things work again: I no longer get core files
>> for the likes of (old cores from before the update):
>> 
>> # find / -name "*.core" -print
>> /var/spool/clientmqueue/sendmail.core
>> /rpcbind.core
>> /mountd.core
>> /nfsd.core
>> 
>> Nor do I see the various notices for sendmail
>> signal 11's that did not leave behind a core file
>> --or for dhclient (no core file left behind).
>> And I can mount the old PowerMac's drive from
>> other machines just fine.
>> 
>> 
>> Other notes:
>> 
>> I do not actively use sendmail but it was left
>> to do its default things, partially to test if
>> such default things are working. Unfortunately,
>> PowerMacs have a problematical status under
>> FreeBSD, and my context reflects my historical
>> experiments with avoiding various problems.
> 
> Looking, I see that I got a:
> 
> pid 572 (dhclient), jid 0, uid 0: exited on signal 11 (core dumped)
> 
> notice under the reverted build. No instances
> of the other examples. This is the first time a
> dhclient example has produced a .core file.
> 
> gdb indicates 0x5180936c for r7 in:
> 
> lwz r8,36(r7)
> 
> as leading to the failure. This was in
> arena_dalloc_bin_locked_impl (where
> arena_slab_reg_dalloc and bitmap_unset
> were apparently inlined).
> 
> The chain for the example seems to be:
> fork_privchld -> dispatch_imsg -> jemalloc
> 
> For reference . . .
> 
> # gdb dhclient /dhclient.core
> GNU gdb (GDB) 9.1 [GDB v9.1 for FreeBSD]
> Copyright (C) 2020 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> . . .
> Reading symbols from dhclient...
> Reading symbols from /usr/lib/debug//sbin/dhclient.debug...
> [New LWP 100089]
> Core was generated by `dhclient: gem0 [priv]'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  bitmap_unset (bitmap=0x50407164, binfo=<optimized out>, bit=167842154) at /usr/powerpc32_src/contrib/jemalloc/include/jemalloc/internal/bitmap.h:341
> 341     /usr/powerpc32_src/contrib/jemalloc/include/jemalloc/internal/bitmap.h: No such file or directory.
> (gdb) bt -full
> #0  bitmap_unset (bitmap=0x50407164, binfo=<optimized out>, bit=167842154) at /usr/powerpc32_src/contrib/jemalloc/include/jemalloc/internal/bitmap.h:341
>         goff = <optimized out>
>         gp = 0x51809390
>         propagate = <optimized out>
>         g = <optimized out>
>         i = <optimized out>
> #1  arena_slab_reg_dalloc (slab=0x50407140, slab_data=0x50407164, ptr=0x50088b50) at jemalloc_arena.c:273
>         bin_info = <optimized out>
>         binind = 0
>         regind = 167842154
> #2  arena_dalloc_bin_locked_impl (tsdn=0x5009f018, arena=<optimized out>, slab=<optimized out>, ptr=<optimized out>, junked=<optimized out>) at jemalloc_arena.c:1540
>         slab_data = <optimized out>
>         binind = <optimized out>
>         bin_info = <optimized out>
>         bin = <optimized out>
>         nfree = <optimized out>
> #3  0x502916a8 in __je_arena_dalloc_bin_junked_locked (tsdn=<optimized out>, arena=<optimized out>, extent=<optimized out>, ptr=<optimized out>) at jemalloc_arena.c:1559
> No locals.
> #4  0x50250d2c in __je_tcache_bin_flush_small (tsd=0x5009f018, tcache=<optimized out>, tbin=0x5009f1c0, binind=<optimized out>, rem=24) at jemalloc_tcache.c:149
>         ptr = <optimized out>
>         i = 0
>         extent = 0x50407140
>         bin_arena = 0x50400380
>         bin = <optimized out>
>         ndeferred = 0
>         merged_stats = <optimized out>
>         arena = 0x50400380
>         nflush = 75
>         __vla_expr0 = <optimized out>
>         item_extent = 0xffffd1f0
> #5  0x502508a0 in __je_tcache_event_hard (tsd=<optimized out>, tcache=0x5009f108) at jemalloc_tcache.c:54
>         tbin_info = <optimized out>
>         binind = 7
>         tbin = 0x5009f1c0
> #6  0x5029a684 in __free (ptr=0x500530c0) at /usr/powerpc32_src/contrib/jemalloc/include/jemalloc/internal/rtree.h:374
>         tcache = 0x5009f108
>         tsd = <optimized out>
>         log_var = <optimized out>
>         log_var = <optimized out>
> #7  0x10025994 in dispatch_imsg (ifix=<optimized out>, fd=10) at /usr/powerpc32_src/sbin/dhclient/privsep.c:215
>         hdr = {code = IMSG_SCRIPT_WRITE_PARAMS, len = 3225}
>         lease = {next = 0x0, expiry = 1588504529, renewal = 1588504229, rebind = 1588504454, address = {len = 4, iabuf = "\300\250\001i", '\000' <repeats 11 times>}, nextserver = {len = 4,
>             iabuf = '\000' <repeats 15 times>}, server_name = 0x0, filename = 0x0, medium = 0x0, is_static = 0, is_bootp = 0, options = {{len = 0, data = 0x0}, {len = 4,
>             data = 0x500530c8 "\377\377\377"}, {len = 0, data = 0x0}, {len = 4, data = 0x500530d0 "\300\250\001\001"}, {len = 0, data = 0x0}, {len = 0, data = 0x0}, {len = 4,
>             data = 0x500530d8 "\300\250\001\001"}, {len = 0, data = 0x0}, {len = 0, data = 0x0}, {len = 0, data = 0x0}, {len = 0, data = 0x0}, {len = 0, data = 0x0}, {len = 0, data = 0x0}, {
>             len = 0, data = 0x0}, {len = 0, data = 0x0}, {len = 20, data = 0x50055200 "hsd1.or.comcast.net."}, {len = 0, data = 0x0} <repeats 35 times>, {len = 4, data = 0x500530e0 ""}, {len = 0,
>             data = 0x0}, {len = 1, data = 0x500530e8 "\005"}, {len = 4,
>             data = 0x500530f0 "\300\250\001\001"}, {len = 0, data = 0x0} <repeats 201 times>}}
>         medium_len = <optimized out>
>         medium = <optimized out>
>         totlen = 3225
>         filename_len = <optimized out>
>         filename = 0x0
>         ret = <optimized out>
>         buf = <optimized out>
>         mtu = <optimized out>
>         servername_len = <optimized out>
>         servername = 0x0
>         reason_len = <optimized out>
>         reason = <optimized out>
> --Type <RET> for more, q to quit, c to continue without paging--
>         prefix_len = <optimized out>
>         prefix = 0x500530c0 "new_"
>         i = 0
>         optlen = 0
> #8  0x100189f4 in fork_privchld (fd=10, fd2=<optimized out>) at /usr/powerpc32_src/sbin/dhclient/dhclient.c:2847
>         pfd = {{fd = 10, events = 1, revents = 1}}
>         nfds = <optimized out>
> #9  0x10017a80 in main (argc=<optimized out>, argv=<optimized out>) at /usr/powerpc32_src/sbin/dhclient/dhclient.c:505
>         pipe_fd = {10, 11}
>         rights = {cr_rights = {1342801412, 18446706484155777024}}
>         immediate_daemon = 0
>         i = 0
>         ch = <optimized out>
>         otherpid = 8
>         pw = 0x5039b9d8
>         fd = <optimized out>
>         capmode = <optimized out>
> 
> (gdb) disass
> Dump of assembler code for function arena_dalloc_bin_locked_impl:
>    0x502916b8 <+0>:     mflr    r0
>    0x502916bc <+4>:     stw     r0,4(r1)
>    0x502916c0 <+8>:     stwu    r1,-48(r1)
>    0x502916c4 <+12>:    stw     r30,40(r1)
>    0x502916c8 <+16>:    stw     r24,16(r1)
>    0x502916cc <+20>:    stw     r25,20(r1)
>    0x502916d0 <+24>:    stw     r26,24(r1)
>    0x502916d4 <+28>:    stw     r27,28(r1)
>    0x502916d8 <+32>:    stw     r28,32(r1)
>    0x502916dc <+36>:    stw     r29,36(r1)
>    0x502916e0 <+40>:    bl      0x502916e4 <arena_dalloc_bin_locked_impl+44>
>    0x502916e4 <+44>:    mr      r27,r3
>    0x502916e8 <+48>:    mflr    r30
>    0x502916ec <+52>:    addis   r30,r30,14
>    0x502916f0 <+56>:    addi    r30,r30,7788
>    0x502916f4 <+60>:    mr      r28,r4
>    0x502916f8 <+64>:    lwz     r4,5856(r30)
>    0x502916fc <+68>:    lwz     r3,4(r5)
>    0x50291700 <+72>:    mr      r29,r5
>    0x50291704 <+76>:    andi.   r5,r7,1
>    0x50291708 <+80>:    mr      r26,r6
>    0x5029170c <+84>:    lbz     r4,0(r4)
>    0x50291710 <+88>:    rlwinm  r5,r3,14,25,31
>    0x50291714 <+92>:    mulli   r24,r5,224
>    0x50291718 <+96>:    mulli   r25,r5,44
>    0x5029171c <+100>:   cmpwi   cr1,r4,0
>    0x50291720 <+104>:   cror    4*cr5+lt,4*cr1+eq,gt
>    0x50291724 <+108>:   bge     cr5,0x50291a2c <arena_dalloc_bin_locked_impl+884>
>    0x50291728 <+112>:   lwz     r4,0(r29)
>    0x5029172c <+116>:   lwz     r6,6036(r30)
>    0x50291730 <+120>:   lwz     r7,8(r29)
>    0x50291734 <+124>:   rlwinm  r8,r5,2,0,29
>    0x50291738 <+128>:   li      r9,1
>    0x5029173c <+132>:   add     r24,r28,r24
>    0x50291740 <+136>:   lwzx    r6,r6,r8
>    0x50291744 <+140>:   subf    r7,r7,r26
>    0x50291748 <+144>:   mulhwu  r6,r6,r7
>    0x5029174c <+148>:   rlwinm  r7,r6,29,3,29
>    0x50291750 <+152>:   add     r7,r29,r7
> => 0x50291754 <+156>:   lwz     r8,36(r7)
>    0x50291758 <+160>:   clrlwi  r10,r6,27
>    0x5029175c <+164>:   slw     r9,r9,r10
>    0x50291760 <+168>:   xor     r9,r9,r8
>    0x50291764 <+172>:   cmplwi  r8,0
>    0x50291768 <+176>:   stw     r9,36(r7)
>    0x5029176c <+180>:   bne     0x502917e4 <arena_dalloc_bin_locked_impl+300>
>    0x50291770 <+184>:   lwz     r7,4408(r30)
>    0x50291774 <+188>:   mulli   r8,r5,44
>    0x50291778 <+192>:   add     r5,r7,r8
>    0x5029177c <+196>:   lwz     r5,16(r5)
>    0x50291780 <+200>:   cmplwi  r5,2
>    0x50291784 <+204>:   blt     0x502917e4 <arena_dalloc_bin_locked_impl+300>
> . . .
> 
> (gdb) info reg
> r0             0x502916a8          1344870056
> r1             0xffffd1a0          4294955424
> r2             0x500a6018          1342857240
> r3             0x0                 0
> r4             0x0                 0
> r5             0x0                 0
> r6             0xa01116a           167842154
> r7             0x5180936c          1367380844
> r8             0x0                 0
> r9             0x1                 1
> r10            0x1e                30
> r11            0x5005d114          1342558484
> r12            0x84000c00          2214595584
> r13            0x0                 0
> r14            0xffffd1f0          4294955504
> r15            0xfffffffc          4294967292
> r16            0x4a                74
> r17            0x4b                75
> r18            0x0                 0
> r19            0x504009a0          1346374048
> r20            0x0                 0
> r21            0xffffd1f0          4294955504
> r22            0x620               1568
> r23            0x50400380          1346372480
> r24            0x50400380          1346372480
> r25            0x0                 0
> r26            0x50088b50          1342737232
> r27            0x5009f018          1342828568
> r28            0x50400380          1346372480
> r29            0x50407140          1346400576
> r30            0x50373550          1345795408
> r31            0xffffd310          4294955792
> pc             0x50291754          0x50291754 <arena_dalloc_bin_locked_impl+156>
> msr            <unavailable>
> cr             0x42480c00          1112017920
> lr             0x502916e4          0x502916e4 <arena_dalloc_bin_locked_impl+44>
> ctr            0x5005d114          1342558484
> xer            0x0                 0
> fpscr          0x0                 0
> vscr           <unavailable>
> vrsave         <unavailable>

The frame #0 arguments,

bitmap_unset (bitmap=0x50407164, binfo=<optimized out>, bit=167842154)

explain the calculated

gp = 0x51809390

via bitmap+(bit/4/8):

(gdb) print/x 0x50407164 +167842154/4/8
$16 = 0x51809390

The last bit/4/8 index that can still be accessed (without
running into an unmapped hole) is:

(gdb) print *(bitmap+582566)
$13 = 0
(gdb) print/x (bitmap+582566)
$14 = 0x5063fffc

So it looks like arena_slab_reg_dalloc produced an invalid
bit value. Looking at that code shows that regind holds the
value that was passed as the bit argument:

static void
arena_slab_reg_dalloc(extent_t *slab, arena_slab_data_t *slab_data, void *ptr) {
	szind_t binind = extent_szind_get(slab);
	const bin_info_t *bin_info = &bin_infos[binind];
	size_t regind = arena_slab_regind(slab, binind, ptr);

	assert(extent_nfree_get(slab) < bin_info->nregs);
	/* Freeing an unallocated pointer can cause assertion failure. */
	assert(bitmap_get(slab_data->bitmap, &bin_info->bitmap_info, regind));

	bitmap_unset(slab_data->bitmap, &bin_info->bitmap_info, regind);
	extent_nfree_inc(slab);
}

The backtrace showed binind==0 for arena_slab_reg_dalloc.
That leaves:

arena_slab_regind(slab, binind, ptr)

as producing the odd value.

size_t
arena_slab_regind(extent_t *slab, szind_t binind, const void *ptr) {
	size_t diff, regind;

	/* Freeing a pointer outside the slab can cause assertion failure. */
	assert((uintptr_t)ptr >= (uintptr_t)extent_addr_get(slab));
	assert((uintptr_t)ptr < (uintptr_t)extent_past_get(slab));
	/* Freeing an interior pointer can cause assertion failure. */
	assert(((uintptr_t)ptr - (uintptr_t)extent_addr_get(slab)) %
	    (uintptr_t)bin_infos[binind].reg_size == 0);

	diff = (size_t)((uintptr_t)ptr - (uintptr_t)extent_addr_get(slab));

	/* Avoid doing division with a variable divisor. */
	regind = div_compute(&arena_binind_div_info[binind], diff);

	assert(regind < bin_infos[binind].nregs);

	return regind;
}

ptr  == 0x50088b50
slab == 0x50407140

static inline void *
extent_addr_get(const extent_t *extent) {
	assert(extent->e_addr == PAGE_ADDR2BASE(extent->e_addr) ||
	    !extent_slab_get(extent));
	return extent->e_addr;
}

(gdb) print *slab
$17 = {e_bits = 0, e_addr = 0x0, {e_size_esn = 0, e_bsize = 0}, ql_link = {qre_next = 0x0, qre_prev = 0x0},
  ph_link = {phn_prev = 0x0, phn_next = 0x0, phn_lchild = 0x0}, {e_slab_data = {bitmap = {0 <repeats 17 times>}},
  e_prof_tctx = {repr = 0x0}}}

That looks wrong: all fields are zero, which is not likely
to be a valid description of a slab. But I'll continue, to
be sure I reproduce the reported value of bit.

So extent_addr_get(slab)==slab->e_addr and slab->e_addr==0x0
and diff==ptr.
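Putting those pieces together, the whole bad-bit computation
can be reproduced with a few lines of standalone arithmetic.
(This is only an illustration using the values recovered
above; it mimics the div_compute step quoted just below and
assumes 32-bit bitmap_t groups, consistent with the bit/4/8
arithmetic above.)

#include <stdint.h>
#include <stdio.h>

int
main(void) {
	/* Values recovered from the core file (see the gdb output above). */
	uint32_t ptr    = 0x50088b50; /* pointer being freed                   */
	uint32_t e_addr = 0x0;        /* slab->e_addr, bogusly zero            */
	uint32_t magic  = 0x20000000; /* arena_binind_div_info[0].magic, below */
	uint32_t bitmap = 0x50407164; /* slab_data->bitmap                     */

	/* diff ends up equal to ptr because e_addr is zero. */
	uint32_t diff = ptr - e_addr;

	/* The magic-multiply division (div_compute, quoted below): diff/8 here. */
	uint32_t regind = (uint32_t)(((uint64_t)diff * magic) >> 32);
	printf("regind = %u\n", regind);   /* 167842154, the bad bit argument    */

	/* bitmap_unset indexes 32-bit bitmap_t groups by bit/32. */
	uint32_t gp = bitmap + (regind / 32) * 4;
	printf("gp     = 0x%x\n", gp);     /* 0x51809390, far past 0x5063fffc    */
	return 0;
}

Compiled and run, it prints regind = 167842154 and
gp = 0x51809390, matching the core file's values.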
(gdb) print/x arena_binind_div_info[binind]
$19 = {magic = 0x20000000}

static inline size_t
div_compute(div_info_t *div_info, size_t n) {
	assert(n <= (uint32_t)-1);
	/*
	 * This generates, e.g. mov; imul; shr on x86-64. On a 32-bit machine,
	 * the compilers I tried were all smart enough to turn this into the
	 * appropriate "get the high 32 bits of the result of a multiply" (e.g.
	 * mul; mov edx eax; on x86, umull on arm, etc.).
	 */
	size_t i = ((uint64_t)n * (uint64_t)div_info->magic) >> 32;
#ifdef JEMALLOC_DEBUG
	assert(i * div_info->d == n);
#endif
	return i;
}

(gdb) print/x ((unsigned long long)0x50088b50 * (unsigned long long)0x20000000) >> 32
$21 = 0xa01116a
(gdb) print ((unsigned long long)0x50088b50 * (unsigned long long)0x20000000) >> 32
$22 = 167842154

(As reported.)

So returning to *slab being all zero . . .

The slab value in the call chain seems to trace back to
__je_tcache_bin_flush_small code:

	bin_t *bin = &bin_arena->bins[binind];
. . .
	malloc_mutex_lock(tsd_tsdn(tsd), &bin->lock);
. . .
	for (unsigned i = 0; i < nflush; i++) {
		void *ptr = *(tbin->avail - 1 - i);
		extent = item_extent[i];
		assert(ptr != NULL && extent != NULL);

		if (extent_arena_get(extent) == bin_arena) {
			arena_dalloc_bin_junked_locked(tsd_tsdn(tsd),
			    bin_arena, extent, ptr);
. . .
	malloc_mutex_unlock(tsd_tsdn(tsd), &bin->lock);

(So extent's value here is later slab's value in the call
chain.)

The backtrace shows binind = 7 via __je_tcache_event_hard.
(Not the same as the earlier binind.)

#4  0x50250d2c in __je_tcache_bin_flush_small (tsd=0x5009f018, tcache=<optimized out>, tbin=0x5009f1c0, binind=<optimized out>, rem=24) at jemalloc_tcache.c:149
        ptr = <optimized out>
        i = 0
        extent = 0x50407140
        bin_arena = 0x50400380
        bin = <optimized out>
        ndeferred = 0
        merged_stats = <optimized out>
        arena = 0x50400380
        nflush = 75
        __vla_expr0 = <optimized out>
        item_extent = 0xffffd1f0

(gdb) print/x bin_arena->bins[7]
$44 = {lock = {{{prof_data = {tot_wait_time = {ns = 0x0}, max_wait_time = {ns = 0x0}, n_wait_times = 0x0,
          n_spin_acquired = 0x0, max_n_thds = 0x0, n_waiting_thds = {repr = 0x0}, n_owner_switches = 0x0,
          prev_owner = 0x0, n_lock_ops = 0x0}, lock = 0x0, postponed_next = 0x504021d0}, witness = {name = 0x0,
        rank = 0x0, comp = 0x0, opaque = 0x0, link = {qre_next = 0x0, qre_prev = 0x0}}, lock_order = 0x0}},
  slabcur = 0x50407140, slabs_nonfull = {ph_root = 0x0}, slabs_full = {qlh_first = 0x0}, stats = {nmalloc = 0x64,
    ndalloc = 0x0, nrequests = 0x1, curregs = 0x64, nfills = 0x1, nflushes = 0x1, nslabs = 0x1, reslabs = 0x0,
    curslabs = 0x1, mutex_data = {tot_wait_time = {ns = 0x0}, max_wait_time = {ns = 0x0}, n_wait_times = 0x0,
      n_spin_acquired = 0x0, max_n_thds = 0x0, n_waiting_thds = {repr = 0x0}, n_owner_switches = 0x0,
      prev_owner = 0x0, n_lock_ops = 0x0}}}

That indicates that bin_arena->bins[7].lock's inner lock
field is 0x0. Expected? Single-threaded context?

(gdb) print *item_extent[0]
$27 = {e_bits = 0, e_addr = 0x0, {e_size_esn = 0, e_bsize = 0}, ql_link = {qre_next = 0x0, qre_prev = 0x0},
  ph_link = {phn_prev = 0x0, phn_next = 0x0, phn_lchild = 0x0}, {e_slab_data = {bitmap = {0 <repeats 17 times>}},
  e_prof_tctx = {repr = 0x0}}}

Other *item_extent[INDEX] that I tried got the same: all
zeros. This is what contributed to the huge bit value.

item_extent[] is based on the declaration:

VARIABLE_ARRAY(extent_t *, item_extent, nflush);

and:

/* Declare a variable-length array. */
#if __STDC_VERSION__ < 199901L
#  ifdef _MSC_VER
#    include <malloc.h>
#    define alloca _alloca
#  else
#    ifdef JEMALLOC_HAS_ALLOCA_H
#      include <alloca.h>
#    else
#      include <stdlib.h>
#    endif
#  endif
#  define VARIABLE_ARRAY(type, name, count) \
	type *name = alloca(sizeof(type) * (count))
#else
#  define VARIABLE_ARRAY(type, name, count) type name[(count)]
#endif

WARNING: C11 turned VLAs into a conditional feature
(__STDC_NO_VLA__); only C99 requires them. Thus the above
definition of VARIABLE_ARRAY is incomplete for language
vintages after C99: a C11-or-later compiler that defines
__STDC_NO_VLA__ would still be sent down the VLA branch.
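For illustration, a guard that also honors C11's opt-out
might look like the following. (Just a sketch of the point
above, not what jemalloc ships; the alloca include plumbing
from the original macro would still be needed in the
fallback branch.)

/* Sketch only: extend the C99 check with C11's __STDC_NO_VLA__ so the
 * alloca fallback is also used when a newer compiler opts out of VLAs. */
#if __STDC_VERSION__ < 199901L || defined(__STDC_NO_VLA__)
#  define VARIABLE_ARRAY(type, name, count) \
	type *name = alloca(sizeof(type) * (count))
#else
#  define VARIABLE_ARRAY(type, name, count) type name[(count)]
#endif

(Not that a missing VLA is what is failing here; FreeBSD's
clang does provide them. It just seemed worth noting.)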
Looking around, the stack frames seem to span the space okay:

(gdb) print/x &item_extent[75]
$32 = 0xffffd31c
(gdb) print/x &item_extent[0]
$33 = 0xffffd1f0

r1             0xffffd1a0          4294955424
r14            0xffffd1f0          4294955504
r15            0xfffffffc          4294967292
r21            0xffffd1f0          4294955504

(gdb) print/x *(void**)0xffffd1a0
$36 = 0xffffd1d0
(gdb) print/x *(void**)0xffffd1d0
$37 = 0xffffd1e0
(gdb) print/x *(void**)0xffffd1e0
$38 = 0xffffd440
(gdb) print/x *(void**)0xffffd440
$39 = 0xffffd460

And I've run out of ideas for what else to look at (for now).
(It is not like I understand jemalloc.)

(Last I knew, 32-bit powerpc did not have red-zone stack-space
criteria to leave room for signals to use.)

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)