[Yet another new kind of experiment. But this looks like I can cause problems in fairly short order on demand now. Finally! And with that I've much better evidence for kernel vs. user-space process for making the zeroed memory appear in, for example, nfsd.]

I've managed to get:

<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: Failed assertion: "slab == extent_slab_get(extent)"
<jemalloc>: /usr/src/contrib/jemalloc/include/jemalloc/internal/arena_inlines_b.h:258: Failed assertion: "slab == extent_slab_get(extent)"

and eventually:

[1]   Segmentation fault (core dumped) stress -m 2 --vm-bytes 1700M

from a user program (stress) while another machine attempted an nfs mount during the stress activity:

# mount -onoatime,soft ...:/ /mnt && umount /mnt && rpcinfo -s ...
[tcp] ...:/: RPCPROG_MNT: RPC: Timed out

(To get the failure I may have to run the commands multiple times. Timing details relative to stress's activity seem to matter. A looped form of this client-side sequence is sketched just after the stress backtrace below.)

That failure led to:

# ls -ldT /*.core*
-rw-------  1 root  wheel  3899392 May 12 19:52:26 2020 /mountd.core
# ls -ldT *.core*
-rw-------  1 root  wheel  2682880 May 12 20:00:26 2020 stress.core

(Note: which of nfsd, mountd, or rpcbind fails need not be fully repeatable. stress.core seems to be written twice, probably because of the -m 2 in use.)

The context that let me do this was to first run (on the 2-socket G4 with a full 2048 MiByte RAM configuration):

stress -m 2 --vm-bytes 1700M &

Note that the stress command keeps the memory busy and causes paging to the swap/page space. I've not tried to make it just fit without paging, or just barely page, or the like. The original context did not involve paging or low RAM, so I do not expect paging to be required, but it can be involved.

The stress program backtrace is different:

4827            return (tls_get_addr_slow(dtvp, index, offset));
4828    }
(gdb) bt -full
#0  0x41831b04 in tls_get_addr_common (dtvp=0x4186c010, index=2, offset=4294937444) at /usr/src/libexec/rtld-elf/rtld.c:4824
        dtv = 0x0
#1  0x4182bfcc in __tls_get_addr (ti=<optimized out>) at /usr/src/libexec/rtld-elf/powerpc/reloc.c:848
        tp = <optimized out>
        p = <optimized out>
#2  0x41a83464 in __get_locale () at /usr/src/lib/libc/locale/xlocale_private.h:199
No locals.
#3  fprintf (fp=0x41b355f8, fmt=0x1804cbc "%s: FAIL: [%lli] (%d) ") at /usr/src/lib/libc/stdio/fprintf.c:57
        ap = {{gpr = 2 '\002', fpr = 0 '\000', reserved = 20731, overflow_arg_area = 0xffffdb78, reg_save_area = 0xffffdae8}}
        ret = <optimized out>
#4  0x01802348 in main (argc=<optimized out>, argv=<optimized out>) at stress.c:415
        status = <optimized out>
        ret = 6
        do_dryrun = 0
        retval = 0
        children = 1
        do_backoff = <optimized out>
        do_hdd_bytes = <optimized out>
        do_hdd = <optimized out>
        do_vm_keep = 0
        do_vm_hang = -1
        do_vm_stride = 4096
        do_vm_bytes = 1782579200
        do_vm = 108174317627375616
        do_io = <optimized out>
        do_cpu = <optimized out>
        do_timeout = 108176117243859333
        starttime = 1589338322
        i = <optimized out>
        forks = <optimized out>
        pid = 6140
        stoptime = <optimized out>
        runtime = <optimized out>

Apparently the asserts did not stop the code and it ran until a failure occurred (via dtv=0x0). stress uses a mutex stored on a page that gets the "turns into zeros" problem, preventing the mprotect(ADDR,1,1) type of test because stress will write on the page. (I've not tried to find a minimal form of stress run.)
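In case it helps anyone else reproduce this: I ran the client-side commands by hand, but they could be looped with something like the following sh sketch. Treat it as untested convenience only; SERVER is a placeholder for the G4's address and the 20 iterations figure is arbitrary, since timing relative to stress's activity seems to matter.

# Client-side sketch only: run while stress -m 2 --vm-bytes 1700M is
# already running on the G4. SERVER and the iteration count are
# placeholders, not values I actually used.
i=0
while [ $i -lt 20 ]
do
    mount -onoatime,soft SERVER:/ /mnt && umount /mnt
    rpcinfo -s SERVER
    i=$((i+1))
done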
The same sort of globals are again zeroed, such as:

(gdb) print/x __je_sz_size2index_tab
$1 = {0x0 <repeats 513 times>}

Another attempt lost rpcbind instead of mountd:

# ls -ldT /*.core
-rw-------  1 root  wheel  3899392 May 12 19:52:26 2020 /mountd.core
-rw-------  1 root  wheel  3170304 May 12 20:03:00 2020 /rpcbind.core

I again find that when I use gdb 3 times to:

attach ???
x/x __je_sz_size2index_tab
print (int)mprotect(ADDRESS,1,1)
quit

for each of rpcbind, mountd, and the nfsd master, those processes no longer fail during the mount/umount/rpcinfo sequence (or are far less likely to). (A batch-mode sketch of these steps appears near the end of this note.)

But it turns out that later, when I "service nfsd stop", nfsd does get the zeroed-memory based assert and core dumps. (I'd done a bunch of the mount/umount/rpcinfo sequences before the stop.) That the failure is during the SIGUSR1 based shutdown leads me to wonder if killing off some child process(es) is involved in the problem.

There was *no* evidence of a signal for an attempt to write the page from the user process. It appears that the kernel is doing something that changes what the process sees --instead of the user-space programs stomping on their own memory content.

I've no clue how to track down the kernel activity that changes what the process sees on some page(s) of memory. (Prior testing with a debug kernel did not report problems, despite getting an example failure. So that seems insufficient.)

At least a procedure is now known that does not involve waiting hours or days. The procedure (adjusted for how much RAM is present and the number of cpus/cores?) could be appropriate to run in other contexts than the 32-bit powerpc G4. Part of the context likely should be not using MALLOC_PRODUCTION --so problems would be detected sooner via the asserts in jemalloc.
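For reference, the 3-process gdb procedure above could be driven in batch mode with something like the following sketch. I actually did the steps interactively, so this form is untested; PID stands for the pid of rpcbind, mountd, or the nfsd master, and ADDRESS stands for the address involved (in my case taken from looking at __je_sz_size2index_tab).

# Sketch only: PID and ADDRESS are placeholders. The mprotect(...,1,1)
# call leaves the page read-only (PROT_READ), so a user-space write to
# it would fault visibly instead of silently changing the contents.
gdb -batch -p PID \
    -ex 'x/x __je_sz_size2index_tab' \
    -ex 'print (int)mprotect(ADDRESS,1,1)' \
    -ex 'detach'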
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)