Re: FYI: example "panic: ARM64TODO: reclaim_pv_chunk" on a Pine64+ 2GB with head -r317015 [mkimg okay; "cp -p" generates the bad context]

From: Mark Millard <markmi_at_dsl-only.net>
Date: Sun, 30 Apr 2017 18:31:03 -0700
On 2017-Apr-27, at 10:26 PM, Mark Millard <markmi at dsl-only.net> wrote:

> [As the text does not really follow from the earlier text
> I'd sent directly I'm top posting a hypothesis about where
> so much active memory came from to be so low in available
> memory to get an reclaim_pv_chunk attempt.]
> 
> My hypothesis is that the "mdconfig -d"s from vm_copy_base
> (from /usr/src/release/tools/vmimage.subr ) did not actually
> release the memory resources involved (from vnode backed
> mdconfig use):

I watched with "vmstat 1" and "mdconfig -d" did release memory
(including RAM) each time.

> . . .
> From the prior top report for the failure,
> partially repeated here:
> 
> . . .
> Mem: 1618M Active, 17M Inact, 315M Wired, 204M Buf, 15M Free
> Swap: 6144M Total, 34M Used, 6110M Free, 348K Out
> 
>  PID USERNAME    THR PRI NICE   SIZE    RES   SWAP STATE   C   TIME     CPU COMMAND
> 48988 root          4  31    0   651M 27048K     0K RUN     0   0:03  87.60% xz -T 0 /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-
> . . .
> 
> The combination 1618M Mem:Active but 34M Swap:Used
> and 651M xz:SIZE but 27M xz:RES and 0K xz:SWAP just
> seems very odd, like it should not happen. The 17M
> Mem:Inact is odd for the context as well. (Mem:Wired,
> Mem:Buf, and Mem:Free look normal.)
> 
> An alternate hypothesis would be the memory "leak"
> is from mkimg not having it memory-use cleaned up.
> This happens after vm_copy_base but before the cp/xz
> sequence and is what produces vm.raw.

mkimg also release its memory (including RAM)
each time.

But the later "cp -p" of the large vm.raw that
I was producing ended up without leaving the
free memory that is expected after it finished.
I worked around the issue with (just a personal
workaround that helps in other respects as well):

Index: Makefile.vm
===================================================================
--- Makefile.vm	(revision 317015)
+++ Makefile.vm	(working copy)
_at__at_ -119,15 +119,20 _at__at_
 vm-install:
 .if defined(WITH_VMIMAGES) && !empty(WITH_VMIMAGES)
 	mkdir -p ${DESTDIR}/vmimages
-. for FORMAT in ${VMFORMATS}
-	cp -p ${VMBASE}.${FORMAT} \
-		${DESTDIR}/vmimages/${OSRELEASE}.${FORMAT}
-. endfor
 . if defined(WITH_COMPRESSED_VMIMAGES) && !empty(WITH_COMPRESSED_VMIMAGES)
 .  for FORMAT in ${VMFORMATS}
-	# Don't keep the originals.  There is a copy in ${.OBJDIR} if needed.
-	${XZ_CMD} ${DESTDIR}/vmimages/${OSRELEASE}.${FORMAT}
+	# Tradeoff "cp -p" property for less memory, I/O, and time use.
+	# Also: -3 got > 30 MiByte/sec effective (source file) for 32 GiByte, mostly-zero image
+	# on a Pine64+ 2GB and used about 600 MiBytes of "active virtual memory". The about 16
+	# minutes was vastly faster than the "cp -p" --and the sha512 and sha256 are vastly
+	# faster on the compressed file as well.
+	${XZ_CMD} -T 0 -3 --stdout --memlimit-compress=50% -v ${VMBASE}.${FORMAT} > ${DESTDIR}/vmimages/${OSRELEASE}.${FORMAT}.xz
 .  endfor
+. else
+.  for FORMAT in ${VMFORMATS}
+	cp -p ${VMBASE}.${FORMAT} \
+		${DESTDIR}/vmimages/${OSRELEASE}.${FORMAT}
+.  endfor
 . endif
 	cd ${DESTDIR}/vmimages && sha512 ${OSRELEASE}* > \
 		${DESTDIR}/vmimages/CHECKSUM.SHA512

and using WITH_COMPRESSED_VMIMAGES to avoid a "cp -p"
based copy of the large file.


Prior reports from capturing text:

On 2017-Apr-27, at 7:31 PM, Mark Millard <markmi at dsl-only.net> wrote:

> [Another example panic. Again no dump. But I have what
> a top -PCwaopid froze at this time.]
> 
> On 2017-Apr-27, at 4:22 PM, Mark Millard <markmi at dsl-only.net> wrote:
> 
>> Unfortunately for this FYI the attempt to produce a dump
>> failed and so all the information that I have is what I
>> first captured from the console output: a backtrace.
>> 
>> The context was head -r317015 on a Pine64+ 2GB. At the
>> time I was experimenting with trying to build a vm.raw
>> from my own build of FreeBSD. The (root) file system
>> is on a USB SSD off of a powered USB hub.
>> 
>> panic: ARM64TODO: reclaim_pv_chunk
>> cpuid = 1
>> time = 1493332968
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self_wrapper+0x28
>>       pc = 0xffff000000605cc0  lr = 0xffff0000000869cc
>>       sp = 0xffff000083ba4f00  fp = 0xffff000083ba5110
>> 
>> db_trace_self_wrapper() at vpanic+0x164
>>       pc = 0xffff0000000869cc  lr = 0xffff00000031d464
>>       sp = 0xffff000083ba5120  fp = 0xffff000083ba5190
>> 
>> vpanic() at panic+0x4c
>>       pc = 0xffff00000031d464  lr = 0xffff00000031d2fc
>>       sp = 0xffff000083ba51a0  fp = 0xffff000083ba5220
>> 
>> panic() at reclaim_pv_chunk+0x10
>>       pc = 0xffff00000031d2fc  lr = 0xffff00000061a234
>>       sp = 0xffff000083ba5230  fp = 0xffff000083ba5230
>> 
>> reclaim_pv_chunk() at get_pv_entry+0x240
>>       pc = 0xffff00000061a234  lr = 0xffff000000616184
>>       sp = 0xffff000083ba5240  fp = 0xffff000083ba5260
>> 
>> get_pv_entry() at pmap_enter+0x694
>>       pc = 0xffff000000616184  lr = 0xffff0000006156a0
>>       sp = 0xffff000083ba5270  fp = 0xffff000083ba5300
>> 
>> pmap_enter() at vm_fault_hold+0x28c
>>       pc = 0xffff0000006156a0  lr = 0xffff0000005b9740
>>       sp = 0xffff000083ba5310  fp = 0xffff000083ba5460
>> 
>> vm_fault_hold() at vm_fault+0x70
>>       pc = 0xffff0000005b9740  lr = 0xffff0000005b9464
>>       sp = 0xffff000083ba5470  fp = 0xffff000083ba54a0
>> 
>> vm_fault() at data_abort+0xe0
>>       pc = 0xffff0000005b9464  lr = 0xffff00000061ad94
>>       sp = 0xffff000083ba54b0  fp = 0xffff000083ba5560
>> 
>> data_abort() at handle_el1h_sync+0x70
>>       pc = 0xffff00000061ad94  lr = 0xffff000000607870
>>       sp = 0xffff000083ba5570  fp = 0xffff000083ba5680
>> 
>> handle_el1h_sync() at kern_select+0x9fc
>>       pc = 0xffff000000607870  lr = 0xffff00000037db3c
>>       sp = 0xffff000083ba5690  fp = 0xffff000083ba58f0
>> 
>> kern_select() at sys_select+0x5c
>>       pc = 0xffff00000037db3c  lr = 0xffff00000037dc58
>>       sp = 0xffff000083ba5900  fp = 0xffff000083ba5930
>> 
>> sys_select() at do_el0_sync+0xa48
>>       pc = 0xffff00000037dc58  lr = 0xffff00000061b91c
>>       sp = 0xffff000083ba5940  fp = 0xffff000083ba5a70
>> 
>> do_el0_sync() at handle_el0_sync+0x6c
>>       pc = 0xffff00000061b91c  lr = 0xffff0000006079e8
>>       sp = 0xffff000083ba5a80  fp = 0xffff000083ba5b90
>> 
>> handle_el0_sync() at 0x4948c
>>       pc = 0xffff0000006079e8  lr = 0x000000000004948c
>>       sp = 0xffff000083ba5ba0  fp = 0x0000ffffffffd960
> 
> 
> This time I got to record from top:
> (swap is on a swap partition)
> (pid 49888's SIZE vs. RES and SWAP might be interesting)
> (as might the Active figure)
> 
> last pid: 48988;  load averages:  0.64,  0.44,  0.38                                                                                                                            up 0+04:21:01  19:19:50
> 32 processes:  2 running, 30 sleeping
> CPU 0: 13.2% user,  0.0% nice, 23.2% system,  0.3% interrupt, 63.3% idle
> CPU 1:  4.6% user,  0.0% nice, 23.9% system,  0.0% interrupt, 71.5% idle
> CPU 2:  2.1% user,  0.0% nice, 23.2% system,  0.0% interrupt, 74.8% idle
> CPU 3:  3.3% user,  0.0% nice, 23.8% system,  0.0% interrupt, 72.8% idle
> Mem: 1618M Active, 17M Inact, 315M Wired, 204M Buf, 15M Free
> Swap: 6144M Total, 34M Used, 6110M Free, 348K Out
> 
> PID USERNAME    THR PRI NICE   SIZE    RES   SWAP STATE   C   TIME     CPU COMMAND
> 48988 root          4  31    0   651M 27048K     0K RUN     0   0:03  87.60% xz -T 0 /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw
> 11983 root          1  22    0  5068K     0K     0K wait    3   0:00   0.00% make vm-image vm-install DESTDIR=/usr/obj/DESTDIRs/vmimage-aarch64 (<make>)
> 11981 root          1  42    0  7320K     0K  1516K wait    1   0:00   0.00% sh /root/sys_build_scripts.aarch64-host/make_noscript_aarch64_nodebug_clang_bootstrap-aarch64-host.sh vm-image vm-install 
> 11980 root          1  20    0  6656K  1548K     0K select  0   0:02   0.00% [script]
> 11977 root          1  30    0  7320K     0K  1516K wait    3   0:00   0.00% /bin/sh /root/sys_build_scripts.aarch64-host/make_aarch64_nodebug_clang_bootstrap-aarch64-host.sh vm-image vm-install DEST
> 2694 root          1  20    0  8804K  2072K     0K CPU2    2   0:07   0.17% top -PCwaopid
> 827 root          1  20    0  7320K     0K   360K wait    0   0:00   0.00% su (<sh>)
> 826 markmi        1  22    0 10372K     0K  1532K wait    3   0:00   0.00% su (<su>)
> 820 markmi        1  24    0  7320K     0K  1516K wait    1   0:00   0.00% -sh (<sh>)
> 819 markmi        1  20    0 18416K  1152K     0K select  1   0:21   0.00% sshd: markmi_at_pts/1 (sshd)
> 816 root          1  20    0 18416K  3276K     0K select  0   0:00   0.00% sshd: markmi [priv] (sshd)
> 765 root          1  20    0  7320K     0K   224K wait    2   0:00   0.00% su (<sh>)
> 764 markmi        1  23    0 10372K     0K  1532K wait    0   0:00   0.00% su (<su>)
> 758 markmi        1  31    0  7320K     0K  1516K wait    1   0:00   0.00% -sh (<sh>)
> 757 markmi        1  20    0 18416K   228K   904K select  3   0:01   0.01% sshd: markmi_at_pts/0 (sshd)
> 754 root          1  25    0 18416K  3276K     0K select  1   0:00   0.00% sshd: markmi [priv] (sshd)
> 746 root          1  27    0  7320K  1532K     0K ttyin   0   0:00   0.00% -sh (sh)
> 745 root          1  20    0 10372K     0K  1532K wait    1   0:00   0.00% login [pam] (<login>)
> 700 root          1  20    0  6948K     0K   168K nanslp  1   0:00   0.00% /usr/sbin/cron -s (<cron>)
> 696 smmsp         1  20    0 10460K     0K   184K pause   0   0:00   0.00% sendmail: Queue runner_at_00:30:00 for /var/spool/clientmqueue (<sendmail>)
> 693 root          1  20    0 10460K  1392K     0K select  1   0:00   0.03% sendmail: accepting connections (sendmail)
> 690 root          1  20    0 15800K   968K     0K select  2   0:00   0.00% /usr/sbin/sshd
> 661 root          1  20    0  6656K   344K     0K select  2   0:01   0.00% /usr/sbin/powerd
> 658 root          2  20    0 12788K 12672K     0K select  0   0:02   0.01% /usr/sbin/ntpd -g -c /etc/ntp.conf -p /var/run/ntpd.pid -f /var/db/ntpd.drift
> 620 root         32  52    0  6384K  1100K     0K rpcsvc  1   0:00   0.00% nfsd: server (nfsd)
> 619 root          1  52    0  6384K   704K     0K select  1   0:00   0.00% nfsd: master (nfsd)
> 617 root          1  20    0  6684K   688K     0K select  1   0:00   0.00% /usr/sbin/mountd -r
> 478 root          1  20    0  6676K   596K     0K select  3   0:00   0.00% /usr/sbin/rpcbind
> 469 root          1  20    0  6680K   572K     0K select  2   0:00   0.00% /usr/sbin/syslogd -s
> 396 root          1  20    0  9580K    32K     0K select  0   0:00   0.00% /sbin/devd
> 308 _dhcp         1  20    0  6800K   532K     0K select  2   0:00   0.00% dhclient: awg0 (dhclient)
> 307 root          1  52    0  6800K   424K     0K select  2   0:00   0.00% dhclient: awg0 [priv] (dhclient)
> 
> And here is the backtrace:
> 
> timeout stopping cpus
> panic: ARM64TODO: reclaim_pv_chunk
> cpuid = 0
> time = 1493345992
> KDB: stack backtrace:
> db_trace_self() at db_trace_self_wrapper+0x28
>        pc = 0xffff000000605cc0  lr = 0xffff0000000869cc
>        sp = 0xffff000083d301d0  fp = 0xffff000083d303e0
> 
> db_trace_self_wrapper() at vpanic+0x164
>        pc = 0xffff0000000869cc  lr = 0xffff00000031d464
>        sp = 0xffff000083d303f0  fp = 0xffff000083d30460
> 
> vpanic() at panic+0x4c
>        pc = 0xffff00000031d464  lr = 0xffff00000031d2fc
>        sp = 0xffff000083d30470  fp = 0xffff000083d304f0
> 
> panic() at reclaim_pv_chunk+0x10
>        pc = 0xffff00000031d2fc  lr = 0xffff00000061a234
>        sp = 0xffff000083d30500  fp = 0xffff000083d30500
> 
> reclaim_pv_chunk() at get_pv_entry+0x240
>        pc = 0xffff00000061a234  lr = 0xffff000000616184
>        sp = 0xffff000083d30510  fp = 0xffff000083d30530
> 
> get_pv_entry() at pmap_enter+0x694
>        pc = 0xffff000000616184  lr = 0xffff0000006156a0
>        sp = 0xffff000083d30540  fp = 0xffff000083d305d0
> 
> pmap_enter() at vm_fault_hold+0x28c
>        pc = 0xffff0000006156a0  lr = 0xffff0000005b9740
>        sp = 0xffff000083d305e0  fp = 0xffff000083d30730
> 
> vm_fault_hold() at vm_fault+0x70
>        pc = 0xffff0000005b9740  lr = 0xffff0000005b9464
>        sp = 0xffff000083d30740  fp = 0xffff000083d30770
> 
> vm_fault() at data_abort+0xe0
>        pc = 0xffff0000005b9464  lr = 0xffff00000061ad94
>        sp = 0xffff000083d30780  fp = 0xffff000083d30830
> 
> data_abort() at handle_el0_sync+0x6c
>        pc = 0xffff00000061ad94  lr = 0xffff0000006079e8
>        sp = 0xffff000083d30840  fp = 0xffff000083d30950
> 
> handle_el0_sync() at 0x400a3de4
>        pc = 0xffff0000006079e8  lr = 0x00000000400a3de4
>        sp = 0xffff000083d30960  fp = 0x0000ffffbfdfcd30
> 
> KDB: enter: panic
> [ thread pid 48988 tid 100230 ]
> Stopped at      kdb_enter+0x44: undefined       d4200000
> 

===
Mark Millard
markmi at dsl-only.net

_______________________________________________
freebsd-arm_at_freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-arm
To unsubscribe, send any mail to "freebsd-arm-unsubscribe_at_freebsd.org"
Received on Sun Apr 30 2017 - 23:31:07 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:11 UTC