On 2016-Nov-7, at 8:54 PM, Mark Millard <markmi at dsl-only.net> wrote: > This is just in case any of the information happens to prove useful/interesting. > I'm not expecting any assistance. > > Note: After the crash ddb was not responding to input so this is all that I have. > > Note: This was an experiment with head -r308247 but was built like stable for > performance issues (GENERIC included and then overridden, not GENERIC-UP > based). > > The below was found via dmesg and/or /var/log/messages content while the Pine64 > was busy building lang/gcc6 and its prerequisites (as a stability test). > > It got lots of spurious interrupt notices per second, such as: > >> gic0: Spurious interrupt detected: last irq: 27 on CPU0 >> gic0: Spurious interrupt detected: last irq: 27 on CPU2 >> gic0: Spurious interrupt detected: last irq: 106 on CPU3 >> gic0: Spurious interrupt detected: last irq: 27 on CPU1 >> gic0: Spurious interrupt detected: last irq: 27 on CPU1 >> gic0: gic0: Spurious interrupt detected: last irq: 27 on CPU1 >> Spurious interrupt detected: last irq: 27 on CPU3 >> gic0: gic0: Spurious interrupt detected: last irq: 27 on CPU0 >> Spurious interrupt detected: last irq: 27 on CPU1 >> gic0: Spurious interrupt detected: last irq: 27 on CPU1 >> gic0: Spurious interrupt detected: last irq: 27 on CPU2 >> gic0: Spurious interrupt detected: last irq: 27 on CPU3 > > 27 happened the most by far. 106 was fairly rare. I'd not noticed any other > figures. From what I saw all were "gic0". > > sh had a few signal 11's and one signal 4 as of when I had last checked: > >> pid 13900 (sh), uid 0: exited on signal 11 (core dumped) >> pid 19325 (sh), uid 0: exited on signal 11 (core dumped) >> pid 49697 (sh), uid 0: exited on signal 11 (core dumped) >> pid 68390 (sh), uid 0: exited on signal 4 (core dumped) >> pid 81149 (sh), uid 0: exited on signal 11 (core dumped) > > I did not notice any other core dumps. > > And the following happened once that I noticed: > >> (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 16 a3 a4 80 00 00 40 00 >> (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error >> (da0:umass-sim0:0:0:0): Retrying command > > The root filesystem was on a USB SSD. > > The above was all from a ssh session history. The below is from the serial > console. . . > > Later it got the crash: > >> panic: ARM64TODO: reclaim_pv_chunk >> cpuid = 2 >> KDB: stack backtrace: >> db_trace_self() at db_trace_self_wrapper+0x28 >> pc = 0xffff00000068b430 lr = 0xffff0000000631dc >> sp = 0xffff000083758080 fp = 0xffff000083758290 >> >> db_trace_self_wrapper() at vpanic+0x170 >> pc = 0xffff0000000631dc lr = 0xffff000000335f10 >> sp = 0xffff0000837582a0 fp = 0xffff000083758320 >> >> vpanic() at panic+0x4c >> pc = 0xffff000000335f10 lr = 0xffff000000335d9c >> sp = 0xffff000083758330 fp = 0xffff0000837583b0 >> >> panic() at reclaim_pv_chunk+0x10 >> pc = 0xffff000000335d9c lr = 0xffff0000006a13d4 >> sp = 0xffff0000837583c0 fp = 0xffff0000837583c0 >> >> reclaim_pv_chunk() at get_pv_entry+0x2bc >> pc = 0xffff0000006a13d4 lr = 0xffff00000069c270 >> sp = 0xffff0000837583d0 fp = 0xffff000083758400 >> >> get_pv_entry() at pmap_enter+0x68c >> pc = 0xffff00000069c270 lr = 0xffff00000069b41c >> sp = 0xffff000083758410 fp = 0xffff0000837584b0 >> >> pmap_enter() at vm_fault_hold+0x2f0 >> pc = 0xffff00000069b41c lr = 0xffff000000641eb8 >> sp = 0xffff0000837584c0 fp = 0xffff000083758600 >> >> vm_fault_hold() at vm_fault_quick_hold_pages+0x120 >> pc = 0xffff000000641eb8 lr = 0xffff000000645004 >> sp = 0xffff000083758610 fp = 0xffff000083758670 >> >> vm_fault_quick_hold_pages() at vn_io_fault1+0x250 >> pc = 0xffff000000645004 lr = 0xffff00000042b788 >> sp = 0xffff000083758680 fp = 0xffff0000837587c0 >> >> vn_io_fault1() at vn_io_fault+0x170 >> pc = 0xffff00000042b788 lr = 0xffff0000004297a4 >> sp = 0xffff0000837587d0 fp = 0xffff000083758840 >> >> vn_io_fault() at dofilewrite+0xbc >> pc = 0xffff0000004297a4 lr = 0xffff0000003a35e4 >> sp = 0xffff000083758850 fp = 0xffff000083758890 >> >> dofilewrite() at kern_writev+0x6c >> pc = 0xffff0000003a35e4 lr = 0xffff0000003a32d4 >> sp = 0xffff0000837588a0 fp = 0xffff0000837588e0 >> >> kern_writev() at sys_write+0x7c >> pc = 0xffff0000003a32d4 lr = 0xffff0000003a3258 >> sp = 0xffff0000837588f0 fp = 0xffff000083758930 >> >> sys_write() at do_el0_sync+0x6fc >> pc = 0xffff0000003a3258 lr = 0xffff0000006a2778 >> sp = 0xffff000083758940 fp = 0xffff000083758a70 >> >> do_el0_sync() at handle_el0_sync+0x64 >> pc = 0xffff0000006a2778 lr = 0xffff00000068d1d0 >> sp = 0xffff000083758a80 fp = 0xffff000083758b90 >> >> handle_el0_sync() at 0x696ff0 >> pc = 0xffff00000068d1d0 lr = 0x0000000000696ff0 >> sp = 0xffff000083758ba0 fp = 0x0000ffffffffc610 >> >> KDB: enter: panic >> [ thread pid 850 tid 100149 ] >> Stopped at kdb_enter+0x40: undefined d4200000 > > > Side notes: > > To do the lang/gcc6 test I adjusted lang/gcc6/Makefile to (hopefully) allow > an aarch64 build attempt: > >> # svnlite diff /usr/ports/lang/gcc6 >> Index: /usr/ports/lang/gcc6/Makefile >> =================================================================== >> --- /usr/ports/lang/gcc6/Makefile (revision 424540) >> +++ /usr/ports/lang/gcc6/Makefile (working copy) >> _at__at_ -35,7 +35,7 _at__at_ >> DISTVERSION= ${PORTVERSION:C/([0-9]+).*\.s([0-9]+)/\1-\2/} >> GCC_VERSION= ${PORTVERSION:C/(.+)\.s[0-9]{8}/\1/} >> SUFFIX= ${PORTVERSION:C/([0-9]+).*/\1/} >> -ONLY_FOR_ARCHS= amd64 i386 powerpc powerpc64 sparc64 arm armv6 armv6hf >> +ONLY_FOR_ARCHS= amd64 i386 powerpc powerpc64 sparc64 arm armv6 armv6hf aarch64 >> USES= compiler cpe gmake iconv libtool makeinfo perl5 tar:bzip2 >> USE_BINUTILS= yes >> USE_PERL5= build >> _at__at_ -73,7 +73,7 _at__at_ >> USE_GCC= 4.9+ >> .endif >> >> -.if ${ARCH} == "armv6" || ${ARCH} == "armv6hf" >> +.if ${ARCH} == "armv6" || ${ARCH} == "armv6hf" || ${ARCH} == "aarch64" >> . if ${COMPILER_TYPE} == clang >> . if empty(PORT_OPTIONS:MBOOTSTRAP) >> MAKE_ARGS+=CXXFLAGS=-fbracket-depth=512 > > It did not make it to lang/gcc6 itself but did build several things. > > (I did not care if lang/gcc6 really worked or not: Environment stability > test.) > > === > Mark Millard > markmi at dsl-only.net There was fsck activity after cutting and applying power but luckily the SSD content (largely) survived. The compressed message log shows the following as an example of when the "gic0" Spurious interrupts start: > Nov 7 18:54:28 pine64 kernel: ugen3.2: <OWC Envoy Pro mini> at usbus3 > Nov 7 18:54:28 pine64 kernel: umass0 on uhub3 > Nov 7 18:54:28 pine64 kernel: umass0: <OWC Envoy Pro mini, class 0/0, rev 2.10/1.00, addr 2> on usbus3 > Nov 7 18:54:28 pine64 kernel: umass0: SCSI over Bulk-Only; quirks = 0x0100 > Nov 7 18:54:28 pine64 kernel: umass0:0:0: Attached to scbus0 > Nov 7 18:54:28 pine64 kernel: mountroot: waiting for device /dev/ufs/PINE642Grootfs... > Nov 7 18:54:28 pine64 kernel: da0 at umass-sim0 bus 0 scbus0 target 0 lun 0 > Nov 7 18:54:28 pine64 kernel: da0: <OWC Envoy Pro mini 0> Fixed Direct Access SPC-4 SCSI device > Nov 7 18:54:28 pine64 kernel: da0: Serial Number <REMOVED> > Nov 7 18:54:28 pine64 kernel: da0: 40.000MB/s transfers > Nov 7 18:54:28 pine64 kernel: da0: 228936MB (468862128 512 byte sectors) > Nov 7 18:54:28 pine64 kernel: da0: quirks=0x2<NO_6_BYTE> > Nov 7 18:54:28 pine64 kernel: awg0: link state changed to DOWN > Nov 7 18:54:28 pine64 kernel: gic0: Spurious interrupt detected: last irq: 27 on CPU2 > Nov 7 18:54:28 pine64 kernel: awg0: link state changed to UP > Nov 7 18:54:28 pine64 kernel: gic0: Spurious interrupt detected: last irq: 27 on CPU3 > Nov 7 18:54:31 pine64 kernel: . > Nov 7 18:54:31 pine64 kernel: gic0: Spurious interrupt detected: last irq: 27 on CPU2 > Nov 7 18:54:31 pine64 ntpd[602]: ntpd 4.2.8p8-a (1): Starting > Nov 7 18:54:31 pine64 ntpd[603]: leapsecond file ('/var/db/ntpd.leap-seconds.list'): good hash signature > Nov 7 18:54:31 pine64 ntpd[603]: leapsecond file ('/var/db/ntpd.leap-seconds.list'): loaded, expire=2017-06-01T00:00:00Z last=2017-01-01T00:00:00Z ofs=37 > Nov 7 19:36:17 pine64 kernel: gic0: gic0: Spurious interrupt detected: last irq: 27 on CPU1 > Nov 7 19:36:17 pine64 kernel: Spurious interrupt detected: last irq: 106 on CPU3 > Nov 7 19:36:17 pine64 kernel: gic0: Spurious interrupt detected: last irq: 27 on CPU1 > Nov 7 19:36:17 pine64 kernel: gic0: Spurious interrupt detected: last irq: 27 on CPU2 > Nov 7 19:36:17 pine64 kernel: gic0: Spurious interrupt detected: last irq: 27 on CPU0 > Nov 7 19:36:17 pine64 last message repeated 2 times > Nov 7 19:36:17 pine64 kernel: gic0: gic0: Spurious interrupt detected: last irq: 27 on CPU0 > Nov 7 19:36:17 pine64 kernel: Spurious interrupt detected: last irq: 27 on CPU3 Always shortly after awg0's link state changed to down after the da0 information. In looking at the logs I've seen a 32 and some 114's and 92's as well as the 27's and 106's. There were a few brief periods over which non-27's were fairly frequent. There are a few of the CAM status message blocks, about 7 over everything recorded, including a few prior boots of the Pine64. An interesting point is that somewhat before and while the lang/gcc6 related build activity was active the "gic0" notices stopped: > Nov 8 02:44:40 pine64 kernel: gic0: Spurious interrupt detected: last irq: 27 on CPU3 > Nov 8 03:01:15 pine64 kernel: pid 13900 (sh), uid 0: exited on signal 11 (core dumped) > Nov 8 03:01:39 pine64 kernel: pid 19325 (sh), uid 0: exited on signal 11 (core dumped) > Nov 8 03:02:02 pine64 pkg-static: indexinfo-0.2.5 installed > Nov 8 03:02:04 pine64 pkg-static: gettext-runtime-0.19.8.1 installed > Nov 8 03:04:42 pine64 kernel: pid 49697 (sh), uid 0: exited on signal 11 (core dumped) > Nov 8 03:05:26 pine64 kernel: pid 68390 (sh), uid 0: exited on signal 4 (core dumped) > Nov 8 03:06:10 pine64 kernel: pid 81149 (sh), uid 0: exited on signal 11 (core dumped) > Nov 8 03:08:36 pine64 pkg-static: gettext-tools-0.19.8.1 installed > Nov 8 03:09:07 pine64 pkg-static: gmake-4.2.1_1 installed > Nov 8 03:27:01 pine64 pkg-static: perl5-5.24.1.r4 installed > Nov 8 03:27:33 pine64 pkg-static: p5-Locale-gettext-1.06 installed > Nov 8 03:27:42 pine64 pkg-static: help2man-1.43.3_1 installed > Nov 8 03:30:22 pine64 pkg-static: texinfo-6.1.20160425,1 installed > Nov 8 03:31:59 pine64 pkg-static: m4-1.4.17_1,1 installed > Nov 8 03:34:04 pine64 pkg-static: bison-2.7.1,1 installed > Nov 8 03:37:44 pine64 pkg-static: gmp-5.1.3_3 installed > Nov 8 03:39:14 pine64 pkg-static: mpfr-3.1.5 installed > Nov 8 03:44:53 pine64 kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 16 bd 19 c0 00 00 80 00 > Nov 8 03:44:53 pine64 kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error > Nov 8 03:44:53 pine64 kernel: (da0:umass-sim0:0:0:0): Retrying command > Nov 8 03:44:55 pine64 kernel: gic0: Spurious interrupt detected: last irq: 106 on CPU3 > Nov 8 03:44:55 pine64 kernel: gic0: Spurious interrupt detected: last irq: 106 on CPU3 Somewhat after the mpfr installation above the "gic0" notices return. devel/binutils started to build but did not complete. === Mark Millard markmi at dsl-only.netReceived on Tue Nov 08 2016 - 05:42:11 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:08 UTC