[Even the artifacts debug-style kernel gets the problem. More evidence of context extracted and supplied.] On 2020-Jan-8, at 18:09, Mark Millard <marklmi at yahoo.com> wrote: > [I got kernel backtrace information, included.] > > On 2020-Jan-8, at 15:12, Mark Millard <marklmi at yahoo.com> wrote: > >> I've attempted a head -356426 based kyua run on an >> old 2-socket PowerMac G4. The buildworld and >> buildkernel were non-debug. The context has personal >> patches, mostly for dealing with PowerMac issues. >> >> It has had over 180 CPU minutes running for: >> >> sys/vm/mlock_test:mlock__copy_on_write_vnode -> >> >> Normal seems to be way under 0.1 sec on the >> other platforms I've made runs on recently. >> >> Hopefully kyua will time out and continue the >> testing at some point. >> >> The 2 socket (2 cores each) G5 powerpc64 context >> did not have this problem. Nor did the armv7 or >> aarch64 examples (CortexA7, CortexA53, CortexA57, >> and CortexA72). > > > I finally gave up on it after 240 CPU minutes but > could not kill/stop the stuck process. So I > sync'd the file system and tried a "shutdown -r > now" and forced the power off after it got stuck > (no reboot happened). > > After power-up I tried: > > # kyua test -k /usr/tests/Kyuafile sys/vm/mlock_test > sys/vm/mlock_test:mlock__copy_on_write_anon -> passed [0.017s] > sys/vm/mlock_test:mlock__copy_on_write_vnode -> > > and it got stuck again. > > I'll note that ps -auxd shows the likes of: > > root 1120 0.0 0.4 11512 8772 0 I+ 16:38 0:00.62 | | `-- kyua test -k /usr/tests/Kyuafile sys/vm/mlock_test > root 1124 100.0 0.1 4640 2332 - Rs 16:38 2:57.43 | | `-- /usr/tests/sys/vm/mlock_test -vunprivileged-user=tests -r/tmp/kyua.B2pXx8/3/result.atf mlock__copy_on_write_vnode > root 1125 0.0 0.0 4640 620 - TXL 16:38 0:00.00 | | `-- /usr/tests/sys/vm/mlock_test -vunprivileged-user=tests -r/tmp/kyua.B2pXx8/3/result.atf mlock__copy_on_write_vnode > > > I got a couple of backtraces from the kernel > via the ddb> prompt : > > pid 1125 was in thread_suspend_switch called via ptracestop. > > I've a couple of examples of pid 1124 (the CPU time > taker): (manually typed from screen images) > > 0xdc9e0520: at mi_switch+0x17c > 0xdc9e0540: at critical_exit_preempt+0x7c > 0xdc9e0560: at powerpc_interrupt+0x1c4 > 0xdc9e0590: at kernel EXI trap by __syncicache+0x5c: > srr1= 0x209032 > r1= 0xdc9e0650 > cr= 0x8822fc22 > xer= 0 > ctr= 0 > frame=0xdc9e0598 > 0xdc9e0650: at 0x5ed67ec > 0xdc9e0660: at moea_sync_icache+0x118 > Note: From here on down is common with the > other example backtrace: > 0xdc9e0690: at pmap_sync_icache+0x98 > 0xdc9e06b0: at vm_sync_icache+0x2c > 0xdc9e06c0: at proc_rwmem+0x13c > 0xdc9e0710: at kern_ptrace+0x76c > 0xdc9e0830: at sys_ptrace+0x12c > 0xdc9e0960: at trap+0xae8 > 0xdc9e0a10: at powerpc_interrupt+0x1ec > 0xdc9e0a40: at use SC trap by 0x4191ea48: > srr1= 0x209032 > r1= 0xffffc6d0 > cr= 0x28000200 > xer= 0 > ctr= 0x4191ea40 > frame=0xdc9e0a48 > > The non-common part of the other backtrace > is: > > 0xdc9e04a0: at intr_event_handle+0xd4 > 0xdc9e04e0: at powerpc_dispatch_intr+0xe0 > 0xdc9e0520: at openpic_dispatch+0x90 > 0xdc9e0540: at powerpc_interrupt+0x128 > 0xdc9e0570: at kernel EXI trap by moea_pvo_find_va: > srr1= 0xf032 > r1= 0xdc9e0630 > cr= 0x4822fc22 > xer= 0 > ctr= 0 > frame=0xdc9e0578 > 0xdc9e0630: at 0x41b76ffc > 0xdc9e0660: at moea_sync_icache+0x100 > > Showing a lock chain showed just one line: > > thread 100151 (pid 1124, mlock_test) running on CPU 0 > > The pcpu output for cpuid 0 metioned: critnest 2 "mlock_test" > when I tried it. > > (After that I did something that locked up the machine, > probably my fault.) > > > It does not look like I can complete a normal kyua > run for a 2-socket 32-bit powerpc. May be someone > else can for some multi-socket 32-bit powerpc to > see if this repeats. > > Single-socket/single-core might prove interesting > as well. Maybe I can try such. It fails on the single-processor/single-core G4 PowerMac that I tested as well. I then installed the kernel*.txz from: https://artifact.ci.freebsd.org/snapshot/head/r356426/powerpc/powerpc/ and rebooted with it via boot -v (to avoid the memory modified-after-free for debug kernels). (I'd forgotten about debug stopping and reporting such.) It still fails: # kyua test -k /usr/tests/Kyuafile sys/vm/mlock_test sys/vm/mlock_test:mlock__copy_on_write_anon -> passed [0.029s] sys/vm/mlock_test:mlock__copy_on_write_vnode -> The debug kernel does not report anything for it. I gather that the test's internal description is: /* * Use ptrace(2) to trigger a copy-on-write fault of a read-only text page. */ ATF_TC_WITHOUT_HEAD(mlock__copy_on_write_vnode); ATF_TC_BODY(mlock__copy_on_write_vnode, tc) { void *addr; int len; len = getpagesize(); addr = (void *)((uintptr_t)test_wired_copy_on_write & ~(len - 1)); test_wired_copy_on_write(addr, len); } where test_wired_copy_on_write is: static void test_wired_copy_on_write(void *addr, size_t len) { int status, val; pid_t pid; pid = fork(); if (pid == -1) atf_tc_fail("fork() failed: %s", strerror(errno)); if (pid == 0) { if (mlock(addr, len) != 0) _exit(1); if (ptrace(PT_TRACE_ME, 0, NULL, 0) != 0) _exit(2); if (raise(SIGSTOP) != 0) _exit(3); if (munlock(addr, len) != 0) _exit(4); _exit(0); } ATF_REQUIRE(waitpid(pid, &status, 0) == pid); ATF_REQUIRE_MSG(!WIFEXITED(status), "child exited with status %d", WEXITSTATUS(status)); ATF_REQUIRE(WIFSTOPPED(status)); ATF_REQUIRE(WSTOPSIG(status) == SIGSTOP); errno = 0; val = ptrace(PT_READ_D, pid, addr, 0); ATF_REQUIRE(errno == 0); ATF_REQUIRE(ptrace(PT_WRITE_D, pid, addr, val) == 0); ATF_REQUIRE(ptrace(PT_CONTINUE, pid, (caddr_t)1, 0) == 0); ATF_REQUIRE(waitpid(pid, &status, 0) == pid); ATF_REQUIRE(WIFEXITED(status)); ATF_REQUIRE_MSG(WEXITSTATUS(status) == 0, "child exited with status %d", WSTOPSIG(status)); } I give my evidence below, but the ptrace(PT_WRITE_D, . . .) looks to be the one involved in the backtraces: It looks like proc_rwmem is using the vm_sync_icache in: /* * If we are writing, then we request vm_fault() to create a private * copy of each page. Since these copies will not be writeable by the * process, we must explicity request that they be dirtied. */ writing = uio->uio_rw == UIO_WRITE; reqprot = writing ? VM_PROT_COPY | VM_PROT_READ : VM_PROT_READ; fault_flags = writing ? VM_FAULT_DIRTY : VM_FAULT_NORMAL; . . . /* Make the I-cache coherent for breakpoints. */ if (writing && error == 0) { vm_map_lock_read(map); if (vm_map_check_protection(map, pageno, pageno + PAGE_SIZE, VM_PROT_EXECUTE)) vm_sync_icache(map, uva, len); vm_map_unlock_read(map); } This appears to mean that it got there via kern_ptrace's: case PT_WRITE_I: case PT_WRITE_D: td2->td_dbgflags |= TDB_USERWR; PROC_UNLOCK(p); error = 0; if (proc_writemem(td, p, (off_t)(uintptr_t)addr, &data, sizeof(int)) != sizeof(int)) error = ENOMEM; else CTR3(KTR_PTRACE, "PT_WRITE: pid %d: %p <= %#x", p->p_pid, addr, data); PROC_LOCK(p); break; (via proc_writemem). So the test's ptrace involved in the backtraces seems to be: ATF_REQUIRE(ptrace(PT_WRITE_D, pid, addr, val) == 0); (Not that I've a clue what to do with that information at this point.) === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)Received on Thu Jan 09 2020 - 02:51:30 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:22 UTC