On 2017-Feb-20, at 3:35 PM, Mateusz Guzik <mjguzik at gmail.com> wrote:
> On Mon, Feb 20, 2017 at 03:10:44PM -0800, Mark Millard wrote:
>> On 2017-Feb-20, at 2:58 PM, Mark Millard <markmi at dsl-only.net> wrote:
>>
>>> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik <mjguzik at gmail.com> wrote:
>>>
>>>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>>>>> [Note: I experiment with clang based powerpc64 builds,
>>>>> reporting problems that I find. Justin is familiar
>>>>> with this, as is Nathan.]
>>>>>
>>>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>>>> that I have access to from head -r312761 to -r313864 and
>>>>> ended up with random panics and hang ups in fairly short
>>>>> order after booting.
>>>>>
>>>>> Some approximate bisecting for the kernel led to:
>>>>> (sometimes getting part way into a buildkernel attempt
>>>>> for a different version before a failure happens)
>>>>>
>>>>> -r313266: works (just before use of atomic_fcmpset)
>>>>> vs.
>>>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>>>>
>>>>> (I did not try -r313268 through -r313270 as the use was
>>>>> gradually added.)
>>>>>
>>>>> So I'm currently running a -r313864 world with a -r313266
>>>>> kernel.
>>>>>
>>>>> No kernel that I tried that was from before -r313266 had the
>>>>> problems.
>>>>>
>>>>> Any kernel that I tried that was from after -r313271 had the
>>>>> problems.
>>>>>
>>>>> Of course I did not try them all in the other direction. :)
>>>>>
>>>>
>>>> I found that spin mutexes were not properly handling this, fixed in
>>>> r313996.
>>>>
>>>> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
>>>> fcmpset to simulate failures. Everything works, while it would easily
>>>> fail without the patch.
>>>>
>>>> That said, I hope this concludes the 'missing check for not-reread value
>>>> of failed fcmpset' saga.
>>>>
>>>> --
>>>> Mateusz Guzik <mjguzik gmail.com>
>>>
>>> I tried to update from -r313864 to -r313999 in my amd64 context
>>> (a VirtualBox machine under macOS) but it now crashes late in
>>> the boot sequence (after it processes a dump if I make one but
>>> before I can log in).
>>>
>>> This update was via my usual explicit svnlite update; buildworld
>>> buildkernel; etc. production style build of world and kernel,
>>> including use of MALLOC_PRODUCTION.
>>>
>>> The window shows:
>>>
>>> _vm_map_lock+0xf
>>> vm_map_wire+0x32
>>> rtR0MemObjNativeLockInMap+0x8c
>>> rtR0MemObjNativeLockUser+0x51
>>> RTR0MemObjLockUserTag+0x231
>>> vbglR0HGCMInternalPreprocessCall+0x65d
>>> vbglR0HGCMInternalCall+0x17c
>>> vgdrvIoCtl_HGCMCall+0x43f
>>> VGDrvCommonIoCtl+0x261
>>> vgdrvFreeBSDIOCtl+0x2cd
>>> devfs_ioctl+0xae
>>> VOP_IOCTL_APV+0x88
>>> vn_ioctl+0x161
>>> devfs_ioctl_f+0x1f
>>> kern_ioctl+0x280
>>> sys_ioctl+0x13f
>>> amd64_syscall+0x397
>>> Xfast_syscall+0xfb
>>
>> More detail from booting with the -r313864 kernel.old
>> and using kgdb on what the dump produced:
>>
>> # kgdb kernel.debug /var/crash/vmcore.
>> /var/crash/vmcore.0 /var/crash/vmcore.last
>> # kgdb kernel.debug /var/crash/vmcore.0
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and you are
>> welcome to change it and/or distribute copies of it under certain conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB. Type "show warranty" for details.
>> This GDB was configured as "amd64-marcel-freebsd"...
>>
>> Unread portion of the kernel message buffer:
>> <118>Starting vboxservice.
>> <118>VBoxService 5.1.14 r112924 (verbosity: 0) freebsd.amd64 (Jan 20 2017 18:37:45) release log
>> <118>00:00:00.000120 main Log opened 2017-02-20T22:38:46.348080000Z
>> <118>00:00:00.000162 main OS Product: FreeBSD
>> <118>00:00:00.000171 main OS Release: 12.0-CURRENT
>> <118>00:00:00.000180 main OS Version: FreeBSD 12.0-CURRENT r313999M
>> <118>00:00:00.000192 main Executable: /usr/local/sbin/VBoxService
>> <118>00:00:00.000194 main Process ID: 609
>> <118>00:00:00.000196 main Package type: BSD_64BITS_GENERIC (OSE)
>>
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 2; apic id = 02
>> fault virtual address = 0xd6
>> fault code = supervisor read data, page not present
>> instruction pointer = 0x20:0xffffffff80d4ebaf
>> stack pointer = 0x28:0xfffffe0122e2bef0
>> frame pointer = 0x28:0xfffffe0122e2bf00
>> code segment = base 0x0, limit 0xfffff, type 0x1b
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags = interrupt enabled, resume, IOPL = 0
>> current process = 609 (VBoxService)
>>
>
>> #9 0xffffffff80eb6be1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
>> #10 0xffffffff80d4ebaf in _vm_map_lock (map=0x1, file=0x0, line=0) at /usr/src/sys/vm/vm_map.c:501
>
> The function is:
> void
> _vm_map_lock(vm_map_t map, const char *file, int line)
> {
>
> 	if (map->system_map)
> 		mtx_lock_flags_(&map->system_mtx, 0, file, line);
> 	else
> 		sx_xlock_(&map->lock, file, line);
> 	map->timestamp++;
> }
>
> system_map is at offset 0xd5, thus the faulting address of 0xd6 with map
> address of 1 looks like the backtrace is correct. But this suggests the
> bug is unrelated to my changes and there is a chance there is no bug in
> the first place.
>
> Please make sure that the virtualbox module is recompiled against the proper
> source tree. If the problem persists, please bisect. The range is not
> big.
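The offset check Mateusz describes is plain pointer arithmetic: evaluating map->system_map reads the byte at map + offsetof(struct, system_map), so a bogus map pointer of 1 plus an offset of 0xd5 yields the reported fault virtual address of 0xd6. A minimal sketch of that arithmetic follows. The struct fake_vm_map type and the system_map_read_address() helper are hypothetical; the only detail taken from the thread is that system_map sits at byte offset 0xd5. This is not the real struct vm_map layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct vm_map.  Only the placement of
 * system_map at byte offset 0xd5 comes from the thread; the padding
 * array is an assumption that simply reproduces that offset.
 */
struct fake_vm_map {
	unsigned char other_fields[0xd5];	/* everything before system_map */
	unsigned char system_map;		/* lands at offset 0xd5 */
};

/* Address the CPU would read when evaluating map->system_map. */
static uintptr_t
system_map_read_address(uintptr_t map)
{
	return map + offsetof(struct fake_vm_map, system_map);
}
```

With map = 0x1 (as in frame #10), system_map_read_address(1) evaluates to 0xd6, matching the "fault virtual address" line. A fault address that agrees with a plausible field offset is what suggests the backtrace is trustworthy and that the caller really passed a garbage map pointer.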
>
> Offhand I don't see what can cause the failure in question (and chances
> are there is no bug if the KBI changed and the module was not recompiled).
>
>> #11 0xffffffff80d51ea2 in vm_map_wire (map=<value optimized out>, start=4534272, end=4538368, flags=1) at /usr/src/sys/vm/vm_map.c:2534
>> #12 0xffffffff8265291c in rtR0MemObjNativeLockInMap () from /boot/modules/vboxguest.ko
>> #13 0xffffffff82652881 in rtR0MemObjNativeLockUser () from /boot/modules/vboxguest.ko
>> #14 0xffffffff8264ec01 in RTR0MemObjLockUserTag () from /boot/modules/vboxguest.ko
>> #15 0xffffffff82624afd in vbglR0HGCMInternalPreprocessCall () from /boot/modules/vboxguest.ko
>> #16 0xffffffff8262411a in VbglR0HGCMInternalCall () from /boot/modules/vboxguest.ko
>> #17 0xffffffff8261ec4f in vgdrvIoCtl_HGCMCall () from /boot/modules/vboxguest.ko
>> #18 0xffffffff8261d221 in VGDrvCommonIoCtl () from /boot/modules/vboxguest.ko

I do not expect that the kernel binary interface deliberately changed
between -r313864 and -r313999. Until the attempted update of amd64
(which I always do first), the amd64 and arm64 machines were running:

. . . 12.0-CURRENT FreeBSD 12.0-CURRENT r313864M . . . 1200021 1200021

I've not noticed an update to 1200022 yet.

[It turned out that for powerpc64 I had to use -r313266 for the kernel
when I tried to update to -r313864. This does mix 1200020 and 1200021.
But 1200021 was the removal of support for things not involved in my
configuration, and the combination has seemed okay so far.]

I've decided to do a round of port upgrades (to -r434493), although
the virtualbox client has not been updated. I'll force a rebuild of it
before I'm done.

It turns out that llvm39 is now required for what I choose to have,
and its build ran out of RAM/swap as I had things configured. So I've
assigned more RAM to the VM, and for now I'm not starting lumina, just
using the console. We will see.
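As a hedged illustration of the "missing check for not-reread value of failed fcmpset" problem discussed upthread, here is a user-space C11 sketch of a spin-lock acquire loop. It is not FreeBSD's mutex code: fcmpset, spin_lock, spin_unlock, LOCK_UNOWNED and the fake_ticks counter (standing in for the cpu_tick() % 2 trick Mateusz used on amd64) are all hypothetical names. atomic_compare_exchange_strong is used because, like atomic_fcmpset, on failure it writes the value it observed back into the expected argument. The injected failures return without refreshing it, which is roughly how a spurious failure on an LL/SC machine such as powerpc64 looks to the caller. The point is that a failed fcmpset is not by itself evidence that the lock is held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOCK_UNOWNED ((uintptr_t)0)	/* hypothetical "not held" value */

static unsigned long fake_ticks;	/* hypothetical stand-in for cpu_tick() */
static unsigned long injected_failures;	/* how many failures were simulated */

/*
 * fcmpset-style compare-and-set: on success store newval and return true;
 * on failure return false with the observed value copied into *expected
 * (atomic_compare_exchange_strong does this).  Every other call fails on
 * purpose WITHOUT touching *expected, simulating a spurious failure.
 */
static bool
fcmpset(_Atomic uintptr_t *dst, uintptr_t *expected, uintptr_t newval)
{
	if (++fake_ticks % 2) {
		injected_failures++;
		return false;
	}
	return atomic_compare_exchange_strong(dst, expected, newval);
}

/*
 * Acquire loop that does not assume a failed fcmpset means "owned":
 * after a failure v may still be LOCK_UNOWNED, in which case we simply
 * retry; we only spin (re-reading the lock word) while v shows an owner.
 */
static void
spin_lock(_Atomic uintptr_t *lock, uintptr_t tid)
{
	uintptr_t v = LOCK_UNOWNED;

	while (!fcmpset(lock, &v, tid)) {
		while (v != LOCK_UNOWNED)
			v = atomic_load(lock);
	}
}

static void
spin_unlock(_Atomic uintptr_t *lock)
{
	atomic_store(lock, LOCK_UNOWNED);
}
```

In a single-threaded run, spin_lock(&lock, 42) fails once by injection, retries with v still LOCK_UNOWNED, and then takes the lock. A loop that instead treated every failure as contention would act on a stale v, which is the class of bug that the fault injection on amd64 is meant to expose.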
Note: I always start lumina manually, so it was not involved in the
boot problem: the boot crash always happened in a basic console-style
context.

Overall, it will be a while before I have a works vs. fails pair that
is significantly closer together.

-- Mateusz Guzik <mjguzik gmail.com>
Received on Tue Feb 21 2017 - 01:36:20 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:10 UTC