Re: svn commit: r313268 - head/sys/kern [through -r313271 for atomic_fcmpset use and later: fails on PowerMac G5 "Quad Core"; -r313266 works]

From: Mark Millard <markmi_at_dsl-only.net>
Date: Mon, 20 Feb 2017 15:10:44 -0800

On 2017-Feb-20, at 2:58 PM, Mark Millard <markmi_at_dsl-only.net> wrote:

> On 2017-Feb-20, at 11:10 AM, Mateusz Guzik <mjguzik at gmail.com> wrote:
> 
>> On Sat, Feb 18, 2017 at 04:18:05AM -0800, Mark Millard wrote:
>>> [Note: I experiment with clang based powerpc64 builds,
>>> reporting problems that I find. Justin is familiar
>>> with this, as is Nathan.]
>>> 
>>> I tried to update the PowerMac G5 (a so-called "Quad Core")
>>> that I have access to from head -r312761 to -r313864 and
>>> ended up with random panics and hang ups in fairly short
>>> order after booting.
>>> 
>>> Some approximate bisecting for the kernel led to:
>>> (sometimes getting part way into a buildkernel attempt
>>> for a different version before a failure happens)
>>> 
>>> -r313266: works (just before use of atomic_fcmpset)
>>> vs.
>>> -r313271: fails (last of the "use atomic_fcmpset" check-ins)
>>> 
>>> (I did not try -r313268 through -r313270 as the use was
>>> gradually added.)
>>> 
>>> So I'm currently running a -r313864 world with a -r313266
>>> kernel.
>>> 
>>> No kernel that I tried that was from before -r313266 had the
>>> problems.
>>> 
>>> Any kernel that I tried that was from after -r313271 had the
>>> problems.
>>> 
>>> Of course I did not try them all in either direction. :)
>>> 
>> 
>> I found that spin mutexes were not properly handling this, fixed in
>> r313996.
>> 
>> Locally I added an if (cpu_tick() % 2) return (0); snippet to amd64
>> fcmpset to simulate failures. Everything works, while it would easily
>> fail without the patch.
>> 
>> That said, I hope this concludes the 'missing check for not-reread value
>> of failed fcmpset' saga.
>> 
>> -- 
>> Mateusz Guzik <mjguzik gmail.com>
> 
> I tried to update from -r313864 to -r313999 in my amd64 context
> (a VirtualBox machine under macOS) but it now crashes late in
> the boot sequence (after it processes a dump if I make one but
> before I can log in).
> 
> This update was via my usual explicit svnlite update; buildworld
> buildkernel; etc. production style build of world and kernel,
> including use of MALLOC_PRODUCTION.
> 
> The window shows:
> 
> _vm_map_lock+0xf
> vm_map_wire+0x32
> rtR0MemObjNativeLockInMap+0x8c
> rtR0MemObjNativeLockUser+0x51
> RTR0MemObjLockUserTag+0x231
> vbglR0HGCMInternalPreprocessCall+0x65d
> VbglR0HGCMInternalCall+0x17c
> vgdrvIoCtl_HGCMCall+0x43f
> VGDrvCommonIoCtl+0x261
> vgdrvFreeBSDIOCtl+0x2cd
> devfs_ioctl+0xae
> VOP_IOCTL_APV+0x88
> vn_ioctl+0x161
> devfs_ioctl_f+0x1f
> kern_ioctl+0x280
> sys_ioctl+0x13f
> amd64_syscall+0x397
> Xfast_syscall+0xfb

More detail from booting with the -r313864 kernel.old
and using kgdb on what the dump produced:

# kgdb kernel.debug /var/crash/vmcore.
/var/crash/vmcore.0    /var/crash/vmcore.last
# kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
<118>Starting vboxservice.
<118>VBoxService 5.1.14 r112924 (verbosity: 0) freebsd.amd64 (Jan 20 2017 18:37:45) release log
<118>00:00:00.000120 main     Log opened 2017-02-20T22:38:46.348080000Z
<118>00:00:00.000162 main     OS Product: FreeBSD
<118>00:00:00.000171 main     OS Release: 12.0-CURRENT
<118>00:00:00.000180 main     OS Version: FreeBSD 12.0-CURRENT  r313999M
<118>00:00:00.000192 main     Executable: /usr/local/sbin/VBoxService
<118>00:00:00.000194 main     Process ID: 609
<118>00:00:00.000196 main     Package type: BSD_64BITS_GENERIC (OSE)


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0xd6
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80d4ebaf
stack pointer           = 0x28:0xfffffe0122e2bef0
frame pointer           = 0x28:0xfffffe0122e2bf00
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 609 (VBoxService)

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/modules/vboxguest.ko...done.
Loaded symbols for /boot/modules/vboxguest.ko
#0  doadump (textdump=0) at pcpu.h:232
232             __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0  doadump (textdump=0) at pcpu.h:232
#1  0xffffffff8039dd0b in db_dump (dummy=<value optimized out>, dummy2=<value optimized out>, dummy3=<value optimized out>, dummy4=<value optimized out>) at /usr/src/sys/ddb/db_command.c:546
#2  0xffffffff8039db0f in db_command (cmd_table=<value optimized out>) at /usr/src/sys/ddb/db_command.c:453
#3  0xffffffff8039d884 in db_command_loop () at /usr/src/sys/ddb/db_command.c:506
#4  0xffffffff803a0814 in db_trap (type=<value optimized out>, code=<value optimized out>) at /usr/src/sys/ddb/db_main.c:254
#5  0xffffffff80a9c0c3 in kdb_trap (type=<value optimized out>, code=<value optimized out>, tf=<value optimized out>) at /usr/src/sys/kern/subr_kdb.c:654
#6  0xffffffff80ed25d2 in trap_fatal (frame=0xfffffe0122e2be30, eva=214) at /usr/src/sys/amd64/amd64/trap.c:796
#7  0xffffffff80ed27dc in trap_pfault (frame=0xfffffe0122e2be30, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:658
#8  0xffffffff80ed1e90 in trap (frame=0xfffffe0122e2be30) at /usr/src/sys/amd64/amd64/trap.c:421
#9  0xffffffff80eb6be1 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#10 0xffffffff80d4ebaf in _vm_map_lock (map=0x1, file=0x0, line=0) at /usr/src/sys/vm/vm_map.c:501
#11 0xffffffff80d51ea2 in vm_map_wire (map=<value optimized out>, start=4534272, end=4538368, flags=1) at /usr/src/sys/vm/vm_map.c:2534
#12 0xffffffff8265291c in rtR0MemObjNativeLockInMap () from /boot/modules/vboxguest.ko
#13 0xffffffff82652881 in rtR0MemObjNativeLockUser () from /boot/modules/vboxguest.ko
#14 0xffffffff8264ec01 in RTR0MemObjLockUserTag () from /boot/modules/vboxguest.ko
#15 0xffffffff82624afd in vbglR0HGCMInternalPreprocessCall () from /boot/modules/vboxguest.ko
#16 0xffffffff8262411a in VbglR0HGCMInternalCall () from /boot/modules/vboxguest.ko
#17 0xffffffff8261ec4f in vgdrvIoCtl_HGCMCall () from /boot/modules/vboxguest.ko
#18 0xffffffff8261d221 in VGDrvCommonIoCtl () from /boot/modules/vboxguest.ko
#19 0xffffffff8262327d in vgdrvFreeBSDIOCtl () from /boot/modules/vboxguest.ko
#20 0xffffffff8092976e in devfs_ioctl (ap=<value optimized out>) at /usr/src/sys/fs/devfs/devfs_vnops.c:805
#21 0xffffffff8103ef58 in VOP_IOCTL_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:1067
#22 0xffffffff80b29431 in vn_ioctl (fp=0xfffff80006d37730, com=<value optimized out>, data=0xfffffe0122e2c870, active_cred=0xfffff80006495a00, td=<value optimized out>) at vnode_if.h:448
#23 0xffffffff80929d5f in devfs_ioctl_f (fp=<value optimized out>, com=<value optimized out>, data=<value optimized out>, cred=<value optimized out>, td=0xfffff8001504e000) at /usr/src/sys/fs/devfs/devfs_vnops.c:763
#24 0xffffffff80ab8bf0 in kern_ioctl (td=<value optimized out>, fd=3, com=<value optimized out>, data=0xfffffe0122e2c870 "\031\002R\031P") at file.h:322
#25 0xffffffff80ab88bf in sys_ioctl (td=<value optimized out>, uap=0xfffffe0122e2ca30) at /usr/src/sys/kern/sys_generic.c:743
#26 0xffffffff80ed2e27 in amd64_syscall (td=0xfffff8001504e000, traced=0) at subr_syscall.c:135
#27 0xffffffff80eb6ecb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#28 0x0000000800c5317a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
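
For anyone following the fcmpset part of the quoted discussion, here is a
minimal user-space sketch of the general pitfall as I understand it: a failed
fcmpset does not by itself mean the value changed, so the caller has to look
at the value it got back before deciding anything. This is not the r313996
change and not kernel code. It uses C11 atomic_compare_exchange_weak as a
stand-in for atomic_fcmpset, and flaky_fcmpset, try_acquire,
try_acquire_naive, fcmpset_demo, UNOWNED and inject are all names made up for
the illustration. The injected failure is only loosely modelled on the
"fail every other call" test hack quoted above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define UNOWNED ((uintptr_t)0)   /* hypothetical "lock is free" value */

/* Illustration only: call counter used to inject failures (not thread-safe). */
static unsigned inject;

/*
 * Hypothetical stand-in for an fcmpset-style primitive.  Every other call
 * reports failure without touching *expected, so after a failure *expected
 * may still equal the value stored in *dst.
 */
static bool
flaky_fcmpset(_Atomic uintptr_t *dst, uintptr_t *expected, uintptr_t desired)
{
	if (inject++ % 2 == 0)
		return (false);
	/* Real attempt: on failure the observed value is written to *expected. */
	return (atomic_compare_exchange_weak(dst, expected, desired));
}

/* Naive caller: treats any failure as "somebody else owns the lock". */
static bool
try_acquire_naive(_Atomic uintptr_t *lock, uintptr_t tid)
{
	uintptr_t v = UNOWNED;

	return (flaky_fcmpset(lock, &v, tid));
}

/* Careful caller: only gives up when the returned value shows an owner. */
static bool
try_acquire(_Atomic uintptr_t *lock, uintptr_t tid)
{
	uintptr_t v = UNOWNED;

	for (;;) {
		if (flaky_fcmpset(lock, &v, tid))
			return (true);
		if (v != UNOWNED)
			return (false);	/* genuinely owned */
		/* v still looks free: the failure told us nothing, so retry. */
	}
}

/* Walks through the three cases; returns 0 if every assertion holds. */
static int
fcmpset_demo(void)
{
	_Atomic uintptr_t lock = UNOWNED;

	inject = 0;
	/* Injected failure is misread as "owned" although the lock is free. */
	assert(!try_acquire_naive(&lock, 1));
	assert(atomic_load(&lock) == UNOWNED);
	/* The retrying version takes the free lock despite injected failures. */
	assert(try_acquire(&lock, 1));
	assert(atomic_load(&lock) == 1);
	/* A second id is refused because v now reports the real owner. */
	assert(!try_acquire(&lock, 2));
	return (0);
}
```

Calling fcmpset_demo() from a main() exercises all three cases. The point is
just that the retry decision is made on the returned value, not on the
failure itself.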


===
Mark Millard
markmi at dsl-only.net
Received on Mon Feb 20 2017 - 22:10:49 UTC