Re: unkillable process consuming 100% cpu

From: Steve Kargl <sgk_at_troutmask.apl.washington.edu>
Date: Wed, 13 Nov 2019 06:52:04 -0800

On Wed, Nov 13, 2019 at 09:10:06AM +0100, Hans Petter Selasky wrote:
> On 2019-11-13 01:30, Steve Kargl wrote:
> > 
> > I installed the 2nd seqlock.diff, rebuilt drm-current-kmod-4.16.g20191023,
> > rebooted, and have been pounding on the system with workloads that are
> > similar to what the system was doing during the lockups.  So far, I
> > cannot get the system to lock up.  Looks like your patch fixes it (or at
> > least helps).  Thanks for taking a look at the problem.
> > 
> 
> Can you apply the kdb.diff on top and check dmesg for prints?
> 

I could not find the amdgpu_amdkfd_gpuvm.c file when I went looking.
Is it autogenerated?

I also spoke too soon. I got a panic after my reply above.

Fatal trap 12: page fault while in kernel mode
cpuid = 5; apic id = 15
fault virtual address   = 0x0
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0xfffffe00b460e188
frame pointer           = 0x28:0xfffffe00b460e1c0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 877 (X:rcs0)
trap number             = 12
panic: page fault
cpuid = 5

db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00b460dde0
vpanic() at vpanic+0x17e/frame 0xfffffe00b460de40
panic() at panic+0x43/frame 0xfffffe00b460dea0
trap_fatal() at trap_fatal+0x388/frame 0xfffffe00b460df10
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00b460df80
trap() at trap+0x288/frame 0xfffffe00b460e0b0
calltrap() at calltrap+0x8/frame 0xfffffe00b460e0b0
--- trap 0xc, rip = 0, rsp = 0xfffffe00b460e188, rbp = 0xfffffe00b460e1c0 ---
??() at 0/frame 0xfffffe00b460e1c0
radeon_cs_ioctl() at radeon_cs_ioctl+0xa0b/frame 0xfffffe00b460e640
drm_ioctl_kernel() at drm_ioctl_kernel+0xf1/frame 0xfffffe00b460e680
drm_ioctl() at drm_ioctl+0x279/frame 0xfffffe00b460e770
linux_file_ioctl() at linux_file_ioctl+0x298/frame 0xfffffe00b460e7d0
kern_ioctl() at kern_ioctl+0x284/frame 0xfffffe00b460e840
sys_ioctl() at sys_ioctl+0x157/frame 0xfffffe00b460e910
amd64_syscall() at amd64_syscall+0x273/frame 0xfffffe00b460ea30
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00b460ea30
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x200cc6bfa, rsp = 0x7fffbfffde98, rbp = 0x7fffbfffdec0 ---
Uptime: 5h9m5s
Dumping 1472 out of 16327 MB:..2%..11%..21%..31%..41%..52%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
warning: Source file is more recent than executable.
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:392
#2  0xffffffff805de452 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:479
#3  0xffffffff805de8a6 in vpanic (fmt=<optimized out>, ap=<optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:908
#4  0xffffffff805de6c3 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:835
#5  0xffffffff808b0d58 in trap_fatal (frame=0xfffffe00b460e0c0, eva=0)
    at /usr/src/sys/amd64/amd64/trap.c:925
#6  0xffffffff808b0daf in trap_pfault (frame=0xfffffe00b460e0c0, 
    usermode=<optimized out>, signo=<optimized out>, ucode=<optimized out>)
    at /usr/src/sys/amd64/amd64/trap.c:743
#7  0xffffffff808b0468 in trap (frame=0xfffffe00b460e0c0)
    at /usr/src/sys/amd64/amd64/trap.c:407
#8  <signal handler called>
#9  0x0000000000000000 in ?? ()
#10 0xffffffff817d2c0f in radeon_ttm_tt_to_gtt (ttm=0xfffff80061eeb248)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_ttm.c:720
#11 radeon_ttm_tt_set_userptr (ttm=0xfffff80061eeb248, addr=1, 
    flags=2147483647)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_ttm.c:804
#12 0xffffffff817adc9b in radeon_is_px (dev=0xfffff8017fe84e00)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/radeon/radeon_device.c:156
#13 0xffffffff818a9e81 in drm_ioctl_kernel (linux_file=<optimized out>, 
    func=0xfffffe00b460e428, kdata=0xfffffe00b31eb000, flags=1521620552)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/drm_ioctl.c:760
#14 0xffffffff818aa129 in drm_ioctl (filp=0xfffff80061198e00, 
    cmd=<optimized out>, arg=65536)
    at /usr/local/sys/modules/drm-current-kmod/drivers/gpu/drm/drm_ioctl.c:856
#15 0xffffffff807c8098 in linux_file_ioctl_sub (fp=<optimized out>, 
    filp=<optimized out>, fop=<optimized out>, cmd=<optimized out>, 
    data=<optimized out>, td=<optimized out>)
    at /usr/src/sys/compat/linuxkpi/common/src/linux_compat.c:965
#16 linux_file_ioctl (fp=<optimized out>, cmd=<optimized out>, 
    data=<optimized out>, cred=<optimized out>, td=0xfffff800612c0000)
    at /usr/src/sys/compat/linuxkpi/common/src/linux_compat.c:1558
#17 0xffffffff8063ed34 in fo_ioctl (fp=<optimized out>, com=3223348326, 
    data=0x7fffffff, active_cred=0xfffffe001f7e6250, td=0xfffff800612c0000)
    at /usr/src/sys/sys/file.h:340
#18 kern_ioctl (td=<optimized out>, fd=9, com=3223348326, 
    data=0x7fffffff <error: Cannot access memory at address 0x7fffffff>)
    at /usr/src/sys/kern/sys_generic.c:801
#19 0xffffffff8063ea37 in sys_ioctl (td=0xfffff800612c0000, 
    uap=0xfffff800612c03c8) at /usr/src/sys/kern/sys_generic.c:709
#20 0xffffffff808b1783 in syscallenter (td=0xfffff800612c0000)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:144
#21 amd64_syscall (td=0xfffff800612c0000, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1162
#22 <signal handler called>



-- 
Steve
Received on Wed Nov 13 2019 - 13:52:14 UTC
