Re: radeon_cp_texture: page fault with non-sleepable locks held

From: Kostik Belousov <kostikbel_at_gmail.com> Date: Mon, 8 Nov 2010 14:04:03 +0200

On Mon, Nov 08, 2010 at 01:50:25PM +0200, Andriy Gapon wrote:
> on 05/11/2010 09:27 Andriy Gapon said the following:
> > 
> > I use FreeBSD head and KDE 4 with all the bells and whistles enabled.
> > Apparently a recent KDE update has enabled even more of them, because I started to
> > have panics with a kernel that has INVARIANTS and WITNESS enabled.
> 
> I tried to solve the problem by changing drmdev from mutex to sx:
> http://people.freebsd.org/~avg/drm-sx.diff
I remember that the drm lock can be acquired from the interrupt thread if
the card supports interrupts. Changing it to sx cannot work then, because
interrupt threads cannot sleep. Most likely, you are getting away with it
because r600 does not yet use interrupts on FreeBSD.

I think the solution is to drop the drm lock around copyin.
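
To make the idea concrete, here is a minimal userland sketch of the
"drop the lock around copyin" pattern. It is only a single-threaded model,
not driver code: struct model_dev, model_lock(), model_unlock(),
model_copyin() and model_cp_texture() are all hypothetical names invented
for illustration, and the "lock" is just a flag. The stand-in for copyin
refuses to run while the flag is set, which loosely imitates what WITNESS
complains about in the panic quoted below.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the device; NOT the real struct drm_device. */
struct model_dev {
	int		locked;		/* 1 while the modelled drm lock is held */
	unsigned char	buf[64];	/* device state protected by the lock */
	size_t		len;		/* number of valid bytes in buf */
};

static void model_lock(struct model_dev *d)   { d->locked = 1; }
static void model_unlock(struct model_dev *d) { d->locked = 0; }

/*
 * Stand-in for copyin(9).  It fails if the modelled lock is held, to
 * imitate "page fault with non-sleepable locks held", and treats a NULL
 * source as a bad user address.
 */
static int
model_copyin(const struct model_dev *d, const void *uaddr, void *kaddr,
    size_t n)
{
	if (d->locked)
		return (EDEADLK);
	if (uaddr == NULL)
		return (EFAULT);
	memcpy(kaddr, uaddr, n);
	return (0);
}

/*
 * Entered and left with the lock held, the way a handler called from
 * drm_ioctl() would be.  The user data is copied into a private buffer
 * with the lock dropped, and only then applied to the shared state.
 */
static int
model_cp_texture(struct model_dev *d, const void *ubuf, size_t n)
{
	unsigned char tmp[sizeof(d->buf)];
	int error;

	if (n > sizeof(tmp))
		return (EINVAL);

	model_unlock(d);			/* copy may fault and sleep */
	error = model_copyin(d, ubuf, tmp, n);
	model_lock(d);				/* re-take before touching state */
	if (error != 0)
		return (error);

	memcpy(d->buf, tmp, n);
	d->len = n;
	return (0);
}
```

In a real driver, any state that was checked before the unlock would have
to be re-validated after the lock is re-taken, since another thread may
have changed it in between; the sketch has no second thread, so it skips
that step.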
> 
> Things have improved; I am not getting the panic anymore.
> Instead I have this LOR now:
> lock order reversal:
> 1st 0xffffff0001b968a0 drmdev (drmdev) @ /usr/src/sys/dev/drm/drm_drv.c:791
> 2nd 0xffffff0072a87200 user map (user map) @ /usr/src/sys/vm/vm_map.c:3548
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0xffffffff801b8b3a = db_trace_self_wrapper+0x2a
> kdb_backtrace() at 0xffffffff803a7a6a = kdb_backtrace+0x3a
> _witness_debugger() at 0xffffffff803bd40c = _witness_debugger+0x2c
> witness_checkorder() at 0xffffffff803be879 = witness_checkorder+0x959
> _sx_slock() at 0xffffffff80378af8 = _sx_slock+0x88
> _vm_map_lock_read() at 0xffffffff805109e6 = _vm_map_lock_read+0x36
> vm_map_lookup() at 0xffffffff805127b4 = vm_map_lookup+0x54
> vm_fault() at 0xffffffff805097f9 = vm_fault+0xf9
> trap_pfault() at 0xffffffff80545d0f = trap_pfault+0x11f
> trap() at 0xffffffff80546597 = trap+0x657
> calltrap() at 0xffffffff805305c8 = calltrap+0x8
> --- trap 0xc, rip = 0xffffffff8054405d, rsp = 0xffffff81241b47f0, rbp =
> 0xffffff81241b4870 ---
> copyin() at 0xffffffff8054405d = copyin+0x3d
> radeon_cp_texture() at 0xffffffff8022fbd7 = radeon_cp_texture+0x167
> drm_ioctl() at 0xffffffff8020fa38 = drm_ioctl+0x318
> devfs_ioctl_f() at 0xffffffff802dd649 = devfs_ioctl_f+0x109
> kern_ioctl() at 0xffffffff803c1107 = kern_ioctl+0x1f7
> ioctl() at 0xffffffff803c12c8 = ioctl+0x168
> syscallenter() at 0xffffffff803b57be = syscallenter+0x26e
> syscall() at 0xffffffff80545e52 = syscall+0x42
> Xfast_syscall() at 0xffffffff805308a2 = Xfast_syscall+0xe2
> 
> Is this a serious LOR?
I think it is. The d_mmap() cdevsw method acquires the drm lock.
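
A toy model of why the two orders conflict, under one assumption that is
not spelled out above: that d_mmap() can be reached while the user map lock
is already held, so that path takes "user map, then drmdev", while the
ioctl path in the report takes "drmdev, then user map". The checker below
is a hypothetical, much simplified imitation of what a lock order checker
does; enum model_lock, struct order_table and order_record() are made-up
names, not WITNESS internals.

```c
#include <stddef.h>

/* The two locks from the report, as indices into the order table. */
enum model_lock { LK_DRMDEV, LK_USERMAP, LK_COUNT };

struct order_table {
	/* seen[a][b] != 0: "b acquired while a was held" has been observed */
	unsigned char seen[LK_COUNT][LK_COUNT];
};

/*
 * Record that `second` is acquired while `first` is held.
 * Returns 1 if the opposite order was recorded earlier (a reversal),
 * otherwise 0.
 */
static int
order_record(struct order_table *t, enum model_lock first,
    enum model_lock second)
{
	int reversal = (t->seen[second][first] != 0);

	t->seen[first][second] = 1;
	return (reversal);
}
```

Feeding it "user map, then drmdev" followed by "drmdev, then user map"
reports a reversal on the second call. With two threads taking the locks
in those two orders at the same time, each can end up waiting for the lock
the other holds.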

> How can I resolve it?
See above.

> 
> > The panic:
> > Kernel page fault with the following non-sleepable locks held:
> > exclusive sleep mutex drmdev (drmdev) r = 0 (0xffffff0001b968a0) locked @
> > /usr/src/sys/dev/drm/drm_drv.c:791
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at 0xffffffff801b8afa = db_trace_self_wrapper+0x2a
> > kdb_backtrace() at 0xffffffff803a7afa = kdb_backtrace+0x3a
> > _witness_debugger() at 0xffffffff803bd49c = _witness_debugger+0x2c
> > witness_warn() at 0xffffffff803bed32 = witness_warn+0x322
> > trap() at 0xffffffff8054639f = trap+0x39f
> > calltrap() at 0xffffffff80530688 = calltrap+0x8
> > --- trap 0xc, rip = 0xffffffff8054411d, rsp = 0xffffff81241917f0, rbp =
> > 0xffffff8124191870 ---
> > copyin() at 0xffffffff8054411d = copyin+0x3d
> > radeon_cp_texture() at 0xffffffff8022fcc7 = radeon_cp_texture+0x167
> > drm_ioctl() at 0xffffffff8020fa78 = drm_ioctl+0x318
> > devfs_ioctl_f() at 0xffffffff802dd739 = devfs_ioctl_f+0x109
> > kern_ioctl() at 0xffffffff803c1197 = kern_ioctl+0x1f7
> > ioctl() at 0xffffffff803c1358 = ioctl+0x168
> > syscallenter() at 0xffffffff803b584e = syscallenter+0x26e
> > syscall() at 0xffffffff80545f12 = syscall+0x42
> > Xfast_syscall() at 0xffffffff80530962 = Xfast_syscall+0xe2
> > --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x801f96a1c, rsp = 0x7fffffffe7a8,
> > rbp = 0xc020644e ---
> > 
> > 
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; apic id = 00
> > fault virtual address   = 0x832372000
> > fault code              = supervisor read data, page not present
> > instruction pointer     = 0x20:0xffffffff8054411d
> > stack pointer           = 0x28:0xffffff81241917f0
> > frame pointer           = 0x28:0xffffff8124191870
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 3
> > current process         = 3439 (initial thread)
> > trap number             = 12
> > panic: page fault
> > cpuid = 0
> > 
> > 
> > The panic is quite obvious: the drmdev mutex is taken and held in drm_ioctl(), and
> > radeon_cp_texture() can perform copyin and/or copyout, so it's a matter of
> > chance (or the right workload) to hit a page fault there.
> > 
> > What's not obvious is how to properly fix this.
> > Any ideas?
> > 
> > Probably less important is what started to trigger the problem, because the
> > code hasn't been changed in ages and I have never seen this issue before.
> > But, d'oh, it seems that this issue has already been reported:
> > http://www.mail-archive.com/freebsd-hackers_at_freebsd.org/msg67757.html
> > 
> > I would appreciate any help.
> > Thanks!
> 
> 
> -- 
> Andriy Gapon