Re: nvidia-driver crashing kernel on head

From: John Baldwin <jhb_at_freebsd.org>
Date: Thu, 8 Jul 2010 08:26:32 -0400
On Friday, July 02, 2010 12:55:38 pm David Naylor wrote:
> On Friday 02 July 2010 14:57:35 René Ladan wrote:
> > 2010/7/2 Yuri Pankov <yuri.pankov_at_gmail.com>:
> > > On Fri, Jul 02, 2010 at 11:46:41AM +0200, David Naylor wrote:
> > >> Hi,
> > >> 
> > >> I'm not sure whether this has been reported before, but I am
> > >> experiencing crashes with nvidia-driver on -current (cvsup ~day ago).
> > >> 
> > >> If I remove all the debugging options from the kernel config then it is
> > >> very usable.
> > >> 
> > >> Here are the backtraces from two nvidia-driver versions:
> > >> 
> > >> nvidia-driver-195.36.15 and GENERIC:
> > >> panic: mutex page lock not owned at
> > >> /home/freebsd9/src/sys/vm/vm_page.c:1638 cpuid = 1
> > >> KDB: enter: panic
> > >> [ thread pid 1815 tid 100097 ]
> > >> Stopped at      kdb_enter+0x3d: movq    $0,0x6bc27c(%rip)
> > >> db> bt
> > >> Tracing pid 1815 tid 100097 td 0xffffff00045af000
> > >> kdb_enter() at kdb_enter+0x3d
> > >> panic() at panic+0x176
> > >> assert_mtx() at assert_mtx
> > >> vm_page_wire() at vm_page_wire+0x37
> > >> nv_alloc_system_pages() at nv_alloc_system_pages+0x217
> > >> nv_alloc_pages() at nv_alloc_pages+0xcd
> > >> _nv019978rm() at _nv019978rm+0x7f
> > >> 
> > >> nvidia-driver-256.35 and custom kernel:
> > >> panic: blockable sleep lock (sleep mutex) select mtxpool @
> > >> /home/freebsd9/src/sys/kern/sys_generic.c:1479
> > >> cpuid = 1
> > >> KDB: enter: panic
> > >> [ thread pid 1830 tid 100090 ]
> > >> Stopped at      kdb_enter+0x3d: movq    $0,0x51368c(%rip)
> > >> db> bt
> > >> Tracing pid 1830 tid 100090 td 0xffffff000456d3d0
> > >> kdb_enter() at kdb_enter+0x3d
> > >> panic() at panic+0x176
> > >> witness_checkorder() at witness_checkorder+0x913
> > >> _mtx_lock_flags() at _mtx_lock_flags+0x68
> > >> selrecord() at selrecord+0x71
> > >> nvidia_dev_poll() at nvidia_dev_poll+0x52
> > >> devfs_poll_f() at devfs_poll_f+0x55
> > >> kern_select() at kern_select+0x501
> > >> select() at select+0x54
> > >> syscallenter() at syscallenter+0x19b
> > >> syscall() at syscall+0x41
> > >> Xfast_syscall() at Xfast_syscall+0xe2
> > >> --- syscall (93, FreeBSD ELF64, select), rip = 0x801a17ddc, rsp =
> > >> 0x7fffffffe908, rbp = 0x100 ---
> > >> 
> > >> Also of note is:
> > >> # grep '^C.*FLAGS' /etc/make.conf
> > >> CFLAGS+= -DNDEBUG
> > >> 
> > >> As mentioned, without any debugging options the system is stable.
> > >> 
> > >> Is there anything I can do to assist diagnosis?
> > >> 
> > >> Regards,
> > >> 
> > >> David
> > > 
> > > http://lists.freebsd.org/pipermail/freebsd-current/2010-June/017936.html
> > > helps here, check the thread as well.
> > > 
> > > You could also try to use 256.35 driver.
> > 
> > The 256.35 driver works for me (without the above-referred patch), but
> > anywhere between 1 and 48 hours my laptop locks up hard without any
> > warning nor panic. This is with CURRENT r209581, GENERIC kernel, but with
> > debug.witness.watch=0. If I set debug.witness.watch to 1, the kernel
> > freezes when starting X.
> 
> I experienced a lockup when using the 256.35 driver, so I switched back to
> the 195.36.15 driver and have had no problems since.  The system also
> freezes up when launching k3b, so I'm not sure what caused that particular
> freeze...
> 
> Thanks for the debug.witness.watch hint.  

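These freezes and panics are due to the driver using a spin mutex instead of a 
regular (sleep) mutex for the per-file descriptor event_mtx.  In the second 
backtrace, nvidia_dev_poll() calls selrecord(), which takes a sleep mutex from 
the select mtxpool; WITNESS panics there because a sleep mutex cannot be 
acquired while a spin mutex is held.  If you patch the driver to make 
event_mtx a regular mutex, I think that should fix the problems.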
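
To illustrate the rule that WITNESS is enforcing, here is a minimal user-space
sketch.  It is not driver or kernel code: every name in it (model_lock,
model_lock_acquire(), model_poll(), and so on) is made up for illustration,
and it assumes the poll handler holds event_mtx across the selrecord() call.
In the kernel the two lock classes would presumably correspond to passing
MTX_SPIN versus MTX_DEF to mtx_init(9), but I have not checked how the driver
actually initializes event_mtx.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only; none of these names exist in FreeBSD or the driver. */
enum lock_class { LOCK_SPIN, LOCK_SLEEP };

struct model_lock {
    const char *name;
    enum lock_class cls;
};

/* Number of spin locks the (single) modeled thread currently holds. */
static int spin_locks_held;

/*
 * Returns true if the acquisition is legal.  A false return models the
 * "blockable sleep lock" panic: a sleep lock requested while a spin
 * lock is held.
 */
static bool model_lock_acquire(const struct model_lock *l)
{
    if (l->cls == LOCK_SLEEP && spin_locks_held > 0)
        return false;
    if (l->cls == LOCK_SPIN)
        spin_locks_held++;
    return true;
}

static void model_lock_release(const struct model_lock *l)
{
    if (l->cls == LOCK_SPIN) {
        assert(spin_locks_held > 0);
        spin_locks_held--;
    }
}

/*
 * Models a poll handler that takes event_mtx and then calls a
 * selrecord()-like routine, which itself takes a sleep mutex.
 * Returns true if the whole sequence is legal.
 */
static bool model_poll(enum lock_class event_mtx_class)
{
    struct model_lock event_mtx = { "event_mtx", event_mtx_class };
    struct model_lock select_mtx = { "select mtxpool", LOCK_SLEEP };
    bool ok;

    if (!model_lock_acquire(&event_mtx))
        return false;
    ok = model_lock_acquire(&select_mtx);   /* the selrecord() step */
    if (ok)
        model_lock_release(&select_mtx);
    model_lock_release(&event_mtx);
    return ok;
}
```

With this model, model_poll(LOCK_SPIN) reports the illegal sequence and
model_poll(LOCK_SLEEP) goes through, which is the whole point of the
suggested change.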

-- 
John Baldwin
Received on Thu Jul 08 2010 - 10:32:10 UTC
