nvidia driver on recent current?

From: Craig Boston <cb_at_severious.net>
Date: Fri, 21 Sep 2007 15:25:24 -0500

I've been getting a lot of these panics on recent -current builds that
are being caused by the nvidia driver:

panic: spin locks can only use msleep_spin

I managed to compile the part of the driver that we have source code
for with debug symbols, but the only things showing up in the stack
trace are obfuscated function names from the binary module.  Some of
the addresses look very suspicious, so it seems likely that the stack
is corrupted.

(The nvidia module is loaded at 0xc0c2e000 - 0xc13cf000; the addresses
in the 0xc590+ range don't seem to correspond to any loaded module.)
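
For anyone who wants to repeat the range check on the frames below, here
is a minimal sketch in plain userland C.  The helper name addr_in_module
and the two macros are made up for illustration; only the bounds come
from the module range quoted above, and I am assuming the end address is
exclusive.

```c
#include <stdbool.h>
#include <stdint.h>

/* Module bounds as quoted in this message (end assumed exclusive). */
#define NVIDIA_START	0xc0c2e000u
#define NVIDIA_END	0xc13cf000u

/* Hypothetical helper: true if addr lies inside [start, end). */
static bool
addr_in_module(uint32_t addr, uint32_t start, uint32_t end)
{
	return (addr >= start && addr < end);
}
```

Called with the values from the trace, frame #5 (0xc1012c71) and frame
#13 (0xc0ce5316) fall inside the module, while frame #6 (0xc59d6a18)
does not.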

#3  0xc06de0e2 in unlock_spin (lock=Could not find the frame base for "unlock_spin".
) at /compile/src/sys/kern/kern_mutex.c:166
#4  0xc06b46fb in _cv_wait (cvp=0xc59d6a18, lock=0xc59d6a00)
    at /compile/src/sys/kern/kern_condvar.c:131
#5  0xc1012c71 in ?? ()
#6  0xc59d6a18 in ?? ()
#7  0xc59d6a00 in ?? ()
#8  0xc110a9b0 in ?? ()
#9  0x00000273 in ?? ()
#10 0xc5a17000 in ?? ()
#11 0xc593e800 in ?? ()
#12 0xff78e84c in ?? ()
#13 0xc0ce5316 in _nv009651rm ()
#14 0xc59d6a00 in ?? ()
#15 0x20000000 in ?? ()
#16 0x00000028 in ?? ()
#17 0xc5a2dd00 in ?? ()
#18 0xff78e86c in ?? ()
#19 0xc9c8e8e0 in ?? ()
#20 0xff78e86c in ?? ()
#21 0xc0cedda4 in _nv009831rm ()
#22 0xc59d6a00 in ?? ()
#23 0x00000001 in ?? ()
#24 0x00000000 in ?? ()
#25 0xc9c8e8e0 in ?? ()
#26 0xc9c8e8e0 in ?? ()
#27 0xc5a2dc00 in ?? ()
#28 0xff78e88c in ?? ()
#29 0xc1014ca8 in ?? ()
#30 0x00000000 in ?? ()
#31 0xc5a2dd00 in ?? ()
#32 0xc9332b00 in ?? ()
#33 0xc5a2dc00 in ?? ()
#34 0xc5a2dd00 in ?? ()
#35 0xc5a2de00 in ?? ()
#36 0xff78e8ac in ?? ()
#37 0xc1011e0b in ?? ()
#38 0xc5a2dc00 in ?? ()
#39 0xc5a2de00 in ?? ()
#40 0xd7769800 in ?? ()
#41 0xc5a2de00 in ?? ()
#42 0xca61faa0 in ?? ()
#43 0xc1365f60 in ?? ()
#44 0xff78e8cc in ?? ()
#45 0xc06b6c13 in giant_close (dev=0xc59d6a00, fflag=536870912, devtype=40, 
    td=0xc5a2dd00) at /compile/src/sys/kern/kern_conf.c:327

(kgdb) up 3
166             panic("spin locks can only use msleep_spin");
(kgdb) print lock
Could not find the frame base for "unlock_spin".
(kgdb) up 1
#4  0xc06b46fb in _cv_wait (cvp=0xc59d6a18, lock=0xc59d6a00)
    at /compile/src/sys/kern/kern_condvar.c:131
131             lock_state = class->lc_unlock(lock);

(kgdb) print *lock
$8 = {lo_name = 0xc110a995 "rm.mutex_mtx", 
  lo_type = 0xc110a995 "rm.mutex_mtx", lo_flags = 720896, lo_witness_data = {
    lod_list = {stqe_next = 0x0}, lod_witness = 0x0}}

(kgdb) print *class
$10 = {lc_name = 0xc095aba9 "spin mutex", lc_flags = 10, lc_ddb_show = 0, 
  lc_lock = 0xc06de0f0 <lock_spin>, lc_unlock = 0xc06de0d0 <unlock_spin>}

rm.mutex_mtx is indeed created in nvidia_os.c with
mtx_init(&mtx->mutex_mtx, "rm.mutex_mtx", NULL, MTX_SPIN | MTX_RECURSE);

I don't see any explicit calls to unlock_spin in the part we have source
for, just mtx_unlock_spin.  I'm unsure why the spin mutex class has
pointers to these dummy functions that simply panic, but I'm not very
well versed in the internals of the kernel lock primitives.
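
To make the call path in frame #4 concrete, here is a rough userland
model of what the kgdb output suggests: _cv_wait looks up the lock's
class and calls its lc_unlock hook, and for the spin mutex class that
hook is the function that panics.  This is only a sketch.  Every model_*
name is invented, the struct is a cut-down stand-in for the real
struct lock_class, and a false return value stands in for the panic.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct lock_class. */
struct model_lock_class {
	const char *lc_name;
	/* Returns false where the real kernel would panic. */
	bool (*lc_unlock)(void *lock);
};

/* Assumed behavior: a sleepable mutex can be dropped generically. */
static bool
model_unlock_sleep(void *lock)
{
	(void)lock;
	return (true);
}

/* Stands in for panic("spin locks can only use msleep_spin"). */
static bool
model_unlock_spin(void *lock)
{
	(void)lock;
	return (false);
}

static const struct model_lock_class model_class_sleep =
    { "sleep mutex", model_unlock_sleep };
static const struct model_lock_class model_class_spin =
    { "spin mutex", model_unlock_spin };

/* Mirrors the call at kern_condvar.c:131: class->lc_unlock(lock). */
static bool
model_cv_wait(const struct model_lock_class *class, void *lock)
{
	return (class->lc_unlock(lock));
}
```

In this model, model_cv_wait(&model_class_spin, NULL) returns false,
which corresponds to the panic: nothing has to call unlock_spin by
name, because the condition variable code reaches it through the class
pointer of whatever lock it is handed.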

Any suggestions?  I'm not sure whether this is an nvidia problem that we
need to report to them or whether a change in the kernel has broken
something the driver depends on.

Craig