Re: nvidia-driver crashing kernel on head

From: David Naylor <naylor.b.david_at_gmail.com>
Date: Fri, 23 Jul 2010 14:08:36 +0200
On Saturday 17 July 2010 17:25:27 Christian Zander wrote:
> On Sat, Jul 17, 2010 at 07:24:54AM -0700, David Naylor wrote:
> (...)
> 
> > > >>> These freezes and panics are due to the driver using a spin mutex
> > > >>> instead of a
> > > >>> regular mutex for the per-file descriptor event_mtx.  If you patch
> > > >>> the driver
> > > >>> to change it to be a regular mutex I think that should fix the
> > > >>> problems.
> > > >> 
> > > >> Can you give an example? :) I don't mind creating a patch for all of
> > > >> them if you can illustrate what needs to be changed.
> > > > 
> > > > See the attached patch
> > > 
> > > In order to use 195.36.15 it was necessary to use the patch Rene sent,
> > > the suggestion from jhb previously to remove some locks, plus a bit
> > > more. The patch that got it working on HEAD for me (specifically
> > > r209633) is attached. With that patch I could start X, and run it for a
> > > while, but performance was very poor, even in comparison with the stock
> > > nv driver, and it crashed a couple times (although not nearly as bad as
> > > previously).
> > > 
> > > So based on other suggestions I tried the newest release version at
> > > nvidia, 256.35. Some of the same locking stuff was needed to patch it,
> > > a patch for the port which includes the locking patch is also
> > > attached. If you are running an amd64 system you'll have to type 'make
> > > makesum' after applying this patch to the port. I'm not sure this
> > > patch is complete, or what Alexey might want to do with the update,
> > > but it does create an accurate plist which means you can cleanly
> > > deinstall/pkg_delete when you're done.
> > > 
> > > With 256.35 performance and stability have both been quite good,
> > > comparable even to before the drama started. The only concern I
> > > have at this point is that I'm periodically getting a strange sort of
> > > "flash" popping up on my screen that I didn't get while I was running
> > > the nv driver recently. It looks sort of like the default X background
> > > (the tiny gray crosshatch) is popping through for just a split second.
> > 
> > I've been getting these messages on the console:
> > 
> > NVRM: Xid (0001:00): 16, Head 00000000 Count 000218d5
> > NVRM: Xid (0001:00): 8, Channel 00000000
> > NVRM: Xid (0001:00): 16, Head 00000000 Count 000218d6
> > NVRM: Xid (0001:00): 8, Channel 00000002
> > 
> > This is preceded by X locking hard.  I cannot VT switch to a normal
> > console and sometimes the computer needs a hard reset (i.e. does not
> > respond to power button).  It appears to only trigger when under heavy
> > load, e.g.:
> > make -C /usr/src -j8 buildworld
> > 
> > This seems to be interfering with interrupts for other subsystems, as my
> > network drivers have been less than reliable of late (watchdog timeouts).
> 
> The messages indicate that the NVIDIA driver hasn't received
> interrupts from the GPU @ PCI:1:00.0 over a significant
> period of time. If you are seeing similar problems with other
> system components, there's a good chance that the above is
> a symptom of some larger problem.

I think you are right.  I'm not sure whether this is a hardware problem or a
FreeBSD one.  I reverted to a kernel from May 01 and the system has been solid
since (~5 days).  I'm using the patched 256.35 driver without problems.
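
For anyone who wants to compare how often the NVRM Xid messages quoted above
show up under different kernels, here is a minimal sketch in Python.  The
regular expression is inferred only from the four console lines in this thread,
so treat the format as an assumption; the helper names parse_xid and count_xids
are made up for illustration and are not part of any NVIDIA or FreeBSD tool.

```python
import re
from collections import Counter

# Assumed format, taken from the console lines quoted in this thread, e.g.:
#   NVRM: Xid (0001:00): 16, Head 00000000 Count 000218d5
XID_RE = re.compile(
    r"NVRM: Xid \((?P<dev>[0-9a-fA-F:.]+)\): (?P<code>\d+),\s*(?P<detail>.*)"
)


def parse_xid(line):
    """Return (device, xid_code, detail) for an NVRM Xid line, else None."""
    m = XID_RE.search(line)
    if m is None:
        return None
    return m.group("dev"), int(m.group("code")), m.group("detail").strip()


def count_xids(lines):
    """Count how often each Xid code appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_xid(line)
        if parsed is not None:
            counts[parsed[1]] += 1
    return counts
```

Passing an open log file (or the saved output of dmesg) to count_xids gives a
per-code tally.  The sketch only counts the messages; it does not interpret
what the individual codes mean.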

> > This happens with 195.36.15 unpatched and 256.35 patched.
> > 
> > I have not checked if booting with WITNESS enabled works.
> > 
> > Regards
> > 
> > * David Naylor <naylor.b.david_at_gmail.com>
> > * 0xFF6916B2

Received on Fri Jul 23 2010 - 10:08:46 UTC
