Re: nvidia drivers mutex lock

From: Tomoaki AOKI <junchoon_at_dec.sakura.ne.jp>
Date: Sat, 10 Jun 2017 00:11:22 +0900
Hmm, I now strongly suspect a hardware or noise issue, as the nvidia GPU
seems to fall off the bus and re-appear several times.
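
To see how often this happens, the Xid reports can be counted from the
system log. A rough sketch, assuming the kernel messages end up in
/var/log/messages (the FreeBSD default) and that the Xid lines look like
the one quoted below; count_xid is just a name made up for illustration:

```shell
# Count NVRM Xid reports per Xid number in a syslog-style file.
# Assumption: the log path defaults to /var/log/messages; pass another
# file as the first argument if your syslog is configured differently.
count_xid() {
    logfile="${1:-/var/log/messages}"
    # Pick the number out of lines such as
    #   "... kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has ..."
    # then print one "<xid>: <count>" line per distinct Xid number.
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p' "$logfile" |
        sort -n | uniq -c | awk '{ print $2 ": " $1 }'
}
```

Calling count_xid with no argument reads /var/log/messages. Xid 79 is
the "GPU has fallen off the bus" report seen in the quoted log.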

If it were a desktop machine with the GPU attached via a PCIe connector,
I would immediately power off and reseat the card, with some
physical dust cleaning, but this time the GPU is onboard...

 *Not sure, but a too-short timeout in the driver initialization
  code could possibly show problems like this (not enough time to initialize).


On Thu, 8 Jun 2017 02:27:51 +0800
blubee blubeeme <gurenchan_at_gmail.com> wrote:

> I was just looking through dmesg and noticed these:
> 
> Jun  6 21:40:52 blubee kernel: nvidia-modeset: Allocated GPU:0 (GPU-54a7b304-c99d-efee-0117-0ce119063cd6) _at_ PCI:0000:01:00.0
> Jun  6 21:41:05 blubee kernel: NVRM: GPU at PCI:0000:01:00: GPU-54a7b304-c99d-efee-0117-0ce119063cd6
> Jun  6 21:41:05 blubee kernel: NVRM: GPU Board Serial Number:
> Jun  6 21:41:05 blubee kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
> Jun  6 21:41:05 blubee kernel:
> Jun  6 21:41:05 blubee kernel: NVRM: GPU at 0000:01:00.0 has fallen off the bus.
> Jun  6 21:41:05 blubee kernel: NVRM: GPU is on Board .
> Jun  6 21:41:05 blubee kernel: NVRM: A GPU crash dump has been created. If possible, please run
> Jun  6 21:41:05 blubee kernel: NVRM: nvidia-bug-report.sh as root to collect this data before
> Jun  6 21:41:05 blubee kernel: NVRM: the NVIDIA kernel module is unloaded.
> Jun  6 21:41:05 blubee kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
> Jun  6 21:41:05 blubee kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
> Jun  6 21:41:05 blubee kernel: vgapci0: child nvidia0 requested pci_enable_io
> Jun  6 21:41:05 blubee kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
> Jun  6 21:41:06 blubee kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
> Jun  6 21:41:22 blubee kernel: .
> 
> That then led me to this nvidia forum thread:
> https://devtalk.nvidia.com/default/topic/985037/gtx-1070-quot-gpu-has-fallen-off-the-bus-quot-running-3d-games-in-arch-linux-/
> 
> maybe it could help somehow?
> 
> Best,
> Owen
> 
> On Tue, Jun 6, 2017 at 10:08 PM, blubee blubeeme <gurenchan_at_gmail.com>
> wrote:
> 
> > This is getting out of hand. I can't even keep x going for ten minutes
> > sometimes.
> > I've tested all the suggestions in this thread and they just don't work.
> >
> > I have posted the output of sysctl hw here: https://paste2.org/
> >
> > The CPU is hw.model: Intel(R) Core(TM) i7-6700HQ CPU _at_ 2.60GHz
> > In the BIOS on this laptop I can set graphics to either discrete or mshybrid.
> >
> > I've tried in the past to disable discrete and run mshybrid, but that
> > always comes up with 0 screens found, even when just doing Xorg -configure.
> >
> > Does anyone have tips on disabling the nvidia drivers and running this CPU
> > with the iGPU for a while?
> >
> > Best,
> > Owen
> >
> > On Sun, Jun 4, 2017, 18:11 blubee blubeeme <gurenchan_at_gmail.com> wrote:
> >
> >> Thanks a lot! I'll give it a shot in a bit.
> >>
> >> Best,
> >> Owen
> >>
> >> On Sun, Jun 4, 2017, 16:59 Tomoaki AOKI <junchoon_at_dec.sakura.ne.jp>
> >> wrote:
> >>
> >>> Yes. FreeBSD patches in x11/nvidia-driver/files are applied as usual.
> >>>
> >>> But beware! Sometimes upstream changes make some of the FreeBSD patches
> >>> inapplicable (upstream incorporates them, makes incompatible changes, ...).
> >>>
> >>> For 381.22, the current patchset applies and builds fine for me.
> >>>
> >>>
> >>> On Sun, 04 Jun 2017 08:04:50 +0000
> >>> blubee blubeeme <gurenchan_at_gmail.com> wrote:
> >>>
> >>> > I'm running with svn and I build with make.
> >>> > If I use these steps, will the BSD-related patches be applied, etc.?
> >>> >
> >>> > Best,
> >>> > Owen
> >>> >
> >>> > On Sun, Jun 4, 2017, 15:53 Tomoaki AOKI <junchoon_at_dec.sakura.ne.jp> wrote:
> >>> >
> >>> > > Hi.
> >>> > >
> >>> > > Not in the ports tree, but the version is easily overridden by adding
> >>> > >
> >>> > >   DISTVERSION=381.22 -DNO_CHECKSUM
> >>> > >
> >>> > > on the make command line. The Makefile of x11/nvidia-driver has a mechanism
> >>> > > to do so for anyone who requires a newer version (newer GPU support, etc.).
> >>> > >
> >>> > > If you're using portupgrade,
> >>> > >
> >>> > >   portupgrade -m 'DISTVERSION=381.22 -DNO_CHECKSUM' -f x11/nvidia-driver
> >>> > >
> >>> > > would do the same.
> >>> > >
> >>> > > If you installed it via pkg, there's no way to try this. :-(
> >>> > > (As it's pre-built.)
> >>> > >
> >>> > >
> >>> > > On Sun, 04 Jun 2017 07:04:01 +0000
> >>> > > blubee blubeeme <gurenchan_at_gmail.com> wrote:
> >>> > >
> >>> > > > Hi _at_tomoaki
> >>> > > > Is that version of the nvidia drivers currently in the ports tree? I just
> >>> > > > checked but it seems not to be.
> >>> > > >
> >>> > > > _at_jeffrey
> >>> > > > I just generated a new xorg.conf based on the force composition setting
> >>> > > > and merged it with my previous one. I'll reboot and see if it gives
> >>> > > > better performance.
> >>> > > >
> >>> > > > It seems like my system is locking up more frequently now. Sometimes
> >>> > > > the screen locks right after a reboot, and it's reboot and pray.
> >>> > > >
> >>> > > > Best,
> >>> > > > Owen
> >>> > > >
> >>> > > > On Sat, Jun 3, 2017, 21:59 Jeffrey Bouquet <jeffreybouquet_at_yahoo.com> wrote:
> >>> > > >
> >>> > > > > SOME LINES BOTTOM POSTED, SEE...
> >>> > > > > --------------------------------------------
> >>> > > > > On Fri, 6/2/17, Tomoaki AOKI <junchoon_at_dec.sakura.ne.jp> wrote:
> >>> > > > >
> >>> > > > >  Subject: Re: nvidia drivers mutex lock
> >>> > > > >  To: freebsd-current_at_freebsd.org
> >>> > > > >  Cc: "Jeffrey Bouquet" <jeffreybouquet_at_yahoo.com>, "blubee blubeeme" <gurenchan_at_gmail.com>
> >>> > > > >  Date: Friday, June 2, 2017, 11:25 PM
> >>> > > > >
> >>> > > > >  Hi.
> >>> > > > >  Version 381.22 (5 days newer than 375.66) of the driver states... [1]
> >>> > > > >
> >>> > > > >   Fixed hangs and crashes that could occur when an OpenGL context is
> >>> > > > >   created while the system is out of available memory.
> >>> > > > >
> >>> > > > >  Can this be related to your hang?
> >>> > > > >
> >>> > > > >  IMHO, allocating a new resource (under the os.lock_mtx guard) without
> >>> > > > >  checking the lock first, while a previous request is waiting for
> >>> > > > >  another, could cause the duplicate lock situation. High memory
> >>> > > > >  pressure would easily trigger it.
> >>> > > > >
> >>> > > > >   [1] http://www.nvidia.com/Download/driverResults.aspx/118527/en-us
> >>> > > > >
> >>> > > > >  Hope it helps.
> >>> > > > >
> >>> > > > >
> >>> > > > >  On Thu, 1 Jun 2017 22:35:46 +0000 (UTC)
> >>> > > > >  Jeffrey Bouquet <jeffreybouquet_at_yahoo.com> wrote:
> >>> > > > >
> >>> > > > >  > I see the same message, upon load, ...
> >>> > > > >  > --------------------------------------------
> >>> > > > >  > On Thu, 6/1/17, blubee blubeeme <gurenchan_at_gmail.com> wrote:
> >>> > > > >  >
> >>> > > > >  >  Subject: nvidia drivers mutex lock
> >>> > > > >  >  To: freebsd-ports_at_freebsd.org, freebsd-current_at_freebsd.org
> >>> > > > >  >  Date: Thursday, June 1, 2017, 11:35 AM
> >>> > > > >  >
> >>> > > > >  >  I'm running nvidia-drivers 375.66 with a GTX 1070 on FreeBSD-Current
> >>> > > > >  >
> >>> > > > >  >  This problem just started happening recently but, every so often my
> >>> > > > >  >  laptop screen will just blank out and then I have to power cycle to
> >>> > > > >  >  get the machine up and running again.
> >>> > > > >  >
> >>> > > > >  >  It seems to be a problem with nvidia drivers acquiring duplicate lock.
> >>> > > > >  >  Any info on this?
> >>> > > > >  >
> >>> > > > >  >  Jun  2 02:29:41 blubee kernel: acquiring duplicate lock of same type: "os.lock_mtx"
> >>> > > > >  >  Jun  2 02:29:41 blubee kernel: 1st os.lock_mtx _at_ nvidia_os.c:841
> >>> > > > >  >  Jun  2 02:29:41 blubee kernel: 2nd os.lock_mtx _at_ nvidia_os.c:841
> >>> > > > >  >  Jun  2 02:29:41 blubee kernel: stack backtrace:
> >>> > > > >  >  Jun  2 02:29:41 blubee kernel: #0 0xffffffff80ab7770 at witness_debugger+0x70
> >>> > > > >  >  Jun  2 02:29:41 blubee kernel: #1 0xffffffff80ab7663 at witness_checkorder+0xe23
> >>> > > > >  >  Jun  2 02:29:41 blubee kernel: #2 0xffffffff80a35b93 at __mtx_lock_flags+0x93
> >>> > > > >  >  Jun  2 02:29:41 blubee kernel: #3 0xffffffff82f4397b at os_acquire_spinlock+0x1b
> >>> > > > >  >  Jun  2 02:29:41 blubee kernel: #4 0xffffffff82c48b15 at _nv012002rm+0x185
> >>> > > > >  >  Jun  2 02:29:41 blubee kernel: ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20170303/nsarguments-205)
> >>> > > > >  >  Jun  2 02:29:42 blubee kernel: nvidia-modeset: Allocated GPU:0 (GPU-54a7b304-c99d-efee-0117-0ce119063cd6) _at_ PCI:0000:01:00.0
> >>> > > > >  >
> >>> > > > >
> >>> > > > >  >  Best,
> >>> > > > >  >  Owen
> >>> > > > >  >
> >>> > > > >  _______________________________________________
> >>> > > > >  >  freebsd-ports_at_freebsd.org
> >>> > > > >  >  mailing list
> >>> > > > >  >  https://lists.freebsd.org/mailman/listinfo/freebsd-ports
> >>> > > > >  >  To unsubscribe, send any mail to
> >>> > > > >  "freebsd-ports-unsubscribe_at_freebsd.org"
> >>> > > > >  >
> >>> > > > >  >
> >>> > > > >  >
> >>> > > > >  > ... then Xorg will run happily twelve hours or so.  The lockups here
> >>> > > > >  > usually happen when too many tabs, or too-large web pages with complex
> >>> > > > >  > CSS etc., are open at a time.
> >>> > > > >  >     So no help, just a 'me too'.
> >>> > > > >  >
> >>> > > > >  > _______________________________________________
> >>> > > > >  > freebsd-current_at_freebsd.org mailing list
> >>> > > > >  > https://lists.freebsd.org/mailman/listinfo/freebsd-current
> >>> > > > >  > To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
> >>> > > > >  >
> >>> > > > >  >
> >>> > > > >
> >>> > > > >
> >>> > > > >  --
> >>> > > > >  Tomoaki AOKI    <junchoon_at_dec.sakura.ne.jp>
> >>> > > > >
> >>> > > > >
> >>> > > > >
> >>> > > > > ........................
> >>> > > > > This might be a workaround.
> >>> > > > > Xorg/nvidia ran all night with this:
> >>> > > > >    nvidia-settings >> X server display configuration >> Advanced >> Force Full Composition Pipeline
> >>> > > > > ... for the laptop freezing.  Could not hurt to try.  Use "merge with
> >>> > > > > Xorg.conf" from nvidia-settings...
> >>> > > > > ......................
> >>> > > > > 18 hours uptime so far, even past the 3 am periodic scripts.  I have not
> >>> > > > > restarted Xorg since, though, so the setting may need to be edited out of
> >>> > > > > xorg.conf if it behaves differently when applied at Xorg startup than
> >>> > > > > when applied in real time.
> >>> > > > > ........
> >>> > > > >
> >>> > > > >
> >>> > > >
> >>> > > >
> >>> > >
> >>> > >
> >>> > > --
> >>> > > Tomoaki AOKI    <junchoon_at_dec.sakura.ne.jp>
> >>> > >
> >>>
> >>>
> >>> --
> >>> Tomoaki AOKI    <junchoon_at_dec.sakura.ne.jp>
> >>>
> >>
> 
> 


-- 
Tomoaki AOKI    <junchoon_at_dec.sakura.ne.jp>
Received on Fri Jun 09 2017 - 13:11:25 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:12 UTC