Re: Silent reboots in head @r248550 starting xdm with x11/nvidia-driver

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Thu, 21 Mar 2013 15:58:35 +0200
On Thu, Mar 21, 2013 at 06:34:46AM -0700, David Wolfskill wrote:
> On Thu, Mar 21, 2013 at 10:04:41AM +0200, Konstantin Belousov wrote:
> > ...
> > This gives me an idea. The only so to say 'vm' change in r248508 was an
> > addition of the bio_transient_map submap. The vfs.unmapped_buf_allowed
> > tunable did not eliminate the submap creation. Please try r248569
> > with vfs.unmapped_buf_allowed set to 0.
> 
> OK; I believe that worked.
> 
> "Believe" because (in the normal course of things) I updated to:
> 
> FreeBSD g1-235.catwhisker.org 10.0-CURRENT FreeBSD 10.0-CURRENT #845  r248575M/248575: Thu Mar 21 05:35:06 PDT 2013     root@g1-235.catwhisker.org:/usr/obj/usr/src/sys/CANARY  i386
> 
> which is a little beyond r248569.  (I still have r248508 on a
> different slice, and figured I could update that to precisely r248569
> if this test was incorrect or inconclusive.)
Not needed. BTW, your system uses UFS, right?

> 
> In any case: after booting the above (r248575) to verify that it worked
> as long as I did not load nvidia.ko first, I then rebooted, escaped to
> loader prompt, set vfs.unmapped_buf_allowed=0; boot.
> 
> And after that came up OK, I (manually) loaded nvidia.ko, then
> re-started X (xdm); the nVidia banner displayed just before the xdm
> login screen did.  (I have my xdm startup script "prefer" the nvidia
> driver, but if nvidia.ko isn't loaded, it reverts to the nv driver
> automagically.)
> 
> > If this combination allows the nvidia driver to start, please revert
> > the setting of vfs.unmapped_buf_allowed, and instead set
> > kern.bio_transient_maxcnt e.g. to 256 or even 128.
> 
> OK; rebooting, escaping to loader, *not* setting vfs.unmapped_buf_allowed,
> and setting kern.bio_transient_maxcnt=256 also allowed nvidia driver
> to be used at r248575.
Ok, this is almost not a workaround but a solution (for now). See below.

> 
> > Also, on the machine without the tunables customization, please show
> > the output of sysctl kern.nbuf, kern.bio_transient_maxcnt. Also show
> > the output of pciconf -lvb.
> 
> OK; I rebooted (to revert the vfs.unmapped_buf_allowed setting) and
> obtained the above (augmented a wee bit by some of the others
> mentioned); I've attached that as "sysctl.txt".  I've also attached
> a copy of dmesg.boot, in case that's useful.
> 
> I then tried rebooting r248575 and loading nvidia.ko *without* the
> tunable customization, and verified that I still saw (what looks
> like) a "reset" when I start X that way (as reported initially).
> 
> > From what I see in your report, you use i386 arch. What is the amount
> > of memory installed in the machine?
> 
> 4GB.
> 
> Is the above what you had in mind, or would you like me to try at
> precisely r248569?  Anything else?
r248569 is fine.


> Script started on Thu Mar 21 06:07:41 2013
> g1-235(10.0-C)[1] uname -a
> FreeBSD g1-235.catwhisker.org 10.0-CURRENT FreeBSD 10.0-CURRENT #845  r248575M/248575: Thu Mar 21 05:35:06 PDT 2013     root@g1-235.catwhisker.org:/usr/obj/usr/src/sys/CANARY  i386
> g1-235(10.0-C)[2] sysctl vfs.unmapped_buf_allowed kern.bio_transient_maxcnt kern.nbuf
> vfs.unmapped_buf_allowed: 1
> kern.bio_transient_maxcnt: 697
> kern.nbuf: 7224
Could you please take some more measurements on r248575M?

Please show kern.nbuf for the vfs.unmapped_buf_allowed=0 case.
Also, from there, run "kgdb /boot/kernel/kernel /dev/mem" and do
p *buffer_map

Reboot without applying any unmapped/transient tuning, run the kgdb
again, and do
p *buffer_map
p *bio_transient_map

Reboot with the kern.bio_transient_maxcnt tunable set to 256 and again
print buffer_map and bio_transient_map from kgdb.

> none1@pci0:0:3:3:       class=0x070002 card=0x02501028 chip=0x2a478086 rev=0x07 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = 'Mobile 4 Series Chipset AMT SOL Redirection'
>     class      = simple comms
>     subclass   = UART
>     bar   [10] = type I/O Port, range 32, base 0xef88, size 8, enabled
>     bar   [14] = type Memory, range 32, base 0xf6fda000, size 4096, enabled
Oh, you do have a serial port on your notebook, usable remotely without
a serial cable. Your chipset seems to be AMT-capable, and you could use
comms/amtterm from another machine to get a serial console.

> vgapci0@pci0:1:0:0:     class=0x030000 card=0x02501028 chip=0x065c10de rev=0xa1 hdr=0x00
>     vendor     = 'NVIDIA Corporation'
>     device     = 'G96M [Quadro FX 770M]'
>     class      = display
>     subclass   = VGA
>     bar   [10] = type Memory, range 32, base 0xf5000000, size 16777216, enabled
>     bar   [14] = type Prefetchable Memory, range 64, base 0xe0000000, size 268435456, enabled
>     bar   [1c] = type Memory, range 64, base 0xf2000000, size 33554432, enabled
>     bar   [24] = type I/O Port, range 32, base 0xdf00, size 128, enabled

My current theory is that the nvidia aperture size is 256MB, as indicated
by the bar at 14, and that the nvidia driver tries to map the whole
aperture into KVA.

With 4GB of RAM on i386, the available 1GB of KVA becomes quite tightly
populated, and even small changes in the layout make the mapping of
256MB impossible. If I am right, this is more an issue with nvidia.
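
As a rough check of the numbers behind this theory, the sketch below
converts the memory BAR sizes from the pciconf output above into MiB and
compares the largest one with a 1GB KVA. The 1GB figure is the one quoted
in the text; the function names are made up for illustration and nothing
here queries a real system.

```python
# Memory BAR sizes in bytes, copied from the pciconf -lvb output for vgapci0.
MEMORY_BARS = {
    0x10: 16777216,
    0x14: 268435456,  # prefetchable; the suspected aperture
    0x1C: 33554432,
}

MIB = 1024 * 1024
KVA_BYTES = 1024 * MIB  # assumption: 1GB of KVA, as stated in the text


def bar_sizes_mib(bars):
    """Return {BAR offset: size in MiB} for the given sizes in bytes."""
    return {offset: size // MIB for offset, size in bars.items()}


def kva_fraction(size_bytes, kva_bytes=KVA_BYTES):
    """Fraction of the KVA that one contiguous mapping of size_bytes needs."""
    return size_bytes / kva_bytes


if __name__ == "__main__":
    for offset, mib in bar_sizes_mib(MEMORY_BARS).items():
        print(f"bar [{offset:02x}]: {mib} MiB")
    share = kva_fraction(MEMORY_BARS[0x14])
    print(f"bar [14] alone needs {share:.0%} of the KVA, contiguously")
```

The point of the comparison is that the mapping has to be one contiguous
range, so it can fail even when the total free KVA would be sufficient.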

Still, the layout should not have changed much, if at all. I want the
kgdb information listed above to confirm or deny this.
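
Until the kgdb output is available, a back-of-the-envelope estimate of how
much KVA the transient submap reserves for a given kern.bio_transient_maxcnt
may help. This is only a sketch: it ASSUMES that each transient slot
reserves MAXPHYS bytes and that MAXPHYS is 128 KiB, neither of which is
stated in this thread. The real sizes are what p *bio_transient_map would
show.

```python
KIB = 1024
MIB = 1024 * KIB

# ASSUMPTION (not stated in this thread): each transient mapping slot
# reserves MAXPHYS bytes of KVA, and MAXPHYS is 128 KiB.
ASSUMED_MAXPHYS = 128 * KIB


def transient_map_mib(maxcnt, maxphys=ASSUMED_MAXPHYS):
    """Estimated KVA in MiB reserved for maxcnt transient slots."""
    return maxcnt * maxphys / MIB


if __name__ == "__main__":
    # 697 is the default reported in sysctl.txt; 256 and 128 are the
    # values suggested for the kern.bio_transient_maxcnt tunable.
    for maxcnt in (697, 256, 128):
        print(f"maxcnt={maxcnt}: ~{transient_map_mib(maxcnt):.1f} MiB")
```

If the assumption holds, lowering the tunable from 697 to 256 would return
a few tens of MiB of KVA, which is small compared with a 256MB aperture but
could still matter in a tightly packed layout.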

If you could configure the AMT SOL console, then my theory about nvidia
mapping the whole aperture could be confirmed or denied.

Thank you.

Received on Thu Mar 21 2013 - 12:58:52 UTC
