Re: Silent reboots in head @r248550 starting xdm with x11/nvidia-driver

From: Shawn Webb <lattera_at_gmail.com>
Date: Thu, 21 Mar 2013 12:04:19 -0400
On Thu, Mar 21, 2013 at 9:58 AM, Konstantin Belousov <kostikbel_at_gmail.com> wrote:

> On Thu, Mar 21, 2013 at 06:34:46AM -0700, David Wolfskill wrote:
> > On Thu, Mar 21, 2013 at 10:04:41AM +0200, Konstantin Belousov wrote:
> > > ...
> > > This gives me an idea. The only, so to say, 'vm' change in r248508 was the
> > > addition of the bio_transient_map submap. The vfs.unmapped_buf_allowed
> > > tunable did not eliminate the submap creation. Please try r248569
> > > with vfs.unmapped_buf_allowed set to 0.
> >
> > OK; I believe that worked.
> >
> > "Believe" because (in the normal course of things) I updated to:
> >
> > FreeBSD g1-235.catwhisker.org 10.0-CURRENT FreeBSD 10.0-CURRENT #845 r248575M/248575: Thu Mar 21 05:35:06 PDT 2013 root@g1-235.catwhisker.org:/usr/obj/usr/src/sys/CANARY  i386
> >
> > which is a little beyond r248569.  (I still have r248508 on a
> > different slice, and figured I could update that to precisely r248569
> > if this test was incorrect or inconclusive.)
> Not needed. BTW, your system uses UFS, right?
>
> >
> > In any case: after booting the above (r248575) to verify that it worked
> > as long as I did not load nvidia.ko first, I then rebooted, escaped to
> > loader prompt, set vfs.unmapped_buf_allowed=0; boot.
> >
> > And after that came up OK, I (manually) loaded nvidia.ko, then
> > re-started X (xdm); the nVidia banner displayed just before the xdm
> > login screen did.  (I have my xdm startup script "prefer" the nvidia
> > driver, but if nvidia.ko isn't loaded, it reverts to the nv driver
> > automagically.)
> >
> > > If this combination allows the nvidia driver to start, please revert
> > > the setting of vfs.unmapped_buf_allowed, and instead set
> > > kern.bio_transient_maxcnt e.g. to 256 or even 128.
> >
> > OK; rebooting, escaping to loader, *not* setting
> > vfs.unmapped_buf_allowed, and setting kern.bio_transient_maxcnt=256
> > also allowed nvidia driver to be used at r248575.
> Ok, this is almost not a workaround but a solution (for now). See below.
>
> >
> > > Also, on the machine without the tunables customization, please show
> > > the output of sysctl kern.nbuf, kern.bio_transient_maxcnt. Also show
> > > the output of pciconf -lvb.
> >
> > OK; I rebooted (to revert the vfs.unmapped_buf_allowed setting) and
> > obtained the above (augmented a wee bit by some of the others
> > mentioned); I've attached that as "sysctl.txt".  I've also attached
> > a copy of dmesg.boot, in case that's useful.
> >
> > I then tried rebooting r248575 and loading nvidia.ko *without* the
> > tunable customization, and verified that I still saw (what looks
> > like) a "reset" when I start X that way (as reported initially).
> >
> > > From what I see in your report, you use the i386 arch. How much
> > > memory is installed in the machine?
> >
> > 4GB.
> >
> > Is the above what you had in mind, or would you like me to try at
> > precisely r248569?  Anything else?
> r248569 is fine.
>
>
> > Script started on Thu Mar 21 06:07:41 2013
> > g1-235(10.0-C)[1] uname -a
> > FreeBSD g1-235.catwhisker.org 10.0-CURRENT FreeBSD 10.0-CURRENT #845 r248575M/248575: Thu Mar 21 05:35:06 PDT 2013 root@g1-235.catwhisker.org:/usr/obj/usr/src/sys/CANARY  i386
> > g1-235(10.0-C)[2] sysctl vfs.unmapped_buf_allowed kern.bio_transient_maxcnt kern.nbuf
> > vfs.unmapped_buf_allowed: 1
> > kern.bio_transient_maxcnt: 697
> > kern.nbuf: 7224
> Could you please do some more measurements on r248575M?
>
> Please show the kern.nbuf for vfs.unmapped_buf_allowed=0 case.
> Also, from there, run "kgdb /boot/kernel/kernel /dev/mem" and do
> p *buffer_map.
>
> Reboot without applying any unmapped/transient tuning, run the kgdb
> again, and do
> p *buffer_map
> p *bio_transient_map
>
> Reboot with kern.bio_transient_maxcnt tunable set to 256 and again
> print the buffer_map and bio_transient_map from the kgdb.
>
> > none1@pci0:0:3:3:       class=0x070002 card=0x02501028 chip=0x2a478086 rev=0x07 hdr=0x00
> >     vendor     = 'Intel Corporation'
> >     device     = 'Mobile 4 Series Chipset AMT SOL Redirection'
> >     class      = simple comms
> >     subclass   = UART
> >     bar   [10] = type I/O Port, range 32, base 0xef88, size 8, enabled
> >     bar   [14] = type Memory, range 32, base 0xf6fda000, size 4096, enabled
> Oh, you do have a serial port on your notebook, usable remotely without a
> serial cable. Your chipset seems to be AMT-capable, and you could use
> comms/amtterm from another machine to get a serial console.
>
> > vgapci0@pci0:1:0:0:     class=0x030000 card=0x02501028 chip=0x065c10de rev=0xa1 hdr=0x00
> >     vendor     = 'NVIDIA Corporation'
> >     device     = 'G96M [Quadro FX 770M]'
> >     class      = display
> >     subclass   = VGA
> >     bar   [10] = type Memory, range 32, base 0xf5000000, size 16777216, enabled
> >     bar   [14] = type Prefetchable Memory, range 64, base 0xe0000000, size 268435456, enabled
> >     bar   [1c] = type Memory, range 64, base 0xf2000000, size 33554432, enabled
> >     bar   [24] = type I/O Port, range 32, base 0xdf00, size 128, enabled
>
> My current theory is that the nvidia aperture size is 256MB, as indicated
> by the bar at 14, and the nvidia driver tries to map the whole aperture
> into KVA.
>
> With 4GB of RAM on i386, the available 1GB of KVA becomes quite tightly
> populated, and even small changes in the layout make the mapping of
> 256MB impossible. If I am right, this is more an issue with nvidia.
>
> Still, the layout should not have changed much, if at all. I want the
> kgdb information listed above to confirm or refute this.
>
> If you could configure the AMT SOL console, then my theory about nvidia
> mapping the whole aperture could be confirmed or refuted.
>
> Thank you.
>

I appear to be experiencing the same issue. I've been following this
thread. I have a coredump, but it's over 700 MB in size. What would be the
best way to get that to you guys? The revision I'm at is r248583 on amd64
(6GB RAM) with an NVIDIA Quadro FX 580. Relevant lines from `pciconf -lvb`:

vgapci0@pci0:3:0:0:     class=0x030000 card=0x063a10de chip=0x065910de rev=0xa1 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'G96 [Quadro FX 580]'
    class      = display
    subclass   = VGA
    bar   [10] = type Memory, range 32, base 0xf6000000, size 16777216, enabled
    bar   [14] = type Prefetchable Memory, range 64, base 0xc0000000, size 536870912, enabled
    bar   [1c] = type Memory, range 64, base 0xf4000000, size 33554432, enabled
    bar   [24] = type I/O Port, range 32, base 0xdc80, size 128, enabled
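
For what it's worth, here is a quick sketch that converts the bar [14] sizes
from the two pciconf listings in this thread into MB. The to_mb helper is a
name I made up for illustration, and this only does arithmetic on the
reported byte counts; it says nothing about what the driver actually maps.

```shell
#!/bin/sh
# Convert a BAR size in bytes (as printed by pciconf -lvb) to MB.
# to_mb is a hypothetical helper, not part of any FreeBSD tool.
to_mb() {
    echo $(( $1 / 1048576 ))
}

# Byte counts copied from the bar [14] lines quoted in this thread.
echo "Quadro FX 770M bar [14]: $(to_mb 268435456) MB"
echo "Quadro FX 580  bar [14]: $(to_mb 536870912) MB"
```

So the prefetchable BAR on my card is 512 MB, twice the 256 MB on David's.
I'm on amd64, though, so I can't say whether the same KVA pressure applies.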

Thanks,

Shawn
Received on Thu Mar 21 2013 - 15:04:26 UTC
