Re: Silent reboots in head @r248550 starting xdm with x11/nvidia-driver

From: Shawn Webb <lattera_at_gmail.com>
Date: Thu, 21 Mar 2013 12:12:07 -0400
On Thu, Mar 21, 2013 at 12:04 PM, Shawn Webb <lattera_at_gmail.com> wrote:

> On Thu, Mar 21, 2013 at 9:58 AM, Konstantin Belousov <kostikbel_at_gmail.com> wrote:
>
>> On Thu, Mar 21, 2013 at 06:34:46AM -0700, David Wolfskill wrote:
>> > On Thu, Mar 21, 2013 at 10:04:41AM +0200, Konstantin Belousov wrote:
>> > > ...
>> > > This gives me an idea. The only, so to say, 'vm' change in r248508 was
>> > > the addition of the bio_transient_map submap. The
>> > > vfs.unmapped_buf_allowed tunable did not eliminate the submap creation.
>> > > Please try r248569 with vfs.unmapped_buf_allowed set to 0.
>> >
>> > OK; I believe that worked.
>> >
>> > "Believe" because (in the normal course of things) I updated to:
>> >
>> > FreeBSD g1-235.catwhisker.org 10.0-CURRENT FreeBSD 10.0-CURRENT #845  r248575M/248575: Thu Mar 21 05:35:06 PDT 2013 root@g1-235.catwhisker.org:/usr/obj/usr/src/sys/CANARY  i386
>> >
>> > which is a little beyond r248569.  (I still have r248508 on a
>> > different slice, and figured I could update that to precisely r248569
>> > if this test was incorrect or inconclusive.)
>> Not needed. BTW, your system uses UFS, right?
>>
>> >
>> > In any case: after booting the above (r248575) to verify that it worked
>> > as long as I did not load nvidia.ko first, I then rebooted, escaped to
>> > loader prompt, set vfs.unmapped_buf_allowed=0; boot.
>> >
>> > And after that came up OK, I (manually) loaded nvidia.ko, then
>> > re-started X (xdm); the nVidia banner displayed just before the xdm
>> > login screen did.  (I have my xdm startup script "prefer" the nvidia
>> > driver, but if nvidia.ko isn't loaded, it reverts to the nv driver
>> > automagically.)
>> >
>> > > If this combination allows the nvidia driver to start, please revert
>> > > the setting of vfs.unmapped_buf_allowed, and instead set
>> > > kern.bio_transient_maxcnt e.g. to 256 or even 128.
>> >
>> > OK; rebooting, escaping to loader, *not* setting
>> > vfs.unmapped_buf_allowed, and setting kern.bio_transient_maxcnt=256
>> > also allowed nvidia driver to be used at r248575.
>> Ok, this is almost not a workaround but a solution (for now). See below.
>>
>> >
>> > > Also, on the machine without the tunables customization, please show
>> > > the output of sysctl kern.nbuf, kern.bio_transient_maxcnt. Also show
>> > > the output of pciconf -lvb.
>> >
>> > OK; I rebooted (to revert the vfs.unmapped_buf_allowed setting) and
>> > obtained the above (augmented a wee bit by some of the others
>> > mentioned); I've attached that as "sysctl.txt".  I've also attached
>> > a copy of dmesg.boot, in case that's useful.
>> >
>> > I then tried rebooting r248575 and loading nvidia.ko *without* the
>> > tunable customization, and verified that I still saw (what looks
>> > like) a "reset" when I start X that way (as reported initially).
>> >
>> > > From what I see in your report, you use the i386 arch. How much
>> > > memory is installed in the machine?
>> >
>> > 4GB.
>> >
>> > Is the above what you had in mind, or would you like me to try at
>> > precisely r248569?  Anything else?
>> r248569 is fine.
>>
>>
>> > Script started on Thu Mar 21 06:07:41 2013
>> > g1-235(10.0-C)[1] uname -a
>> > FreeBSD g1-235.catwhisker.org 10.0-CURRENT FreeBSD 10.0-CURRENT #845  r248575M/248575: Thu Mar 21 05:35:06 PDT 2013 root@g1-235.catwhisker.org:/usr/obj/usr/src/sys/CANARY  i386
>> > g1-235(10.0-C)[2] sysctl vfs.unmapped_buf_allowed kern.bio_transient_maxcnt kern.nbuf
>> > vfs.unmapped_buf_allowed: 1
>> > kern.bio_transient_maxcnt: 697
>> > kern.nbuf: 7224
>> Could you please do some more measurements on r248575M?
>>
>> Please show kern.nbuf for the vfs.unmapped_buf_allowed=0 case.
>> Also, from there, run "kgdb /boot/kernel/kernel /dev/mem" and do
>> p *buffer_map.
>>
>> Reboot without applying any unmapped/transient tuning, run the kgdb
>> again, and do
>> p *buffer_map
>> p *bio_transient_map
>>
>> Reboot with kern.bio_transient_maxcnt tunable set to 256 and again
>> print the buffer_map and bio_transient_map from the kgdb.
>>
>> > none1@pci0:0:3:3:       class=0x070002 card=0x02501028 chip=0x2a478086 rev=0x07 hdr=0x00
>> >     vendor     = 'Intel Corporation'
>> >     device     = 'Mobile 4 Series Chipset AMT SOL Redirection'
>> >     class      = simple comms
>> >     subclass   = UART
>> >     bar   [10] = type I/O Port, range 32, base 0xef88, size 8, enabled
>> >     bar   [14] = type Memory, range 32, base 0xf6fda000, size 4096, enabled
>> Oh, you do have a serial port on your notebook, usable remotely without a
>> serial cable. Your chipset seems to be AMT-capable, and you could use
>> comms/amtterm from another machine to get a serial console.
>>
>> > vgapci0@pci0:1:0:0:     class=0x030000 card=0x02501028 chip=0x065c10de rev=0xa1 hdr=0x00
>> >     vendor     = 'NVIDIA Corporation'
>> >     device     = 'G96M [Quadro FX 770M]'
>> >     class      = display
>> >     subclass   = VGA
>> >     bar   [10] = type Memory, range 32, base 0xf5000000, size 16777216, enabled
>> >     bar   [14] = type Prefetchable Memory, range 64, base 0xe0000000, size 268435456, enabled
>> >     bar   [1c] = type Memory, range 64, base 0xf2000000, size 33554432, enabled
>> >     bar   [24] = type I/O Port, range 32, base 0xdf00, size 128, enabled
>>
>> My current theory is that the nvidia aperture size is 256MB, as indicated
>> by bar [14], and the nvidia driver tries to map the whole aperture into KVA.
>>
>> With 4GB of RAM on i386, the available 1GB of KVA becomes quite tightly
>> populated, and even small changes in the layout make the mapping of
>> 256MB impossible. If I am right, this is more an issue with nvidia.
>>
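
To put rough numbers on that theory, here is a quick sketch in sh. The
helper names to_mib and pct_of_kva are made up for this mail, and the 1GB
KVA figure is the i386 number quoted above; it is an assumption for David's
machine and does not describe my amd64 box.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; not part of any FreeBSD tool.

# Convert a BAR size in bytes (as printed by pciconf -lvb) to MiB.
to_mib() {
    echo $(( $1 / 1048576 ))
}

# Percentage of a 1 GiB KVA (the i386 figure assumed above) that a
# mapping of the given size in bytes would occupy. The size is reduced
# to MiB first so the arithmetic stays within 32 bits.
pct_of_kva() {
    echo $(( ($1 / 1048576) * 100 / 1024 ))
}

to_mib 268435456      # bar [14] on the Quadro FX 770M
to_mib 536870912      # bar [14] on my Quadro FX 580
pct_of_kva 268435456  # share of a 1 GiB KVA taken by the 770M aperture
```

If the driver really does map the whole aperture, that single mapping would
need a contiguous quarter of the i386 KVA, which fits the "tightly
populated" explanation.
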
>> Still, the layout should not have changed much, if at all. I want the
>> kgdb information listed above to confirm/deny this.
>>
>> If you could configure the AMT SOL console, then my theory about nvidia
>> mapping the whole aperture could be confirmed or denied.
>>
>> Thank you.
>>
>
> I appear to be experiencing the same issue. I've been following this
> thread. I have a coredump, but it's over 700 MB in size. What would be the
> best way to get that to you guys? The revision I'm at is r248583 on amd64
> (6GB RAM) with an NVIDIA Quadro FX 580. Relevant lines from `pciconf -lvb`:
>
> vgapci0@pci0:3:0:0:     class=0x030000 card=0x063a10de chip=0x065910de rev=0xa1 hdr=0x00
>     vendor     = 'NVIDIA Corporation'
>     device     = 'G96 [Quadro FX 580]'
>     class      = display
>     subclass   = VGA
>     bar   [10] = type Memory, range 32, base 0xf6000000, size 16777216, enabled
>     bar   [14] = type Prefetchable Memory, range 64, base 0xc0000000, size 536870912, enabled
>     bar   [1c] = type Memory, range 64, base 0xf4000000, size 33554432, enabled
>     bar   [24] = type I/O Port, range 32, base 0xdc80, size 128, enabled
>
>
Looks like setting both vfs.unmapped_buf_allowed=0 and
kern.bio_transient_maxcnt=512 worked for me. I'm now up and running
smoothly.
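
In case anyone wants the workaround to survive a reboot instead of typing
it at the loader prompt each time, here is a minimal sketch. It assumes the
tunables go into /boot/loader.conf in the usual name="value" form;
loader_tunables is a made-up helper name, and the defaults are simply the
values that worked on my machine.

```sh
#!/bin/sh
# loader_tunables is a hypothetical helper, named only for this mail.
# It prints loader.conf(5)-style lines for the two tunables in this thread.
#   $1: value for vfs.unmapped_buf_allowed  (default 0)
#   $2: value for kern.bio_transient_maxcnt (default 512)
loader_tunables() {
    printf 'vfs.unmapped_buf_allowed="%s"\n' "${1:-0}"
    printf 'kern.bio_transient_maxcnt="%s"\n' "${2:-512}"
}

# Print the lines for review; append them to /boot/loader.conf as root
# only once you are happy with them.
loader_tunables
```

Calling it as "loader_tunables 0 256" would print the 256 value David
tested instead of my 512.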
Received on Thu Mar 21 2013 - 15:12:08 UTC
