Re: r343567 aka PAE vs non-PAE merge breaks i386 freebsd

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Sat, 23 Feb 2019 20:23:59 +0200
On Sat, Feb 23, 2019 at 10:04:07AM -0800, Rodney W. Grimes wrote:
> > On Sat, Feb 23, 2019 at 11:19:31AM +0200, Konstantin Belousov wrote:
> > > On Fri, Feb 22, 2019 at 07:26:44PM -0800, Steve Kargl wrote:
> > > > On Thu, Feb 21, 2019 at 10:04:10PM -0800, Steve Kargl wrote:
> > > > > On Thu, Feb 21, 2019 at 07:39:25PM -0800, Steve Kargl wrote:
> > > > > > r343567 merges the PAE vs non-PAE pmap headers for i386
> > > > > > freebsd.  After bisection and dealing with the drm-legacy-kmod
> > > > > > fallout, I bisected /usr/src to r343567.  Building world and
> > > > > > a GENERIC kernel and the minimum set of ports to start Xorg
> > > > > > on my Dell Latitude D530 laptop, results in a black screen
> > > > > > of death and a locked up laptop (no keyboard, mouse, or video).
> > > > > > 
> > > > > > A comparison of /var/log/Xorg.0.log for r343566 (Xorg loads
> > > > > > and functions) and r343567 (Xorg black screen of death) shows
> > > > > > that /boot/modules/i915kms.ko loads correctly as the log
> > > > > > files are identical.
> > > > > > 
> > > > > > Comparing dmesg for r343566 to r343567 shows the following
> > > > > >  
> > > > > > --- dmesg.343566	2019-02-20 08:13:07.727202000 -0800
> > > > > > +++ dmesg.343567	2019-02-21 19:02:24.469562000 -0800
> > > > > > @@ -3,11 +3,11 @@
> > > > > >  Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> > > > > >  	The Regents of the University of California. All rights reserved.
> > > > > >  FreeBSD is a registered trademark of The FreeBSD Foundation.
> > > > > > -FreeBSD 13.0-CURRENT r343566 GENERIC i386
> > > > > > +FreeBSD 13.0-CURRENT r343567 GENERIC i386
> > > > > >  FreeBSD clang version 7.0.1 (tags/RELEASE_701/final 349250) (based on LLVM 7.0.1)
> > > > > >  WARNING: WITNESS option enabled, expect reduced performance.
> > > > > >  VT(vga): resolution 640x480
> > > > > > -CPU: Intel(R) Core(TM)2 Duo CPU     T7250  @ 2.00GHz (1995.05-MHz 686-class CPU)
> > > > > > +CPU: Intel(R) Core(TM)2 Duo CPU     T7250  @ 2.00GHz (1995.04-MHz 686-class CPU)
> > > > > >    Origin="GenuineIntel"  Id=0x6fd  Family=0x6  Model=0xf  Stepping=13
> > > > > >    Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> > > > > >    Features2=0xe3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
> > > > > > @@ -16,7 +16,7 @@
> > > > > >    VT-x: (disabled in BIOS) HLT,PAUSE
> > > > > >    TSC: P-state invariant, performance statistics
> > > > > >  real memory  = 4294967296 (4096 MB)
> > > > > > -avail memory = 3639914496 (3471 MB)
> > > > > > +avail memory = 4154175488 (3961 MB)
> > > > > > 
> > > > > > Somehow the r343567 kernel found an additional 490 MB of memory,
> > > > > > which leads me to believe that after loading i915kms.ko there
> > > > > > are some serious memory stomping issues.
> > > > > > 
> > > > > > I am willing to do whatever is necessary to fix this issue (short
> > > > > > of mailing the laptop to someone).  Is it possible to revert
> > > > > > r343567 and move forward?
> > > > > > 
> > > > > 
> > > > > More info from sysctl.  With the "good" r343566, I see
> > > > > 
> > > > > vm.kmem_map_free: 1187033088
> > > > > vm.kmem_map_size: 27234304
> > > > > vm.kmem_size_scale: 3
> > > > > vm.kmem_size_max: 1715470336
> > > > > vm.kmem_size_min: 12582912
> > > > > vm.kmem_zmax: 65536
> > > > > vm.kmem_size: 1214267392
> > > > > hw.physmem: 3714269184
> > > > > hw.usermem: 3650867200
> > > > > hw.realmem: 4294963200
> > > > > 
> > > > > With the problematic r343567, I see
> > > > > 
> > > > > vm.kmem_map_free: 1683152896
> > > > > vm.kmem_map_size: 28123136
> > > > > vm.kmem_size_scale: 1
> > > > > vm.kmem_size_max: 1711276032
> > > > > vm.kmem_size_min: 12582912
> > > > > vm.kmem_zmax: 65536
> > > > > vm.kmem_size: 1711276032
> > > > > hw.physmem: 4252360704
> > > > > hw.usermem: 4146999296
> > > > > hw.realmem: 4294963200
> > > > > 
> > > > > Ideas?
> > > > > 
> > > > 
> > > > Here's the 'diff -uw' between a verbose dmesg boot of r343566
> > > > and dmesg boot of r343567.  The memory size looks rather puzzling.
> > > > Can the people responsible for the i386 pmap.h merging take a
> > > > look?
> > > What is puzzling ?
> > 
> > Supposedly, the laptop only has 4 GB of memory.  Not sure how
> > it finds memory above 4 GB.
> 
> It probably has what is called UMA and the graphics
> framebuffer is mapped into memory below 4G and the
> original memory is mapped above 4G, giving you this
> little bit of >4G memory that is triggering PAE now.
The PCI window takes between 300M and 1G, so if 4G of RAM is installed,
the same amount of RAM is remapped above 4G, regardless of the CPU
bitness, and it is wasted unless the kernel can address it.
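
As a rough check of the numbers quoted earlier in this thread against that explanation, here is a minimal sketch. The function `ram_above_4g` and the 512M window size are illustrative assumptions, and the model ignores everything except the PCI window; it is not how the firmware or kernel actually computes the layout.

```python
GIB = 1 << 30
MIB = 1 << 20


def ram_above_4g(installed: int, pci_window: int) -> int:
    """Simplified model: RAM that cannot sit below 4G because the PCI
    window occupies that address space, and is remapped above 4G."""
    below_4g = min(installed, 4 * GIB - pci_window)
    return installed - below_4g


# hw.physmem values reported earlier in the thread.
physmem_r343566 = 3714269184  # non-PAE kernel
physmem_r343567 = 4252360704  # kernel that auto-selects PAE

gained = physmem_r343567 - physmem_r343566
print(f"extra physmem seen by r343567: {gained / MIB:.0f} MiB")

# Hypothetical 512M PCI window on a machine with 4G installed.
print(f"model, 512M window: {ram_above_4g(4 * GIB, 512 * MIB) // MIB} MiB")
```

The difference in the reported hw.physmem values is a little over 512 MiB, which falls inside the 300M to 1G range given above.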

> 
> This may not be desired, is there any performance
> advantage to not turning on PAE in this situation?
PAE enables the NX bit; this was the main reason for the PAE commit.

> 
> > I build 343566 and minimum ports needed for Xorg including
> > drm-legacy-kmod.  I can load xorg, and in fact, I am typing
> > this email now on the laptop with vi in xterm.
> > 
> > I build 343567 and minimum ports needed for Xorg including
> > drm-legacy-kmod.  I try to start Xorg.  Black screen of death.
> > No mouse.  No keyboard.  Just a hard reset.
> 
> That would be a regression caused by PAE coming into play.
> 
> > I build 343567 and minimum ports needed for Xorg including
> > drm-legacy-kmod.  I load i915kms.ko, do not start Xorg.  There
> > are surprising streaks/blotches of color on screen.  Building any
> > port with the system's cc results in occasional segfaults.
> > 
> > > When the kernel boots in PAE mode, it can (and will) make use of
> > > physical memory mapped above 4G.  I highlighted the SMAP entry which
> > > represents such memory, below.
> > > 
> > > kmem_scale was changed in the PAE commit, see the commit message for
> > > explanation.
> > > 
> > 
> > I read it multiple times.  It does not explain how to get the
> > old pre-343567 behavior where the laptop is usable.  It mentions
> > two new sysctl entities.  One is irrelevant as I don't have 24+ GB
> > of memory.  The other has this in the commit message:
> > 
> >     There are two tunables added: hw.above4g_allow and ...,
> 
> I think trying to set that sysctl to 0 in a post 343567 system
> is worth a try.
To disable PAE auto-selection, later kernels provide the
vm.pmap.pae_mode=0 loader tunable.

> 
> >     the first one is kept enabled for now to evaluate the status
> >     on HEAD, ...
> > 
> > Well, here's a report that indicates the status is "not okay".  The
> > commit message also has the afterthought:
> > 
> >     Also, VM_KMEM_SIZE_SCALE changed from 3 to 1.
> > 
> > Okay, so what does that mean?  Will setting vm.kmem_size_scale to 3
> > fix what appears to be some memory corruption or mismanagement?
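
To illustrate what the scale change does to the numbers, here is a hedged sketch of how the kernel memory size is roughly derived from the scale, minimum, and maximum values. The function `kmem_size` is a hypothetical simplification, and using hw.physmem as the input is an assumption; the kernel works from its own count of managed pages. The sketch says nothing about whether changing the scale affects the corruption reported here.

```python
PAGE_SIZE = 4096  # i386 base page size


def kmem_size(mem_bytes: int, scale: int, size_min: int, size_max: int) -> int:
    """Simplified model: memory divided by the scale, then clamped
    to the [size_min, size_max] limits."""
    pages = mem_bytes // PAGE_SIZE
    size = pages // scale * PAGE_SIZE
    if size_min > 0 and size < size_min:
        size = size_min
    if size_max > 0 and size >= size_max:
        size = size_max
    return size


# Values from the r343567 sysctl output quoted earlier in the thread.
physmem = 4252360704   # hw.physmem (stand-in for the managed page count)
size_min = 12582912    # vm.kmem_size_min
size_max = 1711276032  # vm.kmem_size_max

scale1 = kmem_size(physmem, 1, size_min, size_max)
scale3 = kmem_size(physmem, 3, size_min, size_max)
print("scale 1:", scale1)
print("scale 3:", scale3)
```

With a scale of 1 the result is clamped to vm.kmem_size_max, which matches the vm.kmem_size reported for r343567; with a scale of 3 the same input stays below that limit.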
> > 
> 
> -- 
> Rod Grimes                                                 rgrimes_at_freebsd.org
Received on Sat Feb 23 2019 - 17:24:09 UTC