Re: PCengines APU2C4, 12-STABLE: bootloader failure: Panic: free: guard2 fail @ 0x1000 + 2311663946 from

From: Toomas Soome <tsoome_at_me.com>
Date: Tue, 30 Jul 2019 17:01:57 +0300
> On 30 Jul 2019, at 15:43, O. Hartmann <o.hartmann_at_walstatt.org> wrote:
> 
> On Wed, 24 Jul 2019 18:07:22 +0300
> Toomas Soome <tsoome_at_me.com> wrote:
> 
>>> On 24 Jul 2019, at 16:48, O. Hartmann <ohartmann_at_walstatt.org> wrote:
>>> 
>>> On Wed, 24 Jul 2019 12:06:53 +0200
>>> "O. Hartmann" <o.hartmann_at_walstatt.org> wrote:
>>>> 
>>>> On Wed, 24 Jul 2019 12:09:16 +0300
>>>> Toomas Soome <tsoome_at_me.com> wrote:
>>>> 
>>>>>> On 24 Jul 2019, at 11:11, O. Hartmann <ohartmann_at_walstatt.org> wrote:
>>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> on APU2C4 from PCengines with latest firmware apu2_v4.9.0.7.rom, SeaBIOS
>>>>>> rel-1.12.1.3-0-g300e8b7, booting via legacy MBR FreeBSD 12-STABLE
>>>>>> r350274 (the same with r350115) fails to boot with an immediate loader
>>>>>> error:
>>>>>> 
>>>>>> [...]
>>>>>> SeaBIOS (version rel-1.12.1.3-0-g300e8b7)
>>>>>> 
>>>>>> Press F10 key now for boot menu
>>>>>> 
>>>>>> Booting from Hard Disk...
>>>>>> /
>>>>>> 
>>>>>> onsoles: internal video/keyboard   
>>>>>> IOS drive C: is disk0 
>>>>>> IOS drive D: is disk1 
>>>>>> IOS 639kB/3404444kB available memory 
>>>>>> 
>>>>>> reeBSD/x86 bootstrap loader, Revision 1.1  
>>>>>> Mon Apr 15 21:28:11 CEST 2019 root_at_thor) 
>>>>>> anic: free: guard2 fail _at_ 0x1000 + 2311663946 from
>>>>>> Xçu0ç}4çl$♦├í_at_┤♠:2106163957 -> Press a key on the console to reboot
>>>>>> <-- […]      
>>>>> 
>>>>> 
>>>>> This is definitely something “funny”: we are apparently attempting to
>>>>> free pointer 0x1000, which is definitely wrong because our heap should be
>>>>> just below the 4GB line. Since we do get the list of disks printed, as
>>>>> well as memory and version, the error must come from the interpreter. It
>>>>> is possible that the stack clashed with bss, hence the corruption.
>>>> 
>>>> I realized that I have defined 
>>>> 
>>>> WITH_KERNEL_RETPOLINE=YES
>>>> 
>>>> and since I usually build NanoBSD with -DNO_CLEAN, I'm now compiling a
>>>> clean NanoBSD with RETPOLINE mitigations disabled, to check whether
>>>> either of the two ways of building causes the issue.
>>>> 
>>>>> 
>>>>> You can try to press space at the first spinner and enter an alternate
>>>>> loader at the boot: prompt (enter ?/boot at the boot: prompt to see the
>>>>> file list).
>>>> 
>>>> I will try that as soon as the build has finished, if the problem is
>>>> still present then.
>>> 
>>> 
>>> With a fresh build and no RETPOLINE mitigation (neither kernel nor world)
>>> the phenomenon as described above is still the same. I tried an alternative
>>> loader as requested, but without success. When choosing loader_4th, I get
>>> this error:
>>> 
>>> [...]
>>> FreeBSD/x86 boot
>>> Default: 0:ad(0p3)/boot/loader
>>> boot:  /boot/loader_4th/
>>> 
>>> onsoles: internal video/keyboard
>>> IOS drive C: is disk0
>>> IOS drive D: is disk1
>>> IOS 639kB/3404444kB available memory
>>> 
>>> reeBSD/x86 bootstrap loader, Revision 1.1
>>> Wed Jul 24 12:51:12 CEST 2019 root_at_thor)
>>> anic: No heap setup  
>>> -> Press a key on the console to reboot <--
>>> 
>> 
>> Now this is bad. If my math is correct, this system is supposed to have 3GB
>> of RAM, so are there specific build exceptions in place? See
>> stand/i386/loader/main.c, function main(), after the call to bios_getmem().
>> 
>> rgds,
>> toomas
> 
> 
> Hello Toomas,
> the PCengines APU2C4 is supposed to have 4GB of RAM - wouldn't a 64-bit
> system have seen the whole range? On 32-bit systems there was a memory hole,
> I assume for memory-mapped I/O of several PCI devices. This is the first time
> I have looked at the memory reported by the kernel, and it's confusing me a bit.
> 

The BIOS loader runs only in 32-bit protected mode; we switch to 64-bit when we start the kernel. With UEFI we do have 32- and 64-bit loaders, depending on the firmware implementation (because we need to use firmware-provided functions), but even there some systems are buggy, so we keep memory usage below the 4G line.
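
For reference, the "639kB/3404444kB" figures in the quoted loader output can be turned into an address range with simple arithmetic. What follows is only a sketch: it assumes the second number is the size in KiB of the extended-memory region starting at the 1 MiB mark, and the helper names (extmem_top, gap_below_4g, report) are made up for illustration and are not loader code.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: the second number on the loader's memory line is the size,
 * in KiB, of the extended-memory region that starts at the 1 MiB mark. */
#define ONE_MIB_IN_KIB 1024ULL
#define FOUR_GIB       (4ULL << 30)

/* Hypothetical helper: first byte address above the reported region. */
uint64_t extmem_top(uint64_t extmem_kib)
{
    return (ONE_MIB_IN_KIB + extmem_kib) * 1024ULL;
}

/* Hypothetical helper: bytes between that address and the 4 GiB line. */
uint64_t gap_below_4g(uint64_t extmem_kib)
{
    return FOUR_GIB - extmem_top(extmem_kib);
}

/* Print both values in MiB for a given extended-memory figure. */
void report(uint64_t extmem_kib)
{
    printf("top of extended memory: %llu MiB\n",
        (unsigned long long)(extmem_top(extmem_kib) >> 20));
    printf("gap below 4 GiB:        %llu MiB\n",
        (unsigned long long)(gap_below_4g(extmem_kib) >> 20));
}
```

Calling report(3404444) with the figure from the log puts the top of the region at roughly 3.25 GiB, about 770 MiB short of the 4G line. That would fit a 4GB board if the remainder is reserved or mapped above 4G, but this reading is an assumption, not something the log confirms.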

> I do not have any kind of specific build exceptions in place targeting the
> memory. Of course, for memory usage and image size optimizations I defined
> several WITHOUT_ and WITH_ tags for build and install - but they never caused
> any trouble and have not been changed so far.

Unfortunately, the only way to identify the cause is to start inserting debug printf’s into the code paths and see where we get blown up. There can be several reasons, and the most common case is still a plain and simple buffer overrun… debugging this is a time-consuming job.
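
To illustrate how a plain buffer overrun can surface as a "guard fail" only at free time, here is a minimal sketch of an allocator wrapper that puts a magic word in front of each block and a guard byte behind it. It is not the loader's allocator; the names (g_alloc, g_check, g_free) and the values G_MAGIC and G_TAIL are invented for this example.

```c
#include <stddef.h>
#include <stdlib.h>

#define G_MAGIC ((size_t)0x55aa1234u)  /* hypothetical head magic */
#define G_TAIL  0xfeu                  /* hypothetical tail guard byte */

/* Header stored directly in front of the user-visible block. */
struct g_head {
    size_t magic;
    size_t size;    /* user-visible size in bytes */
};

/* Allocate size bytes, framed by a head magic and one tail guard byte. */
void *g_alloc(size_t size)
{
    struct g_head *h = malloc(sizeof(*h) + size + 1);
    unsigned char *user;

    if (h == NULL)
        return NULL;
    h->magic = G_MAGIC;
    h->size = size;
    user = (unsigned char *)(h + 1);
    user[size] = G_TAIL;
    return user;
}

/* Return 0 if both guards are intact, 1 if the head magic is damaged,
 * 2 if the tail guard byte is damaged. */
int g_check(const void *ptr)
{
    const struct g_head *h = (const struct g_head *)ptr - 1;
    const unsigned char *user = ptr;

    if (h->magic != G_MAGIC)
        return 1;
    if (user[h->size] != G_TAIL)
        return 2;
    return 0;
}

/* Free the block only if the guards are intact; otherwise report which
 * guard failed and leave the block alone. */
int g_free(void *ptr)
{
    int rc = g_check(ptr);

    if (rc == 0)
        free((struct g_head *)ptr - 1);
    return rc;
}
```

Writing one byte past the end of a block from g_alloc(8) makes g_check() return 2. Calling a check like this from debug printf’s at suspect points narrows down where the damage happens, instead of only learning about it when the block is freed.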

rgds,
toomas


> 
>> 
>>> 
>>> Loader loader_simp ends up with a stuck console and no further output:
>>> 
>>> [...]
>>> FreeBSD/x86 boot
>>> Default: 0:ad(0p3)/boot/loader
>>> boot:  /boot/loader_4th/
>>> 
>>> onsoles: internal video/keyboard
>>> IOS drive C: is disk0
>>> IOS drive D: is disk1
>>> IOS 639kB/3404444kB available memory
>>> 
>>> reeBSD/x86 bootstrap loader, Revision 1.1
>>> Wed Jul 24 12:59:23 CEST 2019 root_at_thor)
>>> [...]
>>> 
>>> regards
>>> oh  
>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Booting 12.0-STABLE #78 r349288: Sat Jun 22 09:10:25 CEST 2019 amd64
>>>>>> works fine with nothing changed except the OS version.
>>>>>> 
>>>>>> 
>>>>>> Booting 12.0-STABLE #78 r349288 works fine:
>>>>>> 
>>>>>> [...]
>>>>>> SeaBIOS (version rel-1.12.1.3-0-g300e8b7)
>>>>>> 
>>>>>> Press F10 key now for boot menu
>>>>>> 
>>>>>> Booting from Hard Disk...
>>>>>> |
>>>>>> 
>>>>>> onsoles: internal video/keyboard   
>>>>>> IOS drive C: is disk0 
>>>>>> IOS drive D: is disk1 
>>>>>> IOS 639kB/3404444kB available memory 
>>>>>> 
>>>>>> reeBSD/x86 bootstrap loader, Revision 1.1 
>>>>>> Mon Apr 15 21:28:11 CEST 2019 root_at_thor) 
>>>>>> oading /boot/defaults/loader.conf 
>>>>>> oading /boot/device.hints 
>>>>>> oading /boot/loader.conf 
>>>>>> oading /boot/loader.conf.local 
>>>>>> Loading kernel...
>>>>>> /boot/kernel/kernel text=0xb005e8 \
>>>>>> [...]
>>>>>> 
>>>>>> In the message taken from the serial console the first column of
>>>>>> characters is lost due to an error in the output which seems FreeBSD
>>>>>> related. 
>>>>> 
>>>>> It certainly does look weird - sio_putc() is used in boot2 and its
>>>>> implementation uses the same principle as comc_putchar() in comconsole.c
>>>>> (even if it is asm versus C code). Since the serial data is interpreted
>>>>> by the terminal, it feels more like a terminal emulator issue (line
>>>>> discipline, cabling, USB-to-serial dongle?)
>>>> 
>>>> We use here a null modem cabling with an integrated USB-to-UART/TTL
>>>> converter, which is attached to a FreeBSD CURRENT (most recent) box:
>>>> 
>>>> [...]
>>>> ugen2.3: <FTDI FT232R USB UART> at usbus2
>>>> uftdi0 on uhub4
>>>> uftdi0: <FT232R USB UART> on usbus2
>>>> [...]
>>>> 
>>>> it is a 
>>>> StarTech.com 1 Port USB Nullmodem RS232 Adapter Kabel (USB 2.0 FTDI
>>>> chipset).
>>>> 
>>>> Regards,
>>>> oh
>>>> 
>>>> 
>>>>> 
>>>>> rgds,
>>>>> toomas
>>>>> 
>>>>> 
>>>>>> 
>>>>>> The file /boot/loader.conf.local contains these lines in both the
>>>>>> working and the non-working scenario:
>>>>>> 
>>>>>> [...]
>>>>>> boot_serial="YES"
>>>>>> # serial speed in bits/s
>>>>>> comconsole_speed="115200"
>>>>>> console="comconsole"
>>>>>> 
>>>>>> autoboot_delay="0"
>>>>>> 
>>>>>> verbose_loading="YES"
>>>>>> loader_logo="orb"
>>>>>> beastie_disable="YES"
>>>>>> 
>>>>>> ###  Microcode
>>>>>> #cpu_microcode_load="YES"                # Set this to YES to load and apply a
>>>>>> #cpu_microcode_name="/boot/firmware/intel-ucode.bin" # Set this to the microcode
>>>>>> #cpu_microcode_type="cpu_microcode"      # Required for the kernel to find
>>>>>>                                          # the microcode update file.
>>>>>> 
>>>>>> 
>>>>>> # disable Process Table Isolation
>>>>>> #vm.pmap.pti=0
>>>>>> 
>>>>>> kern.geom.label.gptid.enable=0
>>>>>> 
>>>>>> # Limit the phys. memory
>>>>>> #hw.physmem=1073741824  # 1 G
>>>>>> #hw.physmem=536870912   # 512 MB
>>>>>> #hw.physmem=268435456   # 256 MB
>>>>>> 
>>>>>> # Because more than one igb NIC is on board! See man igb(4)
>>>>>> kern.ipc.nmbclusters=757350
>>>>>> #kern.ipc.nmbjumbo9k=8192
>>>>>> 
>>>>>> # NIC
>>>>>> #hw.em.max_interrupt_rate=32000
>>>>>> hw.em.max_interrupt_rate=16000
>>>>>> 
>>>>>> #If non-zero, enable EXPERIMENTAL feature to improve concurrent Fortuna performance
>>>>>> kern.random.fortuna.concurrent_read="1"
>>>>>> 
>>>>>> # Forward Information Bases (FIBs)
>>>>>> net.fibs=10
>>>>>> net.add_addr_allfibs=0
>>>>>> 
>>>>>> [...]
>>>>>> 
>>>>>> 
>>>>>> Again, with the exact same setting 12-STABLE r349288 boots fine,
>>>>>> r350274 doesn't. FreeBSD 12-STABLE r
>>>>>> 
>>>>>> Can someone please help?
>>>>>> 
>>>>>> Thanks in advance, oh  
> [...]
Received on Tue Jul 30 2019 - 12:11:52 UTC