[SOLVED] Re: Recent versions of pxeboot hang/panic on AMD platform.

From: Lawrence Stewart <lstewart_at_freebsd.org>
Date: Wed, 25 Feb 2009 17:45:45 +1100
Lawrence Stewart wrote:
> Luigi Rizzo wrote:
>> [copying some people involved with recent related commits]
>>
>> As reported in kern/118222, recent versions of pxeboot hang/panic
>> on AMD platforms.
>>
>> Initial reports mentioned that the RELENG_6 versions worked well;
>> however, I found that even the recent RELENG_6 code is problematic.
>>
>> Specifically, the problem I see on two machines with AMD CPUs (one
>> has an Asus M2N-VM motherboard) netbooting with pxeboot is that the
>> loading of config files or binary modules (kernel, etc.) randomly
>> hangs with recent versions of pxeboot (RELENG_6, RELENG_7 and HEAD
>> all give the same behaviour).
>>
>> The same system works fine with an old version of pxeboot from RELENG_6.
>>
>> Things seem to work fine on i386 (tried a Pentium4, N270 and on qemu) 
>> with all the versions below.
>>
>> To investigate, I started with a reliable version
>> (RELENG_6, early 2008) and moved forward to figure out where the
>> problem was introduced. I found the following:
>>
>>         RELENG_6 as of 2008.03.01 (svn 176674)  works
>>         RELENG_6 as of 2008.03.15 (svn 177190)  works
>>                 (same as previous)
>>         RELENG_6 as of 2008.03.31 (svn 177768)  does NOT work
>>             changed files:
>>                 Index: RELENG_6/sys/boot/i386/boot2/boot2.c
>>                 Index: RELENG_6/sys/boot/i386/btx/btx/Makefile
>>                 Index: RELENG_6/sys/boot/i386/btx/btx/btx.S
>>                 Index: RELENG_6/sys/boot/i386/gptboot/gptboot.c
>>                 Index: RELENG_6/sys/boot/i386/libi386/biossmap.c
>>                 Index: RELENG_6/sys/boot/i386/libi386/biosmem.c
>>
>> There is a recent, related change (August 2008) which, however,
>> does not seem to fix the bug.
>>
>> (All of the above is basically an MFC of something applied slightly
>> earlier to head and RELENG_7. I have experienced the exact same bug
>> with a fresh head and RELENG_7, even though I have not found the exact
>> point there where the problem arose.)
>>
>> The fact that the failure occurs at random times, even quite early
>> (e.g. while reading the Forth config files), suggests that the problem
>> may be related to interrupts arriving at the wrong time.
>> Unfortunately the changes to btx.S (which I believe may be related to
>> the problem, as the changes to the other files seem innocuous or
>> unrelated) are beyond my knowledge.
>> So, does anyone have ideas on what could be happening here, and
>> especially how likely it is that we might see the same problem with
>> disk or USB-based booting?
> 
> Just adding a "me too" with pxeboot built from head r188509. Running 
> with pxeboot from AMD64 6.3-RELEASE as Luigi's research hinted seems to 
> resolve the issue for me also. I haven't tried pxeboot built from 
> r177768 yet though to see if it too fails.
> 
> To quickly touch on symptoms... I've never seen a panic. I experience 
> permanent hangs that occur maybe 50% (or possibly even more) of the time 
> when I reboot or cold start the machine. Only option is to reboot when 
> it hangs. Rebooting a few times will eventually allow the boot process 
> to finish and then once the kernel kicks off probing, all is good.
> 
> Hardware is an Intel 865GM chipset based Gigabyte mainboard with a 3GHz 
> HTT P4 CPU (HTT enabled).
> 
> Happy to help debug further if anyone has ideas to try.
> 

On a whim I decided to try a PCI Intel GigE NIC I had lying around...
lo and behold, I can't make the machine hang during boot any more with
pxeboot built from head r188509.

To be a bit more specific about the hardware involved, the motherboard 
is a Gigabyte GA-8I865GM-775 with BIOS version F5 and the onboard NIC 
shows up as follows:

Marvell Yukon pxe rom version (reported during boot): 1.11

pciconf -lv:

skc0@pci0:1:9:0:        class=0x020000 card=0xe0001458 chip=0x432011ab rev=0x13 hdr=0x00
     vendor     = 'Marvell Semiconductor (Was: Galileo Technology Ltd)'
     device     = 'Yukon 88E8001/8003/8010 PCI Gigabit Ethernet Controller (Copper)'
     class      = network
     subclass   = ethernet
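
For anyone comparing this against their own hardware: the chip= and card= fields in pciconf -l output each pack two 16-bit PCI IDs into one 32-bit value, with the device ID in the upper half and the vendor ID in the lower half. A minimal Python sketch of the decoding, using the two values above (the helper name split_pci_id is purely illustrative and not part of any tool):

```python
def split_pci_id(value):
    """Split a 32-bit pciconf ID field into (device_id, vendor_id)."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF

# Values copied from the pciconf -lv output above.
chip_device, chip_vendor = split_pci_id(0x432011ab)
card_device, card_vendor = split_pci_id(0xe0001458)

print(f"chip: vendor=0x{chip_vendor:04x} device=0x{chip_device:04x}")
print(f"card: subvendor=0x{card_vendor:04x} subdevice=0x{card_device:04x}")
```

The chip vendor half (0x11ab) is what pciconf resolves to the Marvell vendor string shown above; the card= field decodes the same way for the board's subsystem IDs.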


Onboard NIC related verbose dmesg output:

skc0: <Marvell Gigabit Ethernet> port 0xa400-0xa4ff mem 0xf9040000-0xf9043fff irq 20 at device 9.0 on pci1
skc0: Reserved 0x4000 bytes for rid 0x10 type 3 at 0xf9040000
skc0: interrupt moderation is 100 us
skc0: Marvell Yukon Lite Gigabit Ethernet rev. (0x9)
skc0: chip ver  = 0xb1
skc0: chip rev  = 0x09
skc0: SK_EPROM0 = 0x10
skc0: SRAM size = 0x010000
sk0: <Marvell Semiconductor, Inc. Yukon> on skc0
sk0: bpf attached
sk0: Ethernet address: <mac>
miibus0: <MII bus> on sk0
e1000phy0: <Marvell 88E1011 Gigabit PHY> PHY 0 on miibus0
e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
ioapic0: routing intpin 20 (PCI IRQ 20) to lapic 0 vector 54
skc0: [MPSAFE]
skc0: [ITHREAD]


As a follow-on from the Intel NIC discovery, I also noticed John's
commit from yesterday (r189017), which looked promising, and took it for a
spin. I'm happy to report that it appears to resolve the hang with the
Marvell card's pxe rom. After at least a dozen reboot/cold start
attempts it hasn't hung once, whereas pxeboot built from r189016 hangs
most of the time. The add-on Intel NIC is still unfazed by either
pxeboot version and boots just fine regardless.
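
As a rough sanity check on how convincing a dozen clean boots is: assuming each boot with an unfixed pxeboot hangs independently with a probability of about 0.5 (the ballpark rate mentioned earlier in the thread, not a measured figure), the odds of 12 clean boots in a row by luck alone are small. A quick Python sketch of the arithmetic, with both numbers being assumptions:

```python
# Assumed values, taken loosely from the thread: a ~50% hang rate per boot
# and 12 consecutive boots (a dozen) without a hang.
hang_rate = 0.5
clean_boots = 12

# Probability that an unfixed pxeboot would survive that many boots in a
# row, treating each boot as an independent trial.
p_lucky = (1 - hang_rate) ** clean_boots
print(f"{p_lucky:.6f}")
```

With these assumptions the result is 1 in 4096, so the run of clean boots is unlikely to be coincidence. A higher real hang rate would make it even less likely.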

So for me at least, looks like the case is closed. Thanks go to Tor, 
John and Bjoern for their work on r189017.

Cheers,
Lawrence
Received on Wed Feb 25 2009 - 06:17:54 UTC
