Recent versions of pxeboot hang/panic on AMD platform.

From: Luigi Rizzo <rizzo_at_iet.unipi.it>
Date: Sat, 22 Nov 2008 00:14:00 +0100
[copying some people involved with recent related commits]

As reported in kern/118222, recent versions of pxeboot hang/panic
on AMD platforms.

Initial reports mentioned that the RELENG_6 versions worked well;
however, I found that even the recent RELENG_6 code is problematic.

Specifically, the problem I see on two machines with AMD CPUs (one
has an Asus M2N-VM motherboard), netbooting with pxeboot, is that the
loading of config files or binary modules (kernel, etc.) randomly
hangs with recent versions of pxeboot (RELENG_6, RELENG_7 and HEAD
all give the same behaviour).

The same system works fine with an old version of pxeboot from RELENG_6.

Things seem to work fine on i386 (tried a Pentium4, N270 and on qemu) 
with all the versions below.

To investigate, I started with a known-good version (RELENG_6, early
2008) and moved forward to figure out where the problem was
introduced. I found the following:

        RELENG_6 as of 2008.03.01 (svn 176674)  works
        RELENG_6 as of 2008.03.15 (svn 177190)  works
                (same as previous)
        RELENG_6 as of 2008.03.31 (svn 177768)  does NOT work
            changed files:
                Index: RELENG_6/sys/boot/i386/boot2/boot2.c
                Index: RELENG_6/sys/boot/i386/btx/btx/Makefile
                Index: RELENG_6/sys/boot/i386/btx/btx/btx.S
                Index: RELENG_6/sys/boot/i386/gptboot/gptboot.c
                Index: RELENG_6/sys/boot/i386/libi386/biossmap.c
                Index: RELENG_6/sys/boot/i386/libi386/biosmem.c
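
For anyone who wants to narrow the window further, the search between
a known-good and a known-bad revision can be sketched as a plain
bisection. This is only an illustration, not the procedure used for
the table above: first_bad() and works() are hypothetical names, and
the lookup table merely replays the three results listed here in
place of a real netboot test. Since the hang is random, a real
works() would presumably need several successful boots before
declaring a revision good.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[int], works: Callable[[int], bool]) -> int:
    """Return the earliest revision for which works() is False.

    Assumes revisions is sorted ascending, the first entry works,
    the last entry fails, and a broken revision never starts
    working again later in the list.
    """
    # Invariant: revisions[lo] works, revisions[hi] fails.
    lo, hi = 0, len(revisions) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(revisions[mid]):
            lo = mid
        else:
            hi = mid
    return revisions[hi]


# Results from the table above, standing in for a real netboot test
# (hypothetical stand-in; a real test would boot each build repeatedly).
OBSERVED = {176674: True, 177190: True, 177768: False}

if __name__ == "__main__":
    print(first_bad(sorted(OBSERVED), OBSERVED.__getitem__))
```

With more intermediate revisions in the list, each step halves the
number of builds left to test.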

There is a recent, related change (August 2008), which however
does not seem to fix the bug.

(All of the above is basically an MFC of something applied slightly
earlier to HEAD and RELENG_7. I have experienced the exact same bug
with a fresh HEAD and RELENG_7, even though I have not found the
exact point there where the problem arose.)

The fact that the failure occurs at random times, even quite early 
(e.g. while reading the Forth config files) suggests that the problem
may be related to interrupts coming at the wrong time. 

Unfortunately the changes to btx.S (which I believe may be related to
the problem, as the changes to the other files seem innocuous or unrelated)
are beyond my knowledge.

So, does anyone have ideas on what could be happening here, and
especially on how likely it is that we might see the same problem
with disk- or USB-based booting?

	cheers
	luigi

Received on Fri Nov 21 2008 - 22:09:29 UTC
