Re: 7.0-CURRENT Hang

From: Yar Tikhiy <yar_at_comp.chem.msu.su>
Date: Tue, 7 Feb 2006 21:27:55 +0300
On Tue, Feb 07, 2006 at 10:06:32AM -0800, Cy Schubert wrote:
> In message <20060207173154.GE19674_at_comp.chem.msu.su>, Yar Tikhiy writes:
> > On Mon, Feb 06, 2006 at 08:29:35PM -0800, Cy Schubert wrote:
> > > 
> > > On the Pentium P54C model (that's an old 120 MHz Pentium I use as a 4.x,
> > > 5.x, and 7.x ports build testbed), when the CPUID instruction is called
> > > with AL = 0x02, it returns EAX = EBX = ECX = EDX = 0. The code fragment
> > > in identcpu.c below results in "rounds" becoming 0xffffffff.
> > > 
> > > 	do_cpuid(0x2, regs);
> > > 	rounds = (regs[0] & 0xff) - 1;
> > > 
> > > The subsequent loop below will run virtually forever: it takes
> > > forever for this machine to count down from 0xffffffff, performing a
> > > great many calls to get_INTEL_TLB and virtually hanging the machine
> > > in the process.
> > > 
> > > 	while (rounds > 0) {
> > > 		[... code ...]
> > > 		rounds--;
> > > 	}
> > 
> > FWIW, my presumably P54C machine (Family 5 Model 2 Stepping 6)
> > doesn't indicate it has the CPUID 0x02 function.  That is, CPUID
> > 0x00 returns EAX = 0x01, which is the highest function supported.
> > Could you try to run the misc/cpuid port on your Pentium and show
> > its output?  It might appear that the code around CPUID 0x02 shouldn't
> > be reached at all in your case.  Zero values from CPUID 0x02 are
> > pretty indicative of that.
> 
> Mine is Family 5 Model 2 Stepping 12. All of my doc is for Pentium-Pro and 
> newer so you are probably correct.

Do you know what CPUID function 0x00 returns in EAX for your CPU?
Hint: just run misc/cpuid once and show its output here.  I've just
fixed the port so that it has no bogus dependencies and is very
light-weight.

> > Dealing with "rounds" equal to -1 can be a good idea anyway to catch
> > brain-dead CPUs instead of hanging the system on them.
> 
> Well, with rounds = -1 [actually (unsigned int)0xffffffff], the CPU will
> "appear" to hang as it loops virtually forever. Counting back from
> 0xffffffff on a 120 MHz machine, and fetching TLB info a number of times
> on each iteration, takes hours to get through just a few iterations. I've
> seen mine go through the loop, decrementing rounds each time, for hours
> at a time. Initially, before I dug into it using DDB, it did appear that
> the CPU was hung, but it was just starting to loop 4,294,967,295 times. On
> older and slower machines, if it took hours to get through a few
> iterations, my guess is that it would take days to loop through this code.
> My patch allows it to take the defaults and finally boot. If the CPU
> doesn't support AL = 0x02, what's the point of looping? It appears to run
> nicely with the patch.

I do see that rounds = -1 is causing trouble.
I just meant that we should not call do_cpuid(0x02) at all if
(cpu_high < 2) because it can result in undefined behavior.
Your patch still makes sense because it deals with possible
brain-dead CPUs.  I'd implement it in a slightly different
way though -- stay tuned! :-)
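
For illustration only, here is a minimal user-space sketch of the two
checks discussed above. It is neither Cy's patch nor the change that
eventually goes into identcpu.c. The function names unguarded_rounds()
and guarded_rounds() are made up for this example, and the register
values are passed in as plain arguments instead of coming from
do_cpuid(), so the wraparound can be reproduced on any machine.

```c
#include <stdint.h>

/*
 * Mirrors the expression quoted in this thread:
 *	rounds = (regs[0] & 0xff) - 1;
 * With EAX == 0 the unsigned subtraction wraps around to 0xffffffff.
 */
static uint32_t
unguarded_rounds(uint32_t eax)
{
	return ((eax & 0xff) - 1);
}

/*
 * Hypothetical guarded variant.
 * cpu_high:  EAX returned by CPUID function 0x00 (highest function).
 * leaf2_eax: EAX returned by CPUID function 0x02; only meaningful
 *            when cpu_high >= 2.
 * Returns the number of extra rounds to perform; 0 means "skip".
 */
static uint32_t
guarded_rounds(uint32_t cpu_high, uint32_t leaf2_eax)
{
	uint32_t count;

	if (cpu_high < 2)
		return (0);	/* function 0x02 is not supported at all */
	count = leaf2_eax & 0xff;
	if (count == 0)
		return (0);	/* bogus answer: avoid the wraparound */
	return (count - 1);
}
```

The first check covers CPUs that simply don't implement function 0x02.
The second one covers CPUs that claim to implement it but return a
zero count anyway.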

-- 
Yar
Received on Tue Feb 07 2006 - 17:28:07 UTC
