Re: 7-CURRENT-SNAP009-i386-bootonly.iso on Shuttle XPC w/ AMD X2 (was Re: Side note on Shuttle XPC)

From: Scott Long <scottl_at_samsco.org>
Date: Sun, 20 Nov 2005 09:47:46 -0700
Matthew Dillon wrote:
> :...
> :> a spurious ICU interrupt.  I have part of peter's hack expanded to do a full 
> :> reset of the ICUs, and I'll update it for Monday to adjust the base interrupt 
> :> such that the spurious ICU vectors get sent to the APIC spurious interrupt 
> :> vector.  That should fix your issue as well as the same issue reported by 
> :> someone else on the amd64_at_ list recently.
> :> 
> :
> :Does this imply that the 'correct' fix involves catching the stray ICU 
> :interrupt via a trap handler?  How often do these interrupts happen,
> :and therefore what is the performance consequence to having to handle
> :them?
> :
> :Scott
> 
>     I think John has the right fix in mind.  You have to catch the stray
>     interrupt vector for every interrupt controller in the system.  This
>     means the 8259 stray vector AND the LAPIC stray vector, even if one or
>     both devices are completely disabled.
> 
>     Whether this represents a performance problem depends on the situation.
>     If any interrupts are routed through the 8259 at all then the BIOS
>     misprogramming bug I mentioned earlier will result in each real 
>     interrupt also causing a stray interrupt (due to the double INT A cycle).
>     Clearly this is not desirable.  If the 8259 is 100% disabled I think
>     the duplicate stray interrupts will go away.
> 
>     Even under perfect conditions a stray interrupt can occur during
>     programming or reprogramming of the 8259.  This would not cause a
>     performance issue, just result in an occasional stray.  For example,
>     if the 8259 issues an IRQ to the cpu and the IRQ source is masked
>     while the cpu is doing an INT A cycle, the 8259 will return the stray
>     interrupt vector.
> 
>     With regard to the LAPICs the story is slightly better.  The stray
>     interrupt vector can be programmed into the LAPIC and the interrupt
>     service routine basically doesn't have to do a thing, not even EOI.
>     A stray LAPIC interrupt can occur in a number of situations but I
>     do not believe any of them would result in the same braindamage that
>     you get from broken 8259 routing.   One example of stray generation here
>     would be if you changed the TPR while the LAPIC is responding to the
>     cpu's INT A cycle.
> 
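
For reference, a minimal sketch of the LAPIC side described above, assuming
the usual register layout (Spurious-Interrupt Vector Register at offset 0xF0,
vector in bits 0-7, software-enable in bit 8).  The helper and handler names
are hypothetical and are not taken from any tree; the actual register write
is left out so the snippet stays self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Spurious-Interrupt Vector Register: offset from the LAPIC base. */
#define LAPIC_SVR_OFFSET   0xF0u
#define LAPIC_SVR_ENABLE   (1u << 8)   /* APIC software-enable bit */
#define LAPIC_SVR_VECMASK  0xFFu       /* bits 0-7: spurious vector */

/*
 * Hypothetical helper: build the value to write back to the SVR.
 * Only the vector field and the enable bit are changed; every other
 * bit of the old value is preserved.
 */
static uint32_t
lapic_svr_value(uint32_t old_svr, uint8_t spurious_vec)
{
	uint32_t v = old_svr & ~LAPIC_SVR_VECMASK;

	return (v | spurious_vec | LAPIC_SVR_ENABLE);
}

/* Hypothetical stray handler: count the event and return, no EOI. */
static unsigned long lapic_stray_count;

static void
lapic_spurious_handler(void)
{
	lapic_stray_count++;
}
```

The handler deliberately skips the EOI write, since a spurious delivery does
not set a bit in the ISR and there is nothing to acknowledge.
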
>     One thing this does imply is that we should never, ever overlap the
>     8259 interrupt vector space with the LAPIC vector space.  I wonder if
>     the LAPIC EOI lockup issue might be explained by an 8259 returning its
>     stray vector that is misinterpreted as an LAPIC interrupt.  Since there
>     is no way to determine what IRQ an LAPIC EOI is actually servicing
>     (except by checking the ISR to see what bit actually got cleared), any
>     sort of misinterpretation will result in disaster.  That means I have
>     some work to do in DragonFly which is still using the separate FAST/SLOW
>     vector code with the LAPIC 'SLOW' interrupts overlapping the 8259
>     vector space.
> 
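
The no-overlap rule is easy to check mechanically.  A rough sketch, assuming
each controller's vectors form one contiguous range; the struct and function
names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* A contiguous block of IDT vectors: [first, first + count). */
struct vec_range {
	unsigned first;
	unsigned count;
};

/* True if the two (non-empty) vector ranges share at least one vector. */
static bool
vec_ranges_overlap(struct vec_range a, struct vec_range b)
{
	assert(a.count > 0 && b.count > 0);
	return (a.first < b.first + b.count && b.first < a.first + a.count);
}
```

With the 8259 pair at 0x20-0x2f, a LAPIC range starting at 0x30 passes the
check, while one starting at 0x28 does not.
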
>     The 8259's stray interrupt vector is BASE+7 (usually 0x20 + 7).  I
>     suspect that BASE+15 might also occur sometimes.  The only way to
>     completely avoid getting stray 8259 vectors would be to *NEVER* mess
>     with the interrupt masks.  I don't think that CLI/STI would work here;
>     the INT A cycle is almost guaranteed to be decoupled from the 
>     instruction stream.  In fact, at least on the AMD, the hypertransport 
>     layer will do the cycle and queue a pending vector until it can be
>     delivered to the cpu (from my read).
> 
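
A sketch of how a handler could tell a stray BASE+7 or BASE+15 from a real
IRQ 7 or IRQ 15, using the common approach of checking bit 7 of the relevant
8259 in-service register.  It assumes the slave is programmed at BASE+8 and
that the caller has already read both ISRs (normally by writing OCW3 = 0x0b
to the command port and reading it back); the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum icu_verdict {
	ICU_REAL,		/* genuine interrupt, handle and EOI as usual */
	ICU_STRAY_MASTER,	/* stray on the master, send no EOI */
	ICU_STRAY_SLAVE		/* stray on the slave, EOI the master only */
};

/*
 * Classify a vector delivered by the 8259 pair.  'base' is the master's
 * vector base (a multiple of 8, e.g. 0x20); the slave is assumed to sit
 * at base + 8.  A real IRQ 7 or IRQ 15 has bit 7 set in the matching ISR,
 * a stray one does not.
 */
static enum icu_verdict
icu_classify(unsigned vector, unsigned base, uint8_t master_isr,
    uint8_t slave_isr)
{
	assert((base & 7) == 0);

	if (vector == base + 7 && (master_isr & 0x80) == 0)
		return (ICU_STRAY_MASTER);
	if (vector == base + 15 && (slave_isr & 0x80) == 0)
		return (ICU_STRAY_SLAVE);
	return (ICU_REAL);
}
```

In the slave case the master still has its cascade input in service, which
is why it needs an EOI even though the slave does not.
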
>     That is clearly a problem since we pretty much have to mess with the
>     masks to deal with level interrupt sources.  Or to disable the 8259
>     completely, which is the solution John mentioned to me.
> 
> 					-Matt
> 					Matthew Dillon 
> 					<dillon_at_backplane.com>

It turns out that the T_RESERVED trap only gets hit once, when the 
second CPU is being started.  Looks like an easy fix.

Scott
Received on Sun Nov 20 2005 - 15:47:56 UTC
