Re: APIC-UP related panic

From: John Baldwin <jhb_at_FreeBSD.org>
Date: Tue, 11 Nov 2003 11:35:26 -0500 (EST)

On 11-Nov-2003 Harald Schmalzbauer wrote:
> On Monday 10 November 2003 19:33, John Baldwin wrote:
>> On 08-Nov-2003 Harald Schmalzbauer wrote:
>> > On Thursday 06 November 2003 17:33, John Baldwin wrote:
>> >> On 06-Nov-2003 Harald Schmalzbauer wrote:
>> >> > Hello,
>> >> >
>> >> > I have one reproducible panic with sources from 04. Nov when enabling
>> >> > "device apic" in the kernel.
>> >> > While building OpenOffice about 1 1/2 hours after start the system
>> >> > reboots. This is absolutely reproducible. Removing device apic from
>> >> > the kernel solves the problem!
> *SNIP*
>> >> Can you try the patch at
>> >> http://www.FreeBSD.org/~jhb/patches/spurious.patch
>> >
>> > Regrettably this hasn't helped. The machine crashed again when building
>> > OpenOffice. This time I have something different in messages:
>> > Nov  7 19:51:27 cale syslogd: kernel boot file is /boot/kernel/kernel
>> > Nov  7 19:51:27 cale kernel: panic: Couldn't get vector from ISR!
>> > Nov  7 19:51:27 cale kernel:
>> > Nov  7 19:51:27 cale kernel: syncing disks, buffers remaining... 2202
>> > 2202 2202 2202 2202 2202 2202
>> > 2202 2202 2202 2202 2202 2202 2202 2202 2202 2202 2202 2202 2202
>> > Nov  7 19:51:27 cale kernel: giving up on 1109 buffers
>> > Nov  7 19:51:27 cale kernel: Uptime: 3h57m51s
>> > Nov  7 19:51:27 cale kernel: Shutting down ACPI
>> > Nov  7 19:51:27 cale kernel: Automatic reboot in 15 seconds - press a key
>> > on the console to abort
>> > Nov  7 19:51:27 cale kernel: Rebooting...
>> >
>> > Let me know if I can help. Should I build a debug-kernel? I think that
>> > doesn't help too much since the machine reboots immediately, so I have
>> > no chance to type anything like trace.
>>
>> Ok.  The problem is that when the spurious interrupt is triggered, it
>> doesn't set a bit in the ISR.  Hmm, can you try using 'options
>> NO_MIXED_MODE' instead?
> 
> Uhm, I don't really understand what's going on. Also I haven't found anything 
> about NO_MIXED_MODE but I made the usual kernel (-current from Nov.09, 
> without the spurious patch) with "device apic" and "options NO_MIXED_MODE".
> Now quake2forge compiled successfully (which also crashed the machine with the 
> last apic kernel) also OpenOffice compiles fine.
> I see one difference in dmesg:
> Timecounter shows now "ACPI-fast" like with a previous SMP-kernel instead of 
> "ACPI-safe" like with the UP kernel. Just for info attached the new dmesg.
> 
> 
> Do you have any enlightening link for me about apic and NO_MIXED_MODE?

It's documented in /sys/i386/conf/NOTES now along with 'device apic'.  For
a longer explanation of what is happening:

The 8259A PICs can generate a spurious interrupt cycle when (I believe)
an ISA interrupt is deasserted after the PIC begins the interrupt cycle.
Or something like that.  It's a weird race condition in the hardware.
Anyways, as a result, when the 8259A master PIC notices this, it raises
IRQ 7 but doesn't mark it as active/pending in its registers.
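
To make the race concrete, here is a toy C model of a single 8259A. It
is only a sketch under simplifying assumptions: the struct, the function
names, and the vector base are hypothetical, and it ignores masking,
cascading, and level-triggered mode. It shows a request line dropping
before the acknowledge cycle, so the PIC answers with the IRQ 7 vector
while leaving in-service bit 7 clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one 8259A; field and function names are hypothetical. */
struct pic_model {
    uint8_t irr;         /* request register: lines currently asserted */
    uint8_t isr;         /* in-service register: IRQs being handled */
    uint8_t vector_base; /* vector reported for IRQ 0 */
};

/* A device raises or drops its request line. */
static void pic_set_line(struct pic_model *pic, unsigned irq, bool asserted)
{
    assert(irq < 8);
    if (asserted)
        pic->irr |= (uint8_t)(1u << irq);
    else
        pic->irr &= (uint8_t)~(1u << irq);
}

/*
 * CPU acknowledge cycle.  If a request is still pending, the
 * lowest-numbered pending IRQ moves into the ISR and its vector is
 * returned.  If the request has vanished, the PIC still has to answer,
 * so it returns the IRQ 7 vector without setting any ISR bit.
 */
static uint8_t pic_acknowledge(struct pic_model *pic)
{
    for (unsigned irq = 0; irq < 8; irq++) {
        uint8_t bit = (uint8_t)(1u << irq);
        if (pic->irr & bit) {
            pic->irr &= (uint8_t)~bit;
            pic->isr |= bit;
            return (uint8_t)(pic->vector_base + irq);
        }
    }
    return (uint8_t)(pic->vector_base + 7); /* spurious: ISR untouched */
}

/* An IRQ 7 vector is genuine only if in-service bit 7 is set. */
static bool pic_irq7_is_spurious(const struct pic_model *pic)
{
    return (pic->isr & 0x80u) == 0;
}
```

In this model a line that is raised and then dropped before
pic_acknowledge() runs yields the IRQ 7 vector with
pic_irq7_is_spurious() returning true, while a real IRQ 7 yields the
same vector with the in-service bit set.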

Now, in the APIC world, things are a bit different.  Spurious interrupts
have their own separate vector that we handle just fine.  However, on
some motherboards, IRQ 0 (from the ISA timer) is not connected to the
I/O APIC that routes ISA interrupts.  To work around this, we have to
use something called "mixed mode".  On the I/O APIC that routes ISA
interrupts, the first interrupt pin (0) is special in that it doesn't
have an ISA interrupt hooked up to it.  Instead, the 8259A Master PIC
is hooked up to that pin, and when it generates an interrupt, the I/O
APIC will pass it along transparently to the CPU if that pin on the
I/O APIC is enabled.  This pin is known as an ExtINT pin.

Mixed mode is actually how your SMP machine works with a kernel
that doesn't include 'device apic'.  The BIOS programs the APICs
to use mixed mode in one of two ways (except for some really old SMP
motherboards which actually have a hardware switch you have to toggle
to enable the APICs):  It either enables an ExtINT pin directly on
the first CPU that the 8259As talk to, or it enables the ExtINT pin
in the first I/O APIC and routes that interrupt pin to the first CPU.
When a mixed mode interrupt arrives at the CPU, it does not go
through the local APIC scheduling hardware (per se), but is delivered
directly to the CPU.  Thus, if the CPU tries to ask the local APIC
which interrupt it just received, the local APIC can't tell it which
one arrived.  (The local APIC tells the CPU via its ISR registers,
hence the reason for the panic message mentioning the ISR.)
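
As an illustration of that lookup, the sketch below scans a snapshot of
the local APIC's in-service bits (256 bits, held in eight 32-bit
registers) for a vector. The function name and the snapshot array are
hypothetical, and this is not the actual FreeBSD code. An interrupt that
came in through ExtINT sets no bit, so the scan comes back empty, which
is the situation the panic message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LAPIC_ISR_WORDS 8 /* 8 x 32 bits = 256 possible vectors */

/*
 * Return the highest vector whose in-service bit is set, or -1 if the
 * snapshot is empty.  An ExtINT interrupt bypasses the local APIC, so
 * it leaves the snapshot empty.
 */
static int lapic_isr_highest_vector(const uint32_t isr[LAPIC_ISR_WORDS])
{
    assert(isr != NULL);
    for (int word = LAPIC_ISR_WORDS - 1; word >= 0; word--) {
        if (isr[word] == 0)
            continue;
        for (int bit = 31; bit >= 0; bit--) {
            if (isr[word] & (UINT32_C(1) << bit))
                return word * 32 + bit;
        }
    }
    return -1;
}
```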

So, by default, to work around motherboards that don't hook IRQ 0
up to the I/O APIC, we route IRQ 0 through the 8259A PICs.  IRQ 0
then uses a different low-level interrupt handler that sends its EOI
to the 8259A instead of to the local APIC.  We only use a special
handler for IRQ 0 though.  We assume that IRQ 7 will be routed through
the I/O APIC with all the other ISA interrupts.
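
The EOI routing just described can be sketched as a small decision
function. The enum, the function name, and the mixed_mode flag are
hypothetical and only restate the rule above; the sketch also assumes
that with NO_MIXED_MODE every ISA interrupt, IRQ 0 included, goes
through the I/O APIC.

```c
#include <assert.h>
#include <stdbool.h>

enum eoi_target {
    EOI_LOCAL_APIC, /* normal case: interrupt came via the I/O APIC */
    EOI_8259A       /* mixed mode IRQ 0: interrupt came via ExtINT */
};

/* Hypothetical helper: which controller gets the EOI for an ISA IRQ. */
static enum eoi_target eoi_target_for_irq(unsigned irq, bool mixed_mode)
{
    assert(irq < 16);
    if (mixed_mode && irq == 0)
        return EOI_8259A;
    return EOI_LOCAL_APIC;
}
```

Note that IRQ 7 always maps to EOI_LOCAL_APIC here, which is exactly the
assumption that a spurious IRQ 7 from the 8259A violates.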

What is happening is that the 8259A PIC is sending a spurious interrupt.
This interrupt goes through the ExtINT pin as an IRQ 7 and arrives at
the CPU.  However, the CPU expects IRQ 7 to come from the I/O APIC
via intpin 7, not from the 8259A.  What I think I will try to do for a
workaround is to check if the 8259A generated a spurious vector (there
is some way to check apparently) and just ignore the interrupt if we
couldn't find a vector in the ISR and the 8259A did generate a spurious
vector.
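
A rough sketch of that decision, not a real patch: the names are
hypothetical, and how the 8259A is actually queried is left out and
represented by a plain flag.

```c
#include <assert.h>
#include <stdbool.h>

enum intr_action {
    INTR_DISPATCH,        /* the local APIC ISR named a vector */
    INTR_IGNORE_SPURIOUS, /* no vector, and the 8259A says it was spurious */
    INTR_PANIC            /* no vector and no explanation */
};

/*
 * lapic_vector is the vector found in the local APIC ISR, or -1 if
 * none was found.  pic_reports_spurious stands in for whatever check
 * the 8259A offers.
 */
static enum intr_action
classify_interrupt(int lapic_vector, bool pic_reports_spurious)
{
    assert(lapic_vector >= -1 && lapic_vector <= 255);
    if (lapic_vector >= 0)
        return INTR_DISPATCH;
    if (pic_reports_spurious)
        return INTR_IGNORE_SPURIOUS;
    return INTR_PANIC;
}
```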
 
However, if NO_MIXED_MODE works, that is actually the more desirable
way to run your system.

-- 

John Baldwin <jhb_at_FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
Received on Tue Nov 11 2003 - 07:35:59 UTC
