Re: Potential source of interrupt aliasing

From: Doug White <dwhite_at_gumbysoft.com>
Date: Sun, 10 Apr 2005 18:05:44 -0700 (PDT)

On Sun, 10 Apr 2005, Matthew Dillon wrote:

>     A couple of things don't click here.
>
>     First, unless this 'boot interrupt' IRQ is pointing to an APIC vector
>     that is initialized to point at the softclock there is no way the
>     softclock ithread could be involved.  I'm not saying that it isn't
>     running away, just that the boot interrupt business is probably not
>     the cause.  This boot interrupt thingy kinda sounds like a red herring.

softclock is the poor innocent bystander. Any ithread would do, as long as
it's something that prevents other ithreads from being scheduled.

That's still an experiment in progress, though, so don't get too hung up on
it.

>     Secondly, you HAVE to mask the APIC vector in the interrupt service
>     routine if the service routine is going to schedule an ithread.  There's
>     no choice... it HAS to be done because the ISR isn't capable of clearing
>     the originating interrupt from the device... the interrupt thread has to
>     do that.

Or acknowledge the interrupt in the hardware before scheduling the
ithread via a routine provided by the driver.
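
To make the two options concrete, here is a rough userland sketch, not the
actual kernel code. The struct layout, the ack hook, and every name in it
(isrc, dev_state, dev_ack, isr_dispatch) are hypothetical and only model the
idea: if the driver supplies a quick acknowledge routine, the ISR can quiet
the device itself and skip the mask; otherwise it has to mask the vector
before scheduling the ithread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state: true while the device asserts its interrupt. */
struct dev_state {
	bool irq_pending;
};

/* Hypothetical interrupt source, loosely modelled on the discussion. */
struct isrc {
	bool masked;             /* vector masked at the APIC */
	bool ithread_scheduled;  /* ithread has been made runnable */
	void (*ack)(void *);     /* optional driver-provided quick ack */
	void *ack_arg;
};

/* Example driver ack routine: clears the device's interrupt condition. */
static void
dev_ack(void *arg)
{
	((struct dev_state *)arg)->irq_pending = false;
}

/* What the low-level ISR does before handing off to the ithread. */
static void
isr_dispatch(struct isrc *src)
{
	if (src->ack != NULL)
		src->ack(src->ack_arg);  /* device is quiet, no mask needed */
	else
		src->masked = true;      /* hold the line off until the ithread is done */
	src->ithread_scheduled = true;
}
```

Without the ack hook the source comes back masked; with it the device is
already quiet and the vector stays unmasked while the ithread runs.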

>     *BUT* it *IS* possible that the wrong APIC vector is being masked (and
>     not because of an interrupt alias, but because the actual hard interrupt
>     is misrouted).

I don't think this is the case. Somehow the vector would have to get
corrupted during this function call, which is line 609 in
src/sys/i386/i386/local_apic.c:

    isrc = intr_lookup_source(apic_idt_to_irq(frame.if_vec));

which reduces to an array lookup with an offset index.

apic_idt_to_irq(), with the asserts and range checks removed, is:

    return (vector - APIC_IO_INTS);

And intr_lookup_source is:

    return (interrupt_sources[vector]);

I would expect much wider aliasing or stray interrupt problems if this were
occurring.
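
Put together as a standalone sketch, the whole path is a subtraction followed
by an array index. The constant values, the array size, and the is_name field
below are assumptions made for illustration; the real definitions are in the
kernel headers, and the real functions carry more checks than this.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only. */
#define APIC_IO_INTS	48	/* assumption: first IDT vector used for I/O interrupts */
#define NUM_IO_INTS	191	/* assumption: number of I/O interrupt slots */

struct intsrc {
	const char *is_name;	/* hypothetical field, for the demo only */
};

static struct intsrc *interrupt_sources[NUM_IO_INTS];

/* Map an IDT vector to an IRQ number: a plain subtraction. */
static unsigned int
apic_idt_to_irq(unsigned int vector)
{
	assert(vector >= APIC_IO_INTS && vector < APIC_IO_INTS + NUM_IO_INTS);
	return (vector - APIC_IO_INTS);
}

/* Map an IRQ number to its registered source: a plain array lookup. */
static struct intsrc *
intr_lookup_source(unsigned int irq)
{
	if (irq >= NUM_IO_INTS)
		return (NULL);
	return (interrupt_sources[irq]);
}
```

There is no state in between that could turn one vector into another, so a
wrong result here would have to come from the vector in the trap frame or
from the table contents, and either would affect far more than one device.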

>     I've seen this occur numerous times.  What happens is
>     that a device generates a mis-routed interrupt which causes the
>     interrupt handler for an UNRELATED device to run.  It runs to completion
>     but since the device it thought interrupted was not the device that
>     actually interrupted, the interrupt on the actual originating device
>     never gets cleared so the moment the ithread completes and unmasks that
>     APIC vector, the APIC issues another interrupt.  The result is that the
>     ithread is constantly running.
>
>     Misrouted interrupts are a serious problem.  They seem to be caused by
>     the BIOS or ACPI getting confused about how bridges are wired... when
>     multiple devices route an interrupt through the same pin on a bridge
>     and one is routed, the BIOS or ACPI gets seriously confused about
>     the second device and may believe that the second device can be routed
>     to a different IRQ when, in fact, it can't.  You wind up with one of
>     the two devices on the wrong IRQ.  This problem is exacerbated when
>     the BIOS routes some of the devices for use by the BIOS (such as for
>     PXE booting), or to handle a USB keyboard, or something of that sort.
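
For reference, the runaway behaviour described above can be modelled in a few
lines. This is a toy simulation under my own assumptions (a level-triggered
line, one handler per vector, hypothetical names throughout), not kernel
code: the ithread runs the handler for the device attached to the vector,
and if that is not the device actually asserting the line, the interrupt
fires again as soon as the vector is unmasked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device: true while it holds its interrupt line asserted. */
struct device {
	bool asserting;
};

/* A handler only clears the condition on the device it was registered for. */
static void
handler_run(struct device *dev)
{
	dev->asserting = false;
}

/*
 * Count ithread runs until the line goes quiet, giving up after max_runs.
 * 'source' is the device really asserting the line; 'serviced' is the
 * device whose handler is attached to the vector that fired.
 */
static int
ithread_runs(struct device *source, struct device *serviced, int max_runs)
{
	int runs = 0;

	while (runs < max_runs && source->asserting) {
		/* ISR: mask the vector and schedule the ithread. */
		handler_run(serviced);	/* ithread: service "its" device */
		runs++;
		/* ithread done: unmask; a level-triggered line refires if still asserted. */
	}
	return (runs);
}
```

With source and serviced being the same device the loop ends after one run.
With two different devices it spins until max_runs, which is the constantly
running ithread.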

I'm convinced these "misrouted interrupts" are coming from the boot
interrupt functionality.  You don't route interrupts in APIC mode; it's a
flat space. All of the APIC entries stack together as if they were one
gigantic IOAPIC that every PCI device's INTx lines were attached to. This
is the System Interrupts model described in the ACPI specification.

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite_at_gumbysoft.com          |  www.FreeBSD.org
Received on Sun Apr 10 2005 - 23:05:44 UTC
