RE: New interrupt stuff breaks ASUS 2 CPU system

From: John Baldwin <jhb_at_FreeBSD.org>
Date: Wed, 05 Nov 2003 17:43:18 -0500 (EST)
On 05-Nov-2003 Harti Brandt wrote:
> On Tue, 4 Nov 2003, John Baldwin wrote:
> 
> JB>
> JB>On 04-Nov-2003 Harti Brandt wrote:
> JB>> On Tue, 4 Nov 2003, Harti Brandt wrote:
> JB>>
> JB>> HB>On Tue, 4 Nov 2003, John Baldwin wrote:
> JB>> HB>
> JB>> HB>JB>
> JB>> HB>JB>On 04-Nov-2003 Harti Brandt wrote:
> JB>> HB>JB>>
> JB>> HB>JB>> Hi,
> JB>> HB>JB>>
> JB>> HB>JB>> I have an ASUS system with 2 CPUs that I need to run at HZ=1000. This
> JB>> HB>JB>> worked until yesterday, but with the new interrupt code it doesn't boot
> JB>> HB>JB>> anymore. It works for the standard HZ, but if I set HZ=1000 I get a double
> JB>> HB>JB>> fault. I suspect a race condition in the interrupt handling. My config
> JB>> HB>JB>> file has
> JB>> HB>JB>>
> JB>> HB>JB>> options SMP
> JB>> HB>JB>> device apic
> JB>> HB>JB>> options HZ=1000
> JB>> HB>JB>
> JB>> HB>JB>Ok, I can try to reproduce.
> JB>> HB>JB>
> JB>> HB>JB>> Device configuration finished.
> JB>> HB>JB>> Timecounter "TSC" frequency 1380009492 Hz quality -100
> JB>> HB>JB>> Timecounters cpuid = 0; apic id = 00
> JB>> HB>JB>> instruction pointer   = 0x8:0xc048995d
> JB>> HB>JB>> stack pointer         = 0x10:0xc0821bf4
> JB>> HB>JB>> frame pointer        cpuid = 0; apic id = 00
> JB>> HB>JB>>
> JB>> HB>JB>> 0xc048995d is in critical_exit. It is the jmp after the popf from
> JB>> HB>JB>> cpu_critical_exit.
> JB>> HB>JB>
> JB>> HB>JB>This is where interrupts are re-enabled, so you are getting an interrupt.
> JB>> HB>JB>It might be helpful to figure what type of fault you are actually getting.
> JB>> HB>
> JB>> HB>tf_err is 0, tf_trapno is 30 (decimal).
> JB>>
> JB>> More information:
> JB>>
> JB>> I have replaced all the reserved vectors with individual ones, that set
> JB>> tf_err to the index (vector number). It appears that the vector number is
> JB>> 39 decimal. What does that mean?
> JB>
> JB>IRQ 7.
> JB>Can you post a verbose dmesg?  Also, can you try both with and without
> JB>ACPI?
> 
> Attached are both dmesgs.
> 
> More datapoints:
> 
> I had the parallel port (irq7) and the second sio disabled in the BIOS.
> After enabling both I now get a panic in lapic_handle_intr: Couldn't get
> vector from ISR! After fetching the relevant docs from intel I checked the
> registers of the apic pointed to by lapic. The interrupt taken is
> Xapic_irq1. isr1 is zero, but irr1 is 0x100 (that was without ACPI). How
> may that happen? As I understand ISR are the interrupts that have been
> delivered to the CPU so if it is interrupted a bit should be set, correct?

I think I have figured out what is happening.  You are getting a spurious
interrupt from the 8259A PIC (which comes in on IRQ 7).  The IRR register
lists pending interrupts still waiting to be serviced.  Try using
'options NO_MIXED_MODE' to stop using the 8259As for the clock and see if
the spurious IRQ 7 interrupts go away.
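
To make the numbers in this thread easier to follow, here is a minimal
sketch of the arithmetic involved.  It is not the kernel's code.  It
assumes that ISA IRQs 0-15 are mapped to IDT vectors starting at 32, and
that irr1/isr1 are the second 32-bit word of the local APIC's 256-bit
IRR/ISR, where bit N of the whole register stands for vector N.  The names
IRQ_VECTOR_BASE, vector_to_irq and lowest_vector_in_word are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: ISA IRQs 0-15 occupy IDT vectors 32-47. */
#define IRQ_VECTOR_BASE 32

/* Map an IDT vector to an ISA IRQ number, or -1 if it is not in that range. */
static int
vector_to_irq(int vector)
{
	if (vector < IRQ_VECTOR_BASE || vector >= IRQ_VECTOR_BASE + 16)
		return (-1);
	return (vector - IRQ_VECTOR_BASE);
}

/*
 * Return the lowest vector whose bit is set in one 32-bit word of the
 * local APIC IRR or ISR, or -1 if the word is empty.  'word' is the word
 * index (0-7); word 1 covers vectors 32-63.
 */
static int
lowest_vector_in_word(unsigned word, uint32_t bits)
{
	assert(word < 8);
	for (int bit = 0; bit < 32; bit++) {
		if (bits & (UINT32_C(1) << bit))
			return ((int)(word * 32 + bit));
	}
	return (-1);
}
```

Under those assumptions, vector_to_irq(39) gives 7, which matches the
"IRQ 7" above, and lowest_vector_in_word(1, 0x100) gives 40, i.e. a vector
that is still pending in the IRR while the matching ISR word is empty.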

> A question while reading the code: what does the global lapic variable
> refer to? As I understand every CPU has its local APIC. Does it point to
> one of those two? To which?

Every CPU reaches its own local APIC at the same physical address.  Thus,
CPU A can only get to its own local APIC, and not to any other CPU's.  The
'lapic' variable holds a virtual address mapped to that physical address,
so it refers to the local APIC of whichever CPU dereferences it.
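
As a rough user-space illustration of that point (a simulation, not kernel
code), the sketch below models two CPUs that each have a private register
block behind the same address.  The default physical base 0xFEE00000 and
the ID register at offset 0x20 (APIC ID in bits 24-31) follow the Intel
documentation.  Everything else is hypothetical: lapic_regs, current_cpu,
sim_init, sim_read_phys and my_apic_id are invented names, and the APIC
IDs are simply assumed to equal the CPU index.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU			2
#define LAPIC_DEFAULT_BASE	0xFEE00000u	/* default physical base */
#define LAPIC_ID_OFFSET		0x20u		/* local APIC ID register */
#define LAPIC_SIZE		0x400u
#define LAPIC_REG_WORDS		(LAPIC_SIZE / 4)

/* One private register block per simulated CPU. */
static uint32_t lapic_regs[NCPU][LAPIC_REG_WORDS];

/* Which simulated CPU is "executing" the access. */
static int current_cpu;

/* Assumption for the simulation: APIC ID equals the CPU index. */
static void
sim_init(void)
{
	for (int cpu = 0; cpu < NCPU; cpu++)
		lapic_regs[cpu][LAPIC_ID_OFFSET / 4] = (uint32_t)cpu << 24;
}

/*
 * Read a 32-bit register at a "physical" address.  The address is the
 * same for every CPU; the block that answers depends on current_cpu.
 */
static uint32_t
sim_read_phys(uint32_t paddr)
{
	assert(paddr >= LAPIC_DEFAULT_BASE);
	assert(paddr < LAPIC_DEFAULT_BASE + LAPIC_SIZE);
	assert(paddr % 4 == 0);
	return (lapic_regs[current_cpu][(paddr - LAPIC_DEFAULT_BASE) / 4]);
}

/* The same code and the same address on every CPU, different answers. */
static int
my_apic_id(void)
{
	return ((int)(sim_read_phys(LAPIC_DEFAULT_BASE + LAPIC_ID_OFFSET) >> 24));
}
```

After sim_init(), my_apic_id() returns 0 while current_cpu is 0 and 1 while
current_cpu is 1, even though the address being read never changes.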

-- 

John Baldwin <jhb_at_FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve!"  -  http://www.FreeBSD.org/
Received on Wed Nov 05 2003 - 13:43:45 UTC
