RE: New interrupt stuff breaks ASUS 2 CPU system

From: Harti Brandt <brandt_at_fokus.fraunhofer.de>
Date: Thu, 6 Nov 2003 15:29:29 +0100 (CET)
On Wed, 5 Nov 2003, John Baldwin wrote:

JB>
JB>On 05-Nov-2003 Harti Brandt wrote:
JB>> On Tue, 4 Nov 2003, John Baldwin wrote:
JB>>
JB>> JB>
JB>> JB>On 04-Nov-2003 Harti Brandt wrote:
JB>> JB>> On Tue, 4 Nov 2003, Harti Brandt wrote:
JB>> JB>>
JB>> JB>> HB>On Tue, 4 Nov 2003, John Baldwin wrote:
JB>> JB>> HB>
JB>> JB>> HB>JB>
JB>> JB>> HB>JB>On 04-Nov-2003 Harti Brandt wrote:
JB>> JB>> HB>JB>>
JB>> JB>> HB>JB>> Hi,
JB>> JB>> HB>JB>>
JB>> JB>> HB>JB>> I have an ASUS system with 2 CPUs that I need to run at HZ=10000. This
JB>> JB>> HB>JB>> worked until yesterday, but with the new interrupt code it doesn't boot
JB>> JB>> HB>JB>> anymore. It works for the standard HZ, but if I set HZ=1000 I get a double
JB>> JB>> HB>JB>> fault. I suspect a race condition in the interrupt handling. My config
JB>> JB>> HB>JB>> file has
JB>> JB>> HB>JB>>
JB>> JB>> HB>JB>> options SMP
JB>> JB>> HB>JB>> device apic
JB>> JB>> HB>JB>> options HZ=1000
JB>> JB>> HB>JB>
JB>> JB>> HB>JB>Ok, I can try to reproduce.
JB>> JB>> HB>JB>
JB>> JB>> HB>JB>> Device configuration finished.
JB>> JB>> HB>JB>> Timecounter "TSC" frequency 1380009492 Hz quality -100
JB>> JB>> HB>JB>> Timecounters cpuid = 0; apic id = 00
JB>> JB>> HB>JB>> instruction pointer   = 0x8:0xc048995d
JB>> JB>> HB>JB>> stack pointer         = 0x10:0xc0821bf4
JB>> JB>> HB>JB>> frame pointer        cpuid = 0; apic id = 00
JB>> JB>> HB>JB>>
JB>> JB>> HB>JB>> 0xc048995d is in critical_exit. It is the jmp after the popf from
JB>> JB>> HB>JB>> cpu_critical_exit.
JB>> JB>> HB>JB>
JB>> JB>> HB>JB>This is where interrupts are re-enabled, so you are getting an interrupt.
JB>> JB>> HB>JB>It might be helpful to figure out what type of fault you are actually getting.
JB>> JB>> HB>
JB>> JB>> HB>tf_err is 0, tf_trapno is 30 (decimal).
JB>> JB>>
JB>> JB>> More information:
JB>> JB>>
JB>> JB>> I have replaced all the reserved vectors with individual ones, that set
JB>> JB>> tf_err to the index (vector number). It appears that the vector number is
JB>> JB>> 39 decimal. What does that mean?
JB>> JB>
JB>> JB>IRQ 7.
JB>> JB>Can you post a verbose dmesg?  Also, can you try both with and without
JB>> JB>ACPI?
JB>>
JB>> Attached are both dmesgs.
JB>>
JB>> More datapoints:
JB>>
JB>> I had the parallel port (irq7) and the second sio disabled in the BIOS.
JB>> After enabling both I now get a panic in lapic_handle_intr: Couldn't get
JB>> vector from ISR! After fetching the relevant docs from Intel I checked the
JB>> registers of the APIC pointed to by lapic. The interrupt taken is
JB>> Xapic_irq1. isr1 is zero, but irr1 is 0x100 (that was without ACPI). How
JB>> can that happen? As I understand it, the ISR holds the interrupts that have
JB>> been delivered to the CPU, so if the CPU is interrupted a bit should be set,
JB>> correct?
JB>
JB>I think I figured out what is happening.  You are getting a spurious
JB>interrupt from the 8259A PIC (which comes in on IRQ 7).  The IRR register
JB>lists pending interrupts still waiting to be serviced.  Try using
JB>'options NO_MIXED_MODE' to stop using the 8259A's for the clock and see if
JB>the spurious IRQ 7 interrupts go away.

Ok, that seems to help. It is interesting, though, that these interrupts
happen only with a larger HZ and while the kernel is doing printfs (this
machine has a serial console). I have also not yet tried disabling SIO2 and
the parallel port.
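
For reference, the vector arithmetic used in this thread can be sketched in
a few lines of C. This is only an illustration, not kernel code: the helper
names and LEGACY_IRQ_BASE are hypothetical, and the base of 32 is an
assumption that matches the "vector 39 is IRQ 7" answer above. The bit
layout (vector v lives in 32-bit word v / 32, bit v % 32) is how the local
APIC ISR and IRR registers are documented.

```c
#include <stdint.h>

/* Assumption: legacy 8259A IRQs 0-15 start at IDT vector 32. */
#define LEGACY_IRQ_BASE 32

/*
 * Return the lowest vector whose bit is set in one 32-bit word of the
 * local APIC ISR or IRR, or -1 if the word is empty.  reg_index selects
 * which of the eight words is being decoded (isr1/irr1 is index 1).
 */
static int
lowest_vector(unsigned reg_index, uint32_t bits)
{
	int bit;

	for (bit = 0; bit < 32; bit++)
		if (bits & (UINT32_C(1) << bit))
			return ((int)(reg_index * 32) + bit);
	return (-1);
}

/* Map an IDT vector to a legacy IRQ number, or -1 if out of range. */
static int
vector_to_legacy_irq(int vector)
{
	int irq = vector - LEGACY_IRQ_BASE;

	return ((irq >= 0 && irq < 16) ? irq : -1);
}
```

With these helpers, vector_to_legacy_irq(39) gives 7, lowest_vector(1, 0x100)
gives 40, and lowest_vector(1, 0) gives -1, which is the "nothing in
service" case behind the lapic_handle_intr panic. Which IRQ vector 40
belongs to in APIC mode depends on how the kernel assigned vectors, which
the thread does not state.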

Thanks,
harti
-- 
harti brandt,
http://www.fokus.fraunhofer.de/research/cc/cats/employees/hartmut.brandt/private
brandt_at_fokus.fraunhofer.de, harti_at_freebsd.org
Received on Thu Nov 06 2003 - 05:29:33 UTC
