RE: New interrupt stuff breaks ASUS 2 CPU system

From: Harti Brandt <brandt_at_fokus.fraunhofer.de>
Date: Wed, 5 Nov 2003 11:23:23 +0100 (CET)
On Tue, 4 Nov 2003, John Baldwin wrote:

JB>
JB>On 04-Nov-2003 Harti Brandt wrote:
JB>> On Tue, 4 Nov 2003, Harti Brandt wrote:
JB>>
JB>> HB>On Tue, 4 Nov 2003, John Baldwin wrote:
JB>> HB>
JB>> HB>JB>
JB>> HB>JB>On 04-Nov-2003 Harti Brandt wrote:
JB>> HB>JB>>
JB>> HB>JB>> Hi,
JB>> HB>JB>>
JB>> HB>JB>> I have an ASUS system with 2 CPUs that I need to run at HZ=10000. This
JB>> HB>JB>> worked until yesterday, but with the new interrupt code it doesn't boot
JB>> HB>JB>> anymore. It works for the standard HZ, but if I set HZ=1000 I get a double
JB>> HB>JB>> fault. I suspect a race condition in the interrupt handling. My config
JB>> HB>JB>> file has
JB>> HB>JB>>
JB>> HB>JB>> options SMP
JB>> HB>JB>> device apic
JB>> HB>JB>> options HZ=1000
JB>> HB>JB>
JB>> HB>JB>Ok, I can try to reproduce.
JB>> HB>JB>
JB>> HB>JB>> Device configuration finished.
JB>> HB>JB>> Timecounter "TSC" frequency 1380009492 Hz quality -100
JB>> HB>JB>> Timecounters cpuid = 0; apic id = 00
JB>> HB>JB>> instruction pointer   = 0x8:0xc048995d
JB>> HB>JB>> stack pointer         = 0x10:0xc0821bf4
JB>> HB>JB>> frame pointer        cpuid = 0; apic id = 00
JB>> HB>JB>>
JB>> HB>JB>> 0xc048995d is in critical_exit. It is the jmp after the popf from
JB>> HB>JB>> cpu_critical_exit.
JB>> HB>JB>
JB>> HB>JB>This is where interrupts are re-enabled, so you are getting an interrupt.
JB>> HB>JB>It might be helpful to figure what type of fault you are actually getting.
JB>> HB>
JB>> HB>tf_err is 0, tf_trapno is 30 (decimal).
JB>>
JB>> More information:
JB>>
JB>> I have replaced all the reserved vectors with individual ones, that set
JB>> tf_err to the index (vector number). It appears that the vector number is
JB>> 39 decimal. What does that mean?
JB>
JB>IRQ 7.
JB>Can you post a verbose dmesg?  Also, can you try both with and without
JB>ACPI?
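
For reference, the "IRQ 7" answer follows from simple arithmetic if the 16
ISA IRQs are assumed to start at vector 32. That base is an assumption taken
from the answer above, not from the kernel sources, and the names
IDT_IO_BASE and vector_to_irq are made up for this sketch:

```c
#include <assert.h>

/* Assumed first IDT vector used for the 16 ISA IRQs (hypothetical name). */
#define IDT_IO_BASE	32
#define ISA_IRQ_COUNT	16

/*
 * Map an IDT vector number to an ISA IRQ number, or return -1 if the
 * vector lies outside the assumed ISA range.
 */
int
vector_to_irq(int vector)
{
	assert(vector >= 0 && vector <= 255);
	if (vector < IDT_IO_BASE || vector >= IDT_IO_BASE + ISA_IRQ_COUNT)
		return (-1);
	return (vector - IDT_IO_BASE);
}
```

Under that assumption vector 39 maps to IRQ 7, while trap number 30 falls
below the range and is not an IRQ at all.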

Attached are both dmesgs.

More datapoints:

I had the parallel port (irq7) and the second sio disabled in the BIOS.
After enabling both I now get a panic in lapic_handle_intr: "Couldn't get
vector from ISR!". After fetching the relevant docs from Intel I checked the
registers of the APIC pointed to by lapic. The interrupt taken is
Xapic_irq1. isr1 is zero, but irr1 is 0x100 (that was without ACPI). How
can that happen? As I understand it, the ISR holds the interrupts that have
been delivered to the CPU, so if the CPU is interrupted a bit should be
set, correct?
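
To make the register values concrete, here is a small sketch of how a
256-bit APIC register bank can be decoded into a vector number. It assumes
the usual local APIC layout, in which the ISR and the IRR each consist of
eight 32-bit words and word i covers vectors 32*i through 32*i+31. The
function name is made up for illustration; this is not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define APIC_REG_WORDS	8	/* 8 x 32 bits = 256 vectors */

/*
 * Return the highest vector whose bit is set in a 256-bit APIC register
 * bank (ISR or IRR), or -1 if no bit is set at all.
 * regs[i] covers vectors 32*i .. 32*i+31.
 */
int
apic_highest_vector(const uint32_t regs[APIC_REG_WORDS])
{
	assert(regs != NULL);
	for (int i = APIC_REG_WORDS - 1; i >= 0; i--) {
		if (regs[i] == 0)
			continue;	/* nothing set in this word */
		for (int bit = 31; bit >= 0; bit--) {
			if (regs[i] & (UINT32_C(1) << bit))
				return (i * 32 + bit);
		}
	}
	return (-1);
}
```

With only irr1 = 0x100 set, the helper returns 40 for the IRR (bit 8 of
word 1), and with isr1 = 0 and the other words clear it returns -1 for the
ISR. That is the combination that would make a lookup in the ISR come back
empty even though a request is still pending in the IRR.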

I then replaced the panic with a printf() followed by a return. Now the
system comes to life, but I get a couple of these warnings. When the
system is idle everything seems fine, but when I start my simulation
application (which normally generates between 20k and 250k interrupts/sec,
depending on the MPSAFE setting of the ATM drivers) I get approximately 1-2
of these messages per second (this is with HZ=1000).

A question that came up while reading the code: what does the global lapic
variable refer to? As I understand it, every CPU has its own local APIC.
Does lapic point to one of the two? If so, to which?

Regards,
harti
-- 
harti brandt,
http://www.fokus.fraunhofer.de/research/cc/cats/employees/hartmut.brandt/private
brandt_at_fokus.fraunhofer.de, harti_at_freebsd.org
Received on Wed Nov 05 2003 - 01:23:28 UTC
