On Wed, 5 Nov 2003, Harti Brandt wrote:

HB>On Tue, 4 Nov 2003, John Baldwin wrote:
HB>
HB>JB>
HB>JB>On 04-Nov-2003 Harti Brandt wrote:
HB>JB>> On Tue, 4 Nov 2003, Harti Brandt wrote:
HB>JB>>
HB>JB>> HB>On Tue, 4 Nov 2003, John Baldwin wrote:
HB>JB>> HB>
HB>JB>> HB>JB>
HB>JB>> HB>JB>On 04-Nov-2003 Harti Brandt wrote:
HB>JB>> HB>JB>>
HB>JB>> HB>JB>> Hi,
HB>JB>> HB>JB>>
HB>JB>> HB>JB>> I have an ASUS system with 2 CPUs that I need to run at HZ=10000. This
HB>JB>> HB>JB>> worked until yesterday, but with the new interrupt code it doesn't boot
HB>JB>> HB>JB>> anymore. It works for the standard HZ, but if I set HZ=1000 I get a double
HB>JB>> HB>JB>> fault. I suspect a race condition in the interrupt handling. My config
HB>JB>> HB>JB>> file has
HB>JB>> HB>JB>>
HB>JB>> HB>JB>> options SMP
HB>JB>> HB>JB>> device apic
HB>JB>> HB>JB>> options HZ=1000
HB>JB>> HB>JB>
HB>JB>> HB>JB>Ok, I can try to reproduce.
HB>JB>> HB>JB>
HB>JB>> HB>JB>> Device configuration finished.
HB>JB>> HB>JB>> Timecounter "TSC" frequency 1380009492 Hz quality -100
HB>JB>> HB>JB>> Timecounters cpuid = 0; apic id = 00
HB>JB>> HB>JB>> instruction pointer = 0x8:0xc048995d
HB>JB>> HB>JB>> stack pointer = 0x10:0xc0821bf4
HB>JB>> HB>JB>> frame pointer cpuid = 0; apic id = 00
HB>JB>> HB>JB>>
HB>JB>> HB>JB>> 0xc048995d is in critical_exit. It is the jmp after the popf from
HB>JB>> HB>JB>> cpu_critical_exit.
HB>JB>> HB>JB>
HB>JB>> HB>JB>This is where interrupts are re-enabled, so you are getting an interrupt.
HB>JB>> HB>JB>It might be helpful to figure what type of fault you are actually getting.
HB>JB>> HB>
HB>JB>> HB>tf_err is 0, tf_trapno is 30 (decimal).
HB>JB>>
HB>JB>> More information:
HB>JB>>
HB>JB>> I have replaced all the reserved vectors with individual ones, that set
HB>JB>> tf_err to the index (vector number). It appears that the vector number is
HB>JB>> 39 decimal. What does that mean?
HB>JB>
HB>JB>IRQ 7.
HB>JB>Can you post a verbose dmesg? Also, can you try both with and without
HB>JB>ACPI?
HB>
HB>Attached are both dmesgs.
HB>
HB>More datapoints:
HB>
HB>I had the parallel port (irq7) and the second sio disabled in the BIOS.
HB>After enabling both I now get a panic in lapic_handle_intr: Couldn't get
HB>vector from ISR! After fetching the relevant docs from Intel I checked the
HB>registers of the apic pointed to by lapic. The interrupt taken is
HB>Xapic_irq1. isr1 is zero, but irr1 is 0x100 (that was without ACPI). How
HB>may that happen? As I understand ISR are the interrupts that have been
HB>delivered to the CPU so if it is interrupted a bit should be set, correct?
HB>
HB>I then replaced the panic by a printf() followed by a return. Now the
HB>system comes to life, but I get a couple of these warnings. When the
HB>system is idle everything seems fine, but when I start my simulation
HB>application (which normally generates between 20k and 250k interrupts/sec
HB>depending on the MPSAFE setting of the ATM drivers) I get approx 1-2 of
HB>these messages per second (this is with HZ=1000).
HB>
HB>A question while reading the code: what does the global lapic variable
HB>refer to? As I understand every CPU has its local APIC. Does it point to
HB>one of those two? To which?

An additional point. In the above test where I got 1-2 messages per second
I have now disabled a debugging printout in the ATM driver that gave 3-4
messages per second (from the interrupt handler). Now the 'Couldn't get...'
messages have disappeared. So this really looks like a race somewhere. Is it
possible that the bit in the ISR somehow gets cleared between the point
where the interrupt is handed to the processor and the point where
Xapic_irq1 actually runs and sees that bit? Perhaps from another Xapic_irq1
instance or whatever?

harti
-- 
harti brandt,
http://www.fokus.fraunhofer.de/research/cc/cats/employees/hartmut.brandt/private
brandt_at_fokus.fraunhofer.de, harti_at_freebsd.org

Received on Wed Nov 05 2003 - 03:07:49 UTC
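
For readers following the isr1/irr1 values quoted in the message, here is a
minimal sketch of how a set bit in one of those registers maps to a vector
number. It assumes the usual local APIC layout, in which ISR and IRR are each
split into eight 32-bit registers and register n covers vectors 32*n through
32*n + 31. The function names are hypothetical and are not taken from the
FreeBSD sources; the IRQ base of 32 is likewise an assumption, chosen only
because it matches the "vector 39 is IRQ 7" answer in the thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: ISR and IRR are each exposed as eight 32-bit registers
 * (index 0..7), and register n covers vectors 32*n .. 32*n + 31.
 * Both helpers below are hypothetical, for illustration only.
 */

/*
 * Return the highest vector whose bit is set in register reg_index,
 * or -1 if the register is zero.
 */
static int
highest_vector_in_reg(unsigned reg_index, uint32_t value)
{
	int bit;

	assert(reg_index < 8);
	for (bit = 31; bit >= 0; bit--) {
		if (value & ((uint32_t)1 << bit))
			return ((int)(reg_index * 32) + bit);
	}
	return (-1);
}

/*
 * Map a vector to an IRQ number, assuming IRQ n is delivered on
 * vector irq_base + n (irq_base is an assumption, not a kernel constant).
 */
static int
vector_to_irq(int vector, int irq_base)
{
	return (vector - irq_base);
}
```

Under these assumptions, an irr1 value of 0x100 has only bit 8 set, so
highest_vector_in_reg(1, 0x100) yields vector 40, while an isr1 of zero
yields -1, which is the "nothing in service" case that the panic message
reports. Likewise vector_to_irq(39, 32) yields 7.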