Re: em interrupt storm

From: Julian Elischer <julian_at_elischer.org>
Date: Tue, 29 Nov 2005 13:34:32 -0800
This is not really a -current question as I'm seeing it on 4.x
on a Dell 2850 with a PCI-express card, but the previous discussion
was here so I thought I'd put it here to continue the thread.

The system locks up when the em driver's em_intr() is called from
irq 2 (em3) but the interrupt was actually generated by em4.

As you can see from the vmstat -i output below:
:root 27] vmstat -i
interrupt                   total       rate
amr0 irq10                  46825         52
em0 irq11                   68724         76
em3 irq2                   186292        207
em4 irq14                  186509        207
atkbd0 irq1                     1          0
sio0 irq4                    2205          2
clk irq0                    89622         99
rtc irq8                   114722        127
Total                      694900        774
:root 28]

em3 and em4 have basically the same interrupt count and rate,
however em3 is not active and is not up.
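The comparison above can be automated. The following is a minimal sketch, assuming the 4.x `vmstat -i` column layout shown in this message. The function names and the 1% tolerance are illustrative choices, not part of any FreeBSD tool. Near-identical totals are only a hint that two sources share one physical interrupt, not proof.

```python
# Sample taken from the vmstat -i output quoted in this message.
SAMPLE = """\
interrupt                   total       rate
amr0 irq10                  46825         52
em0 irq11                   68724         76
em3 irq2                   186292        207
em4 irq14                  186509        207
atkbd0 irq1                     1          0
sio0 irq4                    2205          2
clk irq0                    89622         99
rtc irq8                   114722        127
Total                      694900        774
"""


def parse_vmstat_i(text):
    """Return {(device, irq): total} for each data row of `vmstat -i` output."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like: "<device> irq<N> <total> <rate>".
        # The header and the "Total" row have only three fields and are skipped.
        if len(fields) == 4 and fields[1].startswith("irq") and fields[2].isdigit():
            counts[(fields[0], int(fields[1][3:]))] = int(fields[2])
    return counts


def suspicious_pairs(counts, tolerance=0.01):
    """Return pairs of sources whose totals differ by at most `tolerance` (relative)."""
    items = sorted(counts.items(), key=lambda kv: kv[1])
    pairs = []
    for i, (src_a, total_a) in enumerate(items):
        for src_b, total_b in items[i + 1:]:
            # items is sorted ascending, so total_b >= total_a here.
            if total_b > 0 and (total_b - total_a) / total_b <= tolerance:
                pairs.append((src_a, src_b))
    return pairs


if __name__ == "__main__":
    for a, b in suspicious_pairs(parse_vmstat_i(SAMPLE)):
        print("possible shared source:", a, b)
```

On the sample above, the only pair within 1% of each other is em3 on irq2 and em4 on irq14.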

The interrupts are coming from em4 which is a standard em type chip
on the Dell 2850 motherboard.

Because em3 didn't raise the interrupt, calling its interrupt routine
doesn't clear the interrupt, so it fires again as soon as the interrupt
routine returns.
Thus the system locks up, spinning in and out of the interrupt handler
for em3 on irq2.
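The loop described above can be shown with a toy model. This is a sketch only. It assumes a level-triggered line that stays asserted until the originating device is serviced. The `Nic` class, `dispatch()` and the call cap are hypothetical names invented for illustration and are not taken from the em driver.

```python
class Nic:
    """Toy device: `pending` is True while it asserts its interrupt."""

    def __init__(self, name):
        self.name = name
        self.pending = False

    def handler(self):
        """Model of a driver interrupt routine: it can only clear its own device."""
        was_pending = self.pending
        self.pending = False
        return was_pending


def dispatch(source, handler_owner, max_calls=1000):
    """Call handler_owner's routine for as long as `source` keeps the line asserted.

    Returns the number of handler calls, or None if the cap is reached,
    which stands in for the machine spinning forever.
    """
    calls = 0
    while source.pending:
        if calls == max_calls:
            return None
        handler_owner.handler()
        calls += 1
    return calls


if __name__ == "__main__":
    em3, em4 = Nic("em3"), Nic("em4")

    em4.pending = True
    print("correct routing:", dispatch(em4, em4))    # one call clears it

    em4.pending = True
    print("misrouted to em3:", dispatch(em4, em3))   # never clears, cap is hit
```

With correct routing one call clears the source. When em3's routine is called for em4's interrupt, the source is never cleared and the loop only stops at the artificial cap.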

However, there is something a bit strange about it. If it were as simple
as this, and irq2 always copied irq14, then one would expect it to freeze
up immediately upon activating em4, but that is not the case.

It only seems to freeze up if the system is already in a disk driver
(bio mask) when the interrupt happens.
(?)

em3 and em4 are not connected in any way I know of.
em4 is on the motherboard.
em3 is on an Intel 4-port PCI-express card that is not being used.

I can make the system work reliably by adding code to the em driver so that
when any of the em interrupts happens it checks ALL the em interfaces.
But this is not the answer, and if there were some OTHER driver on irq2
I'd still be just as hosed.
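The idea behind that workaround can be sketched in the same toy style. This is not the actual patch. `EmSoftc`, `service()` and `em_intr_check_all()` are hypothetical names, and a real driver would read each adapter's interrupt cause register rather than a Python flag.

```python
class EmSoftc:
    """Toy stand-in for one em instance's per-device state."""

    def __init__(self, unit):
        self.unit = unit
        self.pending = False   # device is asserting its interrupt
        self.serviced = 0      # how many times this device was actually serviced

    def service(self):
        """Clear and count the interrupt if this device raised one."""
        if self.pending:
            self.pending = False
            self.serviced += 1
            return True
        return False


def em_intr_check_all(all_units):
    """Workaround-style handler: whichever IRQ fired, poll every em instance.

    Returns the unit numbers that really had an interrupt pending.
    """
    return [sc.unit for sc in all_units if sc.service()]


if __name__ == "__main__":
    units = [EmSoftc(n) for n in range(6)]   # em0 .. em5, as in the dmesg
    units[4].pending = True                  # em4 raises, but irq2 (em3) fires
    print("serviced units:", em_intr_check_all(units))
```

Because every instance is polled, em4 is serviced even when the interrupt arrives on em3's irq. As noted above, this only helps when the misdelivered interrupt lands on another em device.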

I include the dmesg.

Just for fun I might see what happens with DragonFly, though as the machine
has no removable media I need to do that over the net, so it may take some
setting up.


Anyone who has any ideas as to why irq14 is being delivered on irq2, let me know!

And let me know why the presence of disk I/O makes a difference.

julian




Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.10-RELEASE #5: Tue Nov 29 12:41:48 GMT 2005
    root_at_trafmon1.wga:/usr/build/godspeed/freebsd/mods/src/sys/compile/MESSAGING_GATEWAY
Timecounter "i8254"  frequency 1193182 Hz
CPU: Intel(R) Xeon(TM) CPU 3.60GHz (3591.25-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf43  Stepping = 3
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory  = 3489398784 (3407616K bytes)
avail memory = 3400192000 (3320500K bytes)
Changing APIC ID for IO APIC #0 from 0 to 8 on chip
Changing APIC ID for IO APIC #1 from 0 to 9 on chip
Changing APIC ID for IO APIC #2 from 0 to 10 on chip
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 24 pins in IOAPIC #1
Programming 24 pins in IOAPIC #2
SCI INT 9
Set apic 0 pin 9, level, active low
Set apic 0 pin 9, level, active high
FreeBSD/SMP: Multiprocessor motherboard: 4 CPUs
 cpu0 (BSP): apic id:  0, version: 0x00050014, at 0xfee00000
 cpu1 (AP):  apic id:  1, version: 0x00050014, at 0xfee00000
 cpu2 (AP):  apic id:  6, version: 0x00050014, at 0xfee00000
 cpu3 (AP):  apic id:  7, version: 0x00050014, at 0xfee00000
 io0 (APIC): apic id:  8, version: 0x00178020, at 0xfec00000
 io1 (APIC): apic id:  9, version: 0x00178020, at 0xfec80000
 io2 (APIC): apic id: 10, version: 0x00178020, at 0xfec83000
Preloaded elf kernel "k2" at 0xc0397000.
Warning: Pentium 4 CPU: PSE disabled
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 18 entries at 0xc00fb6c0
acpi0: <DELL   PE BKC  > on motherboard
acpi0: power button is handled as a fixed feature programming model.
Timecounter "ACPI-fast"  frequency 3579545 Hz
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
acpi_cpu0: <CPU> on acpi0
acpi_cpu1: <CPU> on acpi0
acpi_cpu2: <CPU> on acpi0
acpi_cpu3: <CPU> on acpi0
npx0: <math processor> on motherboard
npx0: INT 16 interface
dell_bios0: Found Dell signature
dell_bios0: <System Management BIOS> at iomem 0xf99f0-0xf9a0e on motherboard
dell_bios0: Version: 2.03, Revision: 2.03
dell_bios0: Enable 0
dell_bios0: Disable 1
dell_bios0: Size 1025K
dell_bios0: Completion Code 0x0000 Success
dell_bios0: Updated on: 8/15/05 at 23:44
dell_bios0: Version A03
dell_bios0: Min. Version A03
dell_bios0: Manufacturer IronPort
dell_bios0: System ID 32824 [8038]
ipmi0: Found Dell signature
ipmi0: <System Management BIOS> at iomem 0xf99f0-0xf9a0e on motherboard
ipmi0: Version: 2.03, Revision: 2.03
ipmi0: KCS mode found
ipmi0: Address    0xca8
ipmi0: Allignment 0x4
ipmi0: I/O mode
ipmi0: Device Rev. 0
ipmi0: Firmware Rev. 1.81
ipmi0: Version 1.5
ipmi0: Number of channels 4
pcib0: <Host to PCI bridge> on motherboard
IOAPIC #0 intpin 16 -> irq 2
Hello 8f6f
Hello 8f6f
pci0: <PCI bus> on pcib0
pcib1: <PCI to PCI bridge (vendor=8086 device=3595)> irq 2 at device 2.0 on pci0
pci1: <PCI bus> on pcib1
pcib2: <PCI to PCI bridge (vendor=8086 device=0330)> at device 0.0 on pci1
IOAPIC #1 intpin 14 -> irq 10
pci2: <PCI bus> on pcib2
amr0: <LSILogic MegaRAID 1.53> mem 0xdfec0000-0xdfefffff,0xd80f0000-0xd80fffff irq 10 at device 14.0 on pci2
amr0: delete logical drives supported by controller
 created DEVICE*****************
amr0: <LSILogic PERC 4e/Di> Firmware 516A, BIOS H418, 256MB RAM
pcib3: <PCI to PCI bridge (vendor=8086 device=0332)> at device 0.2 on pci1
pci3: <PCI bus> on pcib3
pcib4: <PCI to PCI bridge (vendor=8086 device=3597)> irq 2 at device 4.0 on pci0
pci4: <PCI bus> on pcib4
pcib5: <PCI to PCI bridge (vendor=10b5 device=8516)> mem 0xdf6e0000-0xdf6fffff irq 2 at device 0.0 on pci4
pci5: <PCI bus> on pcib5
pcib6: <PCI to PCI bridge (vendor=10b5 device=8516)> irq 0 at device 1.0 on pci5
pci6: <PCI bus> on pcib6
pcib7: <PCI to PCI bridge (vendor=10b5 device=8516)> irq 0 at device 2.0 on pci5
IOAPIC #0 intpin 18 -> irq 11
IOAPIC #0 intpin 19 -> irq 13
pci7: <PCI bus> on pcib7
em0: <Intel(R) PRO/1000 Network Connection Version - Bypass-1.0.0> port 0xece0-0xecff mem 0xdfbc0000-0xdfbdffff,0xdfbe0000-0xdfbfffff irq 11 at device 0.0 on pci7
em0 00:0e:0c:a1:6a:28,
em0:  Speed:N/A  Duplex:N/A
em1: <Intel(R) PRO/1000 Network Connection Version - Bypass-1.0.0> port 0xecc0-0xecdf mem 0xdfb80000-0xdfb9ffff,0xdfba0000-0xdfbbffff irq 13 at device 0.1 on pci7
em0 00:0e:0c:a1:6a:28,em1 00:0e:0c:a1:6a:29,
em1:  Speed:N/A  Duplex:N/A
pcib8: <PCI to PCI bridge (vendor=10b5 device=8516)> irq 0 at device 3.0 on pci5
pci8: <PCI bus> on pcib8
em2: <Intel(R) PRO/1000 Network Connection Version - Bypass-1.0.0> port 0xdce0-0xdcff mem 0xdf9c0000-0xdf9dffff,0xdf9e0000-0xdf9fffff irq 13 at device 0.0 on pci8
em0 00:0e:0c:a1:6a:28,em1 00:0e:0c:a1:6a:29,em2 00:0e:0c:a1:6a:2a,
em2:  Speed:N/A  Duplex:N/A
em3: <Intel(R) PRO/1000 Network Connection Version - Bypass-1.0.0> port 0xdcc0-0xdcdf mem 0xdf980000-0xdf99ffff,0xdf9a0000-0xdf9bffff irq 2 at device 0.1 on pci8
em0 00:0e:0c:a1:6a:28,em1 00:0e:0c:a1:6a:29,em2 00:0e:0c:a1:6a:2a,em3 00:0e:0c:a1:6a:2b,
em3:  Speed:N/A  Duplex:N/A
pcib9: <PCI to PCI bridge (vendor=8086 device=3598)> irq 2 at device 5.0 on pci0
pci9: <PCI bus> on pcib9
pcib10: <PCI to PCI bridge (vendor=8086 device=0329)> at device 0.0 on pci9
IOAPIC #2 intpin 0 -> irq 14
pci10: <PCI bus> on pcib10
em4: <Intel(R) PRO/1000 Network Connection Version - Bypass-1.0.0> port 0xccc0-0xccff mem 0xdf4e0000-0xdf4fffff irq 14 at device 7.0 on pci10
em0 00:0e:0c:a1:6a:28,em1 00:0e:0c:a1:6a:29,em2 00:0e:0c:a1:6a:2a,em3 00:0e:0c:a1:6a:2b,em4 00:14:22:0f:45:2f,
em4:  Speed:N/A  Duplex:N/A
pcib11: <PCI to PCI bridge (vendor=8086 device=032a)> at device 0.2 on pci9
IOAPIC #2 intpin 1 -> irq 15
pci11: <PCI bus> on pcib11
em5: <Intel(R) PRO/1000 Network Connection Version - Bypass-1.0.0> port 0xbcc0-0xbcff mem 0xdf2e0000-0xdf2fffff irq 15 at device 8.0 on pci11
em0 00:0e:0c:a1:6a:28,em1 00:0e:0c:a1:6a:29,em2 00:0e:0c:a1:6a:2a,em3 00:0e:0c:a1:6a:2b,em4 00:14:22:0f:45:2f,em5 00:14:22:0f:45:30,
em5:  Speed:N/A  Duplex:N/A
pcib12: <PCI to PCI bridge (vendor=8086 device=3599)> irq 2 at device 6.0 on pci0
pci12: <PCI bus> on pcib12
pcib13: <Intel 82801BA/BAM (ICH2) Hub to PCI bridge> at device 30.0 on pci0
pci13: <PCI bus> on pcib13
pci13: <ATI model 5159 graphics accelerator> at 13.0 irq 11
isab0: <PCI to ISA bridge (vendor=8086 device=24d0)> at device 31.0 on pci0
isa0: <ISA bus> on isab0
orm0: <Option ROMs> at iomem 0xc0000-0xcafff,0xcb000-0xcbfff,0xce800-0xcf7ff,0xec000-0xeffff on isa0
pmtimer0 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model IntelliMouse, device ID 3
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x110 on isa0
sio0: type 16550A, console
sio1: configured irq 3 not in bitmap of probed irqs 0
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: routing 8254 via IOAPIC #0 intpin 2
DUMMYNET initialized (011031)
BRIDGE 020214 loaded
ipfw2 initialized, divert enabled, rule-based forwarding enabled, default to accept, logging disabled
Scanning via xpt_config
Scanning via xpt_config
amr0: delete logical drives supported by controller
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 572160MB (1171783680 sectors) RAID 1 (optimal)
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #1 Launched!
em4: Link is up 100 Mbps Full Duplex
Mounting root from ufs:/dev/amrd0s1a
Received on Tue Nov 29 2005 - 20:35:13 UTC
