Re: Interrupt storm with MSI in combination with em1

From: Jack Vogel <jfvogel_at_gmail.com>
Date: Wed, 4 May 2011 17:25:39 -0700
OK, but the reason you see multiple cases of irq 16 is that it's the bridge;
once you are using MSIX, as vmstat shows, it's using other vectors.

Can you capture the messages file with the actual storm happening?
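
If it helps to pin down which source is storming, here is a rough sketch
(not part of any driver or tool) that compares two saved "vmstat -i"
outputs and lists sources whose counters grew faster than a threshold.
The function names, the 100000/sec threshold and the assumed three-column
layout (name, total, rate) are illustrative assumptions. It uses deltas
because the rate column in "vmstat -i" is averaged since boot, which hides
a storm that starts late.

```python
def parse_vmstat_i(text):
    """Map interrupt source name -> cumulative total from `vmstat -i` text.

    Assumes each data line ends with two numeric columns (total, rate);
    the header line and the final "Total" line are skipped.
    """
    totals = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 2)  # name may itself contain spaces
        if len(parts) != 3:
            continue
        name, total, _rate = parts
        name = name.strip()
        if not total.isdigit() or name == "Total":
            continue
        totals[name] = int(total)
    return totals


def interrupt_rates(before, after, seconds):
    """Interrupts/sec per source between two snapshots taken `seconds` apart."""
    return {
        name: (after[name] - before[name]) / seconds
        for name in after
        if name in before
    }


def storming(before_text, after_text, seconds, threshold=100000):
    """Sorted names of sources whose rate meets the (hypothetical) threshold."""
    rates = interrupt_rates(
        parse_vmstat_i(before_text), parse_vmstat_i(after_text), seconds
    )
    return sorted(name for name, rate in rates.items() if rate >= threshold)
```

Take one "vmstat -i" dump, wait a known number of seconds, take another,
and pass both texts plus the interval to storming().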

I noticed some complaints about checksums in the dmesg. Have you
checked for BIOS upgrades or something like that for your motherboard?

Regards,

Jack


On Wed, May 4, 2011 at 4:27 PM, Daan Vreeken <Daan_at_vehosting.nl> wrote:

> On Thursday 05 May 2011 00:15:43 you wrote:
> > This all looks completely kosher,  what IRQ is the storm on??
>
> IRQ 16. Further down this email there is a list of devices that share the
> IRQ according to 'dmesg'.
>
>
> > On Wed, May 4, 2011 at 3:04 PM, Daan Vreeken <Daan_at_vehosting.nl> wrote:
> > > Hi,
> > >
> > > On Wednesday 04 May 2011 20:47:32 Jack Vogel wrote:
> > > > Will you please set it back to a default and then boot and capture the
> > > > message for me?
> > >
> > > No problem. Here's the output with MSI/MSIX enabled :
> > >
> > > http://vehosting.nl/pub_diffs/dmesg_plantje2_with_msix_2011_05_04.txt
> > >
> > > I've also added the output of "vmstat -i" a couple of minutes after a
> > > reboot with MSI enabled :
> > >        http://vehosting.nl/pub_diffs/vmstat_i_2011_05_04.txt
> > >
> > > Note that in the above "vmstat -i" dump the interrupt storm hasn't
> > > started yet. For some reason the storm doesn't always start directly at
> > > boot. I haven't been able (yet) to pinpoint what's triggering it to
> > > start.
> > >
> > > > On Wed, May 4, 2011 at 11:19 AM, Daan Vreeken <Daan_at_vehosting.nl> wrote:
> > > > > Hi Jack,
> > > > >
> > > > > Wednesday 04 May 2011 19:46:05 Jack Vogel wrote:
> > > > > > > Who makes your motherboard? The problem you are having is that MSIX
> > > > > > > AND MSI are both failing as em0 comes up, so it falls back to Legacy
> > > > > > > interrupt mode, and must be having some issue with sharing the line,
> > > > > > > causing the storm.
> > > > > The motherboard is an Asus "P7H55-M".
> > > > >
> > > > > > Sorry, I should have mentioned that the dmesg output is from booting
> > > > > > with :
> > > > > > >        hw.pci.enable_msix="0"
> > > > > > >        hw.pci.enable_msi="0"
> > > > > .. in "loader.conf".
> > > > >
> > > > > > With those lines in "loader.conf", MSI and MSIX are disabled, both
> > > > > > cards work like they should and there is no interrupt storm.
> > > > >
> > > > > > With MSI/MSIX enabled, both cards work like they should and I see the
> > > > > > counters of the MSI interrupts increase (in small amounts, like they
> > > > > > should), but at boot-time an interrupt storm starts on 'legacy' IRQ 16.
> > > > >
> > > > > > Because the only difference between disabling/enabling MSI/MSIX seems
> > > > > > to be in the way em0/em1 are used, and because 'em1' shares IRQ 16
> > > > > > according to the dmesg, I'm suspecting 'em1' is causing the storm.
> > > > > > (But please correct me if I'm wrong :)
> > > > >
> > > > > What can I do to help track this problem down?
> > > > >
> > > > > > > According to "dmesg" the following devices share IRQ 16 :
> > > > > > >
> > > > > > > >        pcib1: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci0
> > > > > > > >        em0: <Intel(R) PRO/1000 Network Connection 7.2.3> port
> > > > > > > >           0xcc00-0xcc1f mem
> > > > > > > >           0xf7de0000-0xf7dfffff,0xf7d00000-0xf7d7ffff,0xf7ddc000-0xf7ddffff
> > > > > > > >           irq 16 at device 0.0 on pci1
> > > > > > > >        vgapci0: <VGA-compatible display> port 0xbc00-0xbc07
> > > > > > > >           mem 0xf7800000-0xf7bfffff,0xe0000000-0xefffffff
> > > > > > > >           irq 16 at device 2.0 on pci0
> > > > > > > >        ehci0: <Intel PCH USB 2.0 controller USB-B> mem
> > > > > > > >           0xf7cfa000-0xf7cfa3ff
> > > > > > > >           irq 16 at device 26.0 on pci0
> > > > > > > >        em1: <Intel(R) PRO/1000 Network Connection 7.2.3> port
> > > > > > > >           0xec00-0xec1f mem
> > > > > > > >           0xf7fe0000-0xf7ffffff,0xf7f00000-0xf7f7ffff,0xf7fdc000-0xf7fdffff
> > > > > > > >           irq 16 at device 0.0 on pci4
> > > > > > > >        pcib4: <ACPI PCI-PCI bridge> irq 16 at device 28.5 on pci0
> > > > > > >
> > > > > > > > During a storm "vmstat -i" shows a rate of about 220,000
> > > > > > > > interrupts/sec. MSI interrupt delivery to both 'em0' and 'em1'
> > > > > > > > seems to work correctly during a storm, as I see their counters
> > > > > > > > increase normally in the "vmstat -i" output.
> > > > > > > > As only 'em0' and 'em1' seem to be using MSI interrupts, my guess
> > > > > > > > is that the e1000 driver is causing this problem. Could it be that
> > > > > > > > the driver forgets to clear/mask legacy interrupts when attaching
> > > > > > > > the MSI interrupts perhaps?
> > > > > > >
> > > > > > > Any tips on how to debug and/or fix this?
> > > > > > >
> > > > > > >
> > > > > > > The full output of "dmesg" can be found here :
> > > > > > >
> > > > > > > http://vehosting.nl/pub_diffs/dmesg_plantje2_2011_05_04.txt
> > > > > > >
> > > > > > > And the full output of "pciconf -lv" is here :
> > > > > > > >
> > > > > > > > http://vehosting.nl/pub_diffs/pciconf_plantje2_2011_05_04.txt
>
>
> Thanks,
> --
> Daan Vreeken
> VEHosting
> http://VEHosting.nl
> tel: +31-(0)40-7113050 / +31-(0)6-46210825
> KvK nr: 17174380
>
Received on Wed May 04 2011 - 22:25:40 UTC
