Michael Vince wrote:

> Kris Kennaway wrote:
>
>> On Tue, Nov 22, 2005 at 08:54:49PM -0800, John Polstra wrote:
>>
>>> On 23-Nov-2005 Kris Kennaway wrote:
>>>
>>>> I am seeing the em driver undergoing an interrupt storm whenever the
>>>> amr driver receives interrupts.  In this case I was running newfs on
>>>> the amr array and em0 was not in use:
>>>>
>>>>    28 root   1 -68 -187   0K   8K CPU1   1   0:32 53.98% irq16: em0
>>>>    36 root   1 -64 -183   0K   8K RUN    1   0:37 27.75% irq24: amr0
>>>>
>>>> # vmstat -i
>>>> interrupt            total     rate
>>>> irq1: atkbd0             2        0
>>>> irq4: sio0             199        1
>>>> irq6: fdc0              32        0
>>>> irq13: npx0              1        0
>>>> irq14: ata0             47        0
>>>> irq15: ata1            931        5
>>>> irq16: em0         6321801    37187
>>>> irq24: amr0          28023      164
>>>> cpu0: timer         337533     1985
>>>> cpu1: timer         337285     1984
>>>> Total              7025854    41328
>>>>
>>>> When newfs finished (i.e. amr was idle), em0 stopped storming.
>>>>
>>>> MPTable: <INTEL SE7520BD22 >
>>>
>>> This is the dreaded interrupt aliasing problem that several of us have
>>> experienced with this chipset.  High-numbered interrupts alias down to
>>> interrupts in the range 16..19 (or maybe 16..23), a multiple of 8 less
>>> than the original interrupt.
>>>
>>> Nobody knows what causes it, and nobody knows how to fix it.
>>
>> This would be good to document somewhere so that people don't either
>> accidentally buy this hardware, or know what to expect when they run
>> it.
>>
>> Kris
>
> This is Intel's latest server chipset design and Dell are putting that
> chipset in all their servers.
> Luckily I haven't seen the problem on any of my Dell servers (as
> long as I am looking at this right).
>
> This server has been running for a long time.
> vmstat -i
> interrupt               total     rate
> irq1: atkbd0                6        0
> irq4: sio0              23433        0
> irq6: fdc0                 10        0
> irq8: rtc          2631238611      128
> irq13: npx0                 1        0
> irq14: ata0                99        0
> irq16: uhci0       1507608958       73
> irq18: uhci2         42005524        2
> irq19: uhci1                3        0
> irq23: atapci0            151        0
> irq46: amr0          41344088        2
> irq64: em0         1513106157       73
> irq0: clk          2055605782       99
> Total              7790932823      379
>
> This one just transferred over 8 gigs of data in 77 seconds with around
> 1000 simultaneous tcp connections under a load of 35.  Both seem OK.
> vmstat -i
> interrupt               total     rate
> irq4: sio0                315        0
> irq13: npx0                 1        0
> irq14: ata0                47        0
> irq16: uhci0          2894669        2
> irq18: uhci2           977413        0
> irq23: ehci0                3        0
> irq46: amr0            883138        0
> irq64: em0            2890414        2
> cpu0: timer        2763566717     1999
> cpu3: timer        2763797300     1999
> cpu1: timer        2763551479     1999
> cpu2: timer        2763797870     1999
> Total             11062359366     8004
>
> Mike

Looks like at least some of your interrupts are being aliased to irq16,
which just happens to be USB (uhci) in this case.  Note that the rate is
the same between irq64 and irq16, and the totals are pretty close.  If
you don't need USB, I'd suggest turning it off.

Scott

Received on Thu Nov 24 2005 - 00:46:54 UTC
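
The check described above (a high-numbered IRQ whose total nearly matches a low IRQ that is a multiple of 8 below it) can be roughed out in a few lines of Python. This is only a sketch and not anything from the thread: the function names, the 16..23 target range taken from John's description, and the 5% tolerance are all assumptions for illustration. It only catches the "matching totals" pattern seen in Mike's output, not the storm in Kris's original report.

```python
import re

# Matches lines such as "irq64: em0   1513106157   73" from `vmstat -i`.
IRQ_LINE = re.compile(r"^irq(\d+):\s+(\S+)\s+(\d+)\s+(\d+)$")

def parse_vmstat(text):
    """Return {irq_number: (device, total, rate)} for the irqN lines."""
    table = {}
    for line in text.splitlines():
        m = IRQ_LINE.match(line.strip())
        if m:
            irq, dev, total, rate = m.groups()
            table[int(irq)] = (dev, int(total), int(rate))
    return table

def alias_candidates(table, low=range(16, 24), tolerance=0.05):
    """Pair each IRQ >= 24 with an IRQ in `low` that sits a multiple of 8
    below it and whose total is within `tolerance` (an assumed threshold)."""
    pairs = []
    for high, (hdev, htotal, _) in sorted(table.items()):
        if high < 24 or htotal == 0:
            continue
        for target in low:
            if target in table and (high - target) % 8 == 0:
                ldev, ltotal, _ = table[target]
                if abs(htotal - ltotal) / htotal < tolerance:
                    pairs.append((high, hdev, target, ldev))
    return pairs

# A few rows from the first server's output quoted above.
SAMPLE = """
irq16: uhci0       1507608958       73
irq18: uhci2         42005524        2
irq46: amr0          41344088        2
irq64: em0         1513106157       73
irq0: clk          2055605782       99
"""

print(alias_candidates(parse_vmstat(SAMPLE)))
# → [(64, 'em0', 16, 'uhci0')]
```

On that sample it flags irq64 (em0) against irq16 (uhci0), since 64 - 16 = 48 is a multiple of 8 and the totals differ by well under 1%.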