Re: em interrupt storm

From: Michael Vince <mv_at_roq.com>
Date: Thu, 24 Nov 2005 12:40:24 +1100

Kris Kennaway wrote:

>On Tue, Nov 22, 2005 at 08:54:49PM -0800, John Polstra wrote:
>
>>On 23-Nov-2005 Kris Kennaway wrote:
>>
>>>I am seeing the em driver undergoing an interrupt storm whenever the
>>>amr driver receives interrupts.  In this case I was running newfs on
>>>the amr array and em0 was not in use:
>>>
>>>   28 root        1 -68 -187     0K     8K CPU1   1   0:32 53.98% irq16: em0
>>>   36 root        1 -64 -183     0K     8K RUN    1   0:37 27.75% irq24: amr0
>>>
>>># vmstat -i
>>>interrupt                          total       rate
>>>irq1: atkbd0                           2          0
>>>irq4: sio0                           199          1
>>>irq6: fdc0                            32          0
>>>irq13: npx0                            1          0
>>>irq14: ata0                           47          0
>>>irq15: ata1                          931          5
>>>irq16: em0                       6321801      37187
>>>irq24: amr0                        28023        164
>>>cpu0: timer                       337533       1985
>>>cpu1: timer                       337285       1984
>>>Total                            7025854      41328
>>>
>>>When newfs finished (i.e. amr was idle), em0 stopped storming.
>>>
>>>MPTable: <INTEL    SE7520BD22  >
>>>
>>This is the dreaded interrupt aliasing problem that several of us have
>>experienced with this chipset.  High-numbered interrupts alias down to
>>interrupts in the range 16..19 (or maybe 16..23), a multiple of 8 less
>>than the original interrupt.
>>
>>Nobody knows what causes it, and nobody knows how to fix it.
>>
>
>This would be good to document somewhere so that people either don't
>accidentally buy this hardware, or know what to expect when they run
>it.
>
>Kris
>
This is Intel's latest server chipset design, and Dell is putting that
chipset in all their servers.
Luckily I haven't seen the problem on any of my Dell servers (as long as
I am reading this right).

This server has been running for a long time.
vmstat -i
interrupt                          total       rate
irq1: atkbd0                           6          0
irq4: sio0                         23433          0
irq6: fdc0                            10          0
irq8: rtc                     2631238611        128
irq13: npx0                            1          0
irq14: ata0                           99          0
irq16: uhci0                  1507608958         73
irq18: uhci2                    42005524          2
irq19: uhci1                           3          0
irq23: atapci0                       151          0
irq46: amr0                     41344088          2
irq64: em0                    1513106157         73
irq0: clk                     2055605782         99
Total                         7790932823        379
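For a rough sense of how long "a long time" is: as far as I know, the rate column of vmstat -i is the total divided by the uptime in seconds, so dividing a total by its rate gives an approximate uptime. A minimal Python sketch of that arithmetic follows. The function name is made up for illustration, and the result is only approximate because the printed rate is a whole number.

```python
def estimate_uptime_days(total, rate):
    """Approximate uptime in days from one row of `vmstat -i` output.

    Assumes rate == total / uptime_seconds, which is how the rate
    column is understood here; treat the result as a rough estimate.
    """
    if rate <= 0:
        raise ValueError("rate must be positive to estimate uptime")
    seconds = total / rate
    return seconds / 86400  # 86400 seconds per day


# Values taken from the "irq8: rtc" row in the listing above.
days = estimate_uptime_days(2631238611, 128)
print(f"roughly {days:.0f} days of uptime")
```

Rows with a rate of 0 or a very small rate give no useful estimate, which is why the rtc row is used here.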

This one just transferred over 8 gigs of data in 77 seconds with around
1000 simultaneous TCP connections under a load of 35. Both seem OK.
vmstat -i
interrupt                          total       rate
irq4: sio0                           315          0
irq13: npx0                            1          0
irq14: ata0                           47          0
irq16: uhci0                     2894669          2
irq18: uhci2                      977413          0
irq23: ehci0                           3          0
irq46: amr0                       883138          0
irq64: em0                       2890414          2
cpu0: timer                   2763566717       1999
cpu3: timer                   2763797300       1999
cpu1: timer                   2763551479       1999
cpu2: timer                   2763797870       1999
Total                        11062359366       8004
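
To double-check a listing against John's description quoted above (high-numbered interrupts aliasing down to 16..23, a multiple of 8 below the original), the arithmetic can be sketched in Python. This is only an illustration: the function names and the fixed 16..23 range are assumptions taken from that description, and a reported pair is just a place to look, not proof that aliasing is happening.

```python
import re

LOW_FIRST, LOW_LAST = 16, 23  # assumed alias target range from the thread


def parse_irqs(vmstat_text):
    """Map IRQ number -> device name from `vmstat -i` style lines."""
    irqs = {}
    for line in vmstat_text.splitlines():
        m = re.match(r"\s*irq(\d+):\s*(\S+)", line)
        if m:
            irqs[int(m.group(1))] = m.group(2)
    return irqs


def alias_candidates(irqs):
    """Return (high, low) pairs where low is in 16..23 and
    high - low is a positive multiple of 8."""
    pairs = []
    for high in sorted(irqs):
        if high <= LOW_LAST:
            continue  # only interrupts above the target range can alias down
        for low in range(LOW_FIRST, LOW_LAST + 1):
            if low in irqs and (high - low) % 8 == 0:
                pairs.append((high, low))
    return pairs


# Rows copied from Kris's listing quoted earlier in this message.
SAMPLE = """\
irq14: ata0                           47          0
irq15: ata1                          931          5
irq16: em0                       6321801      37187
irq24: amr0                        28023        164
cpu0: timer                       337533       1985
"""

irqs = parse_irqs(SAMPLE)
for high, low in alias_candidates(irqs):
    print(f"irq{high} ({irqs[high]}) could alias to irq{low} ({irqs[low]})")
```

For any pair it reports, the comparison that matters is the one Kris made: whether the low interrupt's rate rises and falls with activity on the high one.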

Mike