Re: em(4) stops forwarding

From: Scott Long <scottl_at_samsco.org>
Date: Thu, 02 Feb 2006 07:33:47 -0700
Michal Mertl wrote:

> Scott Long wrote:
> 
>>Michal Mertl wrote:
>>
>>>Scott Long wrote:
>>>
>>>
>>>>Michal Mertl wrote:
>>>>
>>>>
>>>>>Hello,
>>>>>
>>>>>I've been running CURRENT for a long time and never experienced a problem
>>>>>with the built-in em(4) card before. Recently (I first noticed it on Jan
>>>>>24) the card has stopped working several times. Nothing gets into the
>>>>>log file. Carrier is still detected properly but no data is exchanged.
>>>>>Ifconfig up/down doesn't help, but kldunload/kldload does. When I run
>>>>>tcpdump I don't see any packet coming in but I see some outgoing.
>>>>>
>>>>>Can someone suggest what to look at when it happens the next time? I
>>>>>have DDB compiled in. I will try to sniff the wire using another machine
>>>>>next time to see if the card sends out anything.
>>>>>
>>>>>The command 'pciconf -lv' says about the card this:
>>>>>em0@pci2:1:0:   class=0x020000 card=0x05491014 chip=0x101e8086 rev=0x03 hdr=0x00
>>>>>   vendor   = 'Intel Corporation'
>>>>>   device   = '82540EP Gigabit Ethernet Controller (Mobile)'
>>>>>   class    = network
>>>>>   subclass = ethernet
>>>>>
>>>>>The dmesg:
>>>>>em0: <Intel(R) PRO/1000 Network Connection Version - 3.2.18> port
>>>>>0x8000-0x803f mem 0xc0220000-0xc023ffff,0xc0200000-0xc020ffff irq 11 at
>>>>>device 1.0 on pci2
>>>>>em0: Ethernet address: 00:0d:60:cd:ae:e2
>>>>>em0: [FAST]
>>>>>
>>>>>The interrupt is shared since the machine is a notebook. I don't know if
>>>>>it was just a coincidence but I think that it happened at the same time
>>>>>as my USB mouse stopped working - the USB controller is on the same irq.
>>>>>
>>>>>Michal
>>>>>
>>>>
>>>>What is sharing the interrupt?
>>>
>>>
>>>vgapci0, ipw0, ehci0, uhci0-2. I don't think vgapci0 and ipw0 are really
>>>using the interrupt when I use em0.
>>>
>>>
>>
>>Ouch.  For now, edit /sys/dev/em/if_em.c and add the following line to 
>>the top of the file:
>>
>>#define NO_EM_FASTINTR
> 
> 
> Do you know the reason for the problem? Wouldn't it be better if I used
> the stock driver and got some information for you when it doesn't work? I
> use the machine as my workstation, so it isn't such a big problem when it
> loses the network.
> 
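Scott's workaround relies on the C preprocessor: a macro defined at the top of if_em.c is visible to every later #ifdef in that file, so the driver gets compiled with its classic (non-fast) interrupt handler instead of the fast one that dmesg reports as "em0: [FAST]". The following is a hypothetical, self-contained sketch of that pattern only; the function name em_uses_fast_intr() is made up and does not come from the driver.

```c
/*
 * Hypothetical sketch, not the actual if_em.c source: it only shows how a
 * macro defined at the top of a file steers a later #ifdef.  Uncomment the
 * next line to get the classic behaviour.
 */
/* #define NO_EM_FASTINTR */

/* Returns 1 if the fast handler would be registered, 0 for the classic one. */
static int
em_uses_fast_intr(void)
{
#ifdef NO_EM_FASTINTR
	return 0;	/* classic handler, run from an interrupt thread */
#else
	return 1;	/* fast handler; dmesg then shows "em0: [FAST]" */
#endif
}
```

Because the check happens at compile time, the #define has to appear before the first #ifdef that tests it (putting it at the very top of the file guarantees that), and the driver has to be rebuilt before the change takes effect.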

The problem is that the drivers that are sharing the interrupt,
particularly the USB ones, can spend a very, very long time waiting on
locks to service the interrupt.  During that time, the interrupt pin is
masked and interrupts from all of the shared devices don't get
delivered.  So even though the if_em driver has a very fast interrupt
handler, it still has to wait on the USB drivers.  During that wait, a
burst of network traffic might come into the card, filling its receive
buffers and triggering an overflow.  This would be especially likely to
happen while the kernel is flushing out filesystem I/O.  In theory, the
interrupt service latency shouldn't be any different whether the if_em
driver uses the fast handler or not, but there might be coincidental
timing issues that I don't understand.  That's why I'd like you to add
the #define to the driver to revert it to its classic behaviour and see
if the problem persists.  If it doesn't, then I'll have to rethink some
of the changes that I made to it.
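To put rough numbers on the overflow described above, here is a simplified C model. It assumes that the shared line stays masked until every other handler on it has finished, that each full-size 1500-byte frame occupies one receive descriptor, and that 256 descriptors are free when the line is masked; the ring size and all function names are hypothetical, not taken from if_em.c.

```c
#include <assert.h>

/*
 * Simplified, hypothetical model of a receive-ring overflow while a
 * shared interrupt line is masked.  Names and numbers are illustrative.
 */

/* Per-frame cost on the wire besides the payload, in bytes:
 * 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap. */
#define ETH_WIRE_OVERHEAD 38

/* Back-to-back frames per second for a link speed and payload size. */
static double
wire_fps(double link_bps, unsigned payload_bytes)
{
	return link_bps / ((payload_bytes + ETH_WIRE_OVERHEAD) * 8.0);
}

/* Masked window: every other handler sharing the line (including time it
 * spends waiting on locks) must finish before the line is unmasked, so
 * the window is modelled as the sum of their run times. */
static double
masked_window_ms(const double *handler_ms, unsigned n)
{
	double total = 0.0;

	for (unsigned i = 0; i < n; i++)
		total += handler_ms[i];
	return total;
}

/* Milliseconds of line-rate traffic that fill ring_free descriptors,
 * assuming one frame per descriptor. */
static double
ms_to_fill(unsigned ring_free, double fps)
{
	assert(fps > 0.0);
	return ring_free / fps * 1000.0;
}

/* Frames arriving during the masked window that find no free
 * descriptor and are therefore dropped by the card. */
static unsigned long
frames_dropped(unsigned ring_free, double fps, double masked_ms)
{
	unsigned long arrived;

	assert(fps >= 0.0 && masked_ms >= 0.0);
	arrived = (unsigned long)(fps * masked_ms / 1000.0);
	return arrived > ring_free ? arrived - ring_free : 0;
}
```

For example, wire_fps(1e9, 1500) is about 81,000 frames per second, so ms_to_fill(256, wire_fps(1e9, 1500)) comes out at roughly 3.15 ms. Under these assumptions, a full-rate burst that arrives while the USB handlers keep the line masked for longer than that would begin to overflow the ring, which is why the masking time matters more than how fast the em(4) handler itself is.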

Scott

> 
>>Also, does your kernel config include the apic device?
> 
> 
> Yes, it does. But I believe that neither the chipset nor the CPU
> supports it.
> 
> Michal
> 
Received on Thu Feb 02 2006 - 13:33:54 UTC
