Re: em(4) stops forwarding

From: Michal Mertl <mime_at_traveller.cz>
Date: Fri, 24 Feb 2006 17:51:55 +0100
Scott Long wrote:
> Michal Mertl wrote:
> 
> > Scott Long wrote:
> > 
> >>Michal Mertl wrote:
> >>
> >>>Scott Long wrote:
> >>>
> >>>
> >>>>Michal Mertl wrote:
> >>>>
> >>>>
> >>>>>Hello,
> >>>>>
> >>>>>I've been running CURRENT for long time and never experienced problem
> >>>>>with the built-in em(4) card before. Recently (I first noticed it on Jan
> >>>>>24) the card has stopped working several times. Nothing gets into the
> >>>>>log file. Carrier is still detected properly but no data is exchanged.
> >>>>>Ifconfig up/down doesn't help but kldunload/load does. When I run
> >>>>>tcpdump I don't see any packet coming in but I see some outgoing.
> >>>>>
> >>>>>Can someone suggest what to look at when it happens the next time? I
> >>>>>have DDB compiled in. I will try to sniff the wire using another machine
> >>>>>next time to see if the card sends out anything.
> >>>>>
> >>>>>The command 'pciconf -lv' says about the card this:
> >>>>>em0@pci2:1:0:   class=0x020000 card=0x05491014 chip=0x101e8086 rev=0x03 hdr=0x00
> >>>>>   vendor   = 'Intel Corporation'
> >>>>>   device   = '82540EP Gigabit Ethernet Controller (Mobile)'
> >>>>>   class    = network
> >>>>>   subclass = ethernet
> >>>>>
> >>>>>The dmesg:
> >>>>>em0: <Intel(R) PRO/1000 Network Connection Version - 3.2.18> port
> >>>>>0x8000-0x803f mem 0xc0220000-0xc023ffff,0xc0200000-0xc020ffff irq 11 at
> >>>>>device 1.0 on pci2
> >>>>>em0: Ethernet address: 00:0d:60:cd:ae:e2
> >>>>>em0: [FAST]
> >>>>>
> >>>>>The interrupt is shared since the machine is a notebook. I don't know if
> >>>>>it was just a coincidence but I think that it happened at the same time
> >>>>>as my USB mouse stopped working - the USB controller is on the same irq.
> >>>>>
> >>>>>Michal
> >>>>>
> >>>>
> >>>>What is sharing the interrupt?
> >>>
> >>>
> >>>vgapci0, ipw0, ehci0, uhci0-2. I don't think vgapci0 and ipw0 are really
> >>>using the interrupt when I use em0.
> >>>
> >>>
> >>
> >>Ouch.  For now, edit /sys/dev/em/if_em.c and add the following line to 
> >>the top of the file:
> >>
> >>#define NO_EM_FASTINTR
> > 
> > 
> > Do you know the reason for the problem? Wouldn't it be better if I used
> > the stock driver and got some information for you when it doesn't work? I
> > use the machine as my workstation, so it isn't such a big problem when it
> > loses the network.
> > 
> 
> The problem is that the drivers that are sharing the interrupt,
> particularly the USB ones, can spend a very very long time waiting on
> locks to service the interrupt.  During that time, the interrupt pin is
> masked and interrupts from all shared devices don't get
> delivered. So even though the if_em driver has a very fast interrupt
> handler, it still has to wait on the USB drivers.  During that wait, a
> burst of network traffic might come into the card, filling its buffers
> and triggering an overflow.  This would be especially likely to happen
> while the kernel is flushing out filesystem i/o.  In theory the
> interrupt service latency shouldn't be any different whether the if_em
> driver is fast or not, but there might be coincidental timing issues
> that I don't understand.  That's why I'd like you to add the
> NO_EM_FASTINTR define to revert the driver to its classic behaviour and see if the
> problem persists.  If it doesn't, then I'll have to rethink some of the
> changes that I made to it.
> 

I thought I should let you know whether I still experience the em lock-up.
The answer, unfortunately, is that it hasn't happened again, neither with
NO_EM_FASTINTR defined nor without it.

> Scott
> 
> > 
> >>Also, does your kernel config include the apic device?
> > 
> > 
> > Yes, it does. But I believe that neither the chipset nor the CPU
> > supports it.
> > 
> > Michal
> > 
> 
> 
Received on Fri Feb 24 2006 - 15:52:19 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:52 UTC