Re: CURRENT: massive em0 NIC problems since IFLIB changes/introduction

From: O. Hartmann <ohartmann_at_walstatt.org>
Date: Fri, 17 Mar 2017 16:41:01 +0100
On Fri, 17 Mar 2017 14:15:01 +0100
Alexander Leidinger <Alexander_at_leidinger.net> wrote:

> Quoting "O. Hartmann" <ohartmann_at_walstatt.org> (from Fri, 17 Mar 2017  
> 12:20:18 +0100):
> 
> > Since the introduction of the IFLIB changes, I have been seeing severe problems on
> > CURRENT.
> 
> I already reported something like this to sbruno_at_ and M. Macy (in copy).
> 
> > Running the most recent CURRENT (FreeBSD 12.0-CURRENT #27 r315442: Fri Mar 17
> > 10:46:04 CET 2017  amd64), the problems on a workstation became severe within the
> > past two days:
> >
> > For a couple of weeks, the em0 NIC (Intel i217-LM, see below) has been dying under heavy
> > I/O. I first noticed this when "rsync"ing poudriere repositories to a remote
> > NFSv4 (automounted) folder. The em0 device could be revived by an ifconfig down/up
> > procedure.
> > But not only the i217-LM chip is affected. On another box equipped with an i350 dual
> > port GBit NIC I observed similar behaviour under (artificially) high I/O load
> > (but I didn't investigate that further since it occurred very rarely).
> 
> It's not only those chipsets.
> 
> It may be beneficial if you could provide the pciconf output for those  
> devices. Mine is:
> ---snip---
> em0@pci0:2:6:0: class=0x020000 card=0x13768086 chip=0x107c8086 rev=0x05 hdr=0x00
>      vendor     = 'Intel Corporation'
>      device     = '82541PI Gigabit Ethernet Controller'
> ---snip---
> 
> > Now, since around yesterday, the i217-LM dies and cannot be revived with
> > ifconfig down/up: after doing so, my FreeBSD CURRENT machine (Fujitsu Celsius M740)
> 
> I don't know whether a simple down/up would help for the chip I see this
> issue with (it's a headless server in a remote datacenter). For the
> moment I'm using the workaround of something like "ping -c 1 <gateway>
> || shutdown -r now" in crontab.
> 
> The system in question is at r314137.
> 
> > remains with a dead em0 device, reporting "no route" on some occasions but
> > stuck in the dead state. Every attempt to re-establish the route manually
> > fails; only rebooting the box gives some relief.
> >
> > On the console, I have some very strange reports:
> >
> > - ping suddenly reports no buffer space
> > - or I sometimes see massive occurrences of
> >   "em0: TX(0) desc avail = 1024, pidx = 0" on the console
> 
> I don't see this in messages or console log, but I see that ntpd can't  
> resolve hostnames in the logs.
> 
> > Either way, sending/receiving large files on an established GBit network link,
> > which can be saturated at approx. 100 MBytes/s, tends to make the NIC fail.
> 
> I can report that an "svnlite update" of the FreeBSD src tree on the box
> is able to trigger the issue in my case.
> 
> I have to add that before the iflib changes I had seen frequent
> em-watchdog timeouts in the logs / dmesg. So for me we have two issues
> here:
>   - the hardware wasn't 100% supported before the iflib changes (it seems)
>   - the iflib changes have lost some watchdog functionality /
>     auto-failure-recovery feature
> 
> Bye,
> Alexander.
> 

On 18 January 2017, I reported some strange behaviour of the same box to Sean Bruno,
along with some hardware details (which I forgot to include in the email you're
responding to, sorry), so here they are again:

[...]
Again, here is the pciconf output of the device: 

em0@pci0:0:25:0:        class=0x020000 card=0x11ed1734 chip=0x153a8086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection I217-LM'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xfb300000, size 131072, enabled
    bar   [14] = type Memory, range 32, base 0xfb339000, size 4096, enabled
    bar   [18] = type I/O Port, range 32, base 0xf020, size 32, enabled
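
For anyone comparing chipsets across reports: the chip= and card= fields in the pciconf header line each pack two 16-bit IDs (device ID in the upper half, vendor ID in the lower half). Below is a minimal Python sketch that splits them; the function name parse_pciconf_header is made up for illustration and is not part of any FreeBSD tool.

```python
import re


def parse_pciconf_header(line):
    """Split the key=0x... fields of a pciconf header line into PCI IDs.

    Hypothetical helper for illustration only; it just does bit masking
    on the chip= and card= values.
    """
    fields = {
        key: int(value, 16)
        for key, value in re.findall(r"(\w+)=0x([0-9a-fA-F]+)", line)
    }
    chip = fields["chip"]
    card = fields["card"]
    return {
        "vendor_id": chip & 0xFFFF,     # lower 16 bits of chip=
        "device_id": chip >> 16,        # upper 16 bits of chip=
        "subvendor_id": card & 0xFFFF,  # lower 16 bits of card=
        "subdevice_id": card >> 16,     # upper 16 bits of card=
        "revision": fields.get("rev"),
    }


header = ("em0@pci0:0:25:0: class=0x020000 card=0x11ed1734 "
          "chip=0x153a8086 rev=0x05 hdr=0x00")
ids = parse_pciconf_header(header)
print("vendor=0x%04x device=0x%04x" % (ids["vendor_id"], ids["device_id"]))
# prints: vendor=0x8086 device=0x153a
```

Applied to the 82541PI line quoted earlier (chip=0x107c8086), the same split gives vendor 0x8086 and device 0x107c.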

[...]
The problem has become severe within the past two days. I ran CURRENT buildworlds on a
daily basis, ran poudriere builds several times, and tried to sync them to the
package repository server, and that failed dramatically, as described above, starting
yesterday.
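
Until the driver is fixed, the two recovery steps mentioned in this thread (ifconfig down/up, and the ping-or-reboot crontab workaround) could be combined into one periodic check. The following is only a rough Python sketch under several assumptions: the function names, the default interface name "em0", and the settle time are all made up for illustration, and it would have to run as root from cron to have any effect.

```python
import subprocess
import time


def run(cmd):
    """Run a command quietly and return True if it exited with status 0."""
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def check_link(gateway, ifname="em0", run=run, settle_seconds=5):
    """Ping the gateway once and escalate if it does not answer.

    Returns the action taken: "ok", "bounced" or "reboot".
    Hypothetical helper, not an existing tool.
    """
    ping = ["ping", "-c", "1", gateway]
    if run(ping):
        return "ok"
    # Cheap recovery first: take the interface down and bring it up again.
    run(["ifconfig", ifname, "down"])
    run(["ifconfig", ifname, "up"])
    time.sleep(settle_seconds)  # assumed grace period for the link to return
    if run(ping):
        return "bounced"
    # Last resort, as in the crontab workaround quoted above.
    run(["shutdown", "-r", "now"])
    return "reboot"


# Example call from a cron-driven script (gateway address is a placeholder):
# check_link("192.0.2.1")
```

The command runner is passed in as a parameter so that the escalation order can be exercised with a fake runner, without touching the real interface.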

-- 
O. Hartmann

I object to the use or transfer of my data for
advertising purposes or for market or opinion research (§ 28 Abs. 4 BDSG).

Received on Fri Mar 17 2017 - 14:41:20 UTC
