CURRENT: massive em0 NIC problems since IFLIB changes/introduction

From: O. Hartmann <ohartmann_at_walstatt.org>
Date: Fri, 17 Mar 2017 12:20:18 +0100
Since the introduction of the IFLIB changes, I have been seeing severe problems
on CURRENT.

Running the most recent CURRENT (FreeBSD 12.0-CURRENT #27 r315442: Fri Mar 17
10:46:04 CET 2017  amd64), the problems on a workstation have become severe
within the past two days:

For a couple of weeks now, the em0 NIC (Intel i217-LM, see below) has been
dying under heavy I/O. I first noticed this when "rsync"ing poudriere
repositories to a remote NFSv4 (automounted) folder. The em0 device could be
revived by an ifconfig down/up procedure.
But not only the i217-LM chip is affected. On another box equipped with an i350
dual port GBit NIC I observed similar behaviour under (artificially) high I/O
load (but I didn't investigate that further since it occurred very rarely).

Now, since around yesterday, the i217-LM dies and can no longer be revived with
ifconfig down/up: after doing so, my FreeBSD CURRENT machine (Fujitsu Celsius
M740) is left with a dead em0 device, reporting "no route" on some occasions
but remaining stuck in the dead state. Every attempt to re-establish the route
manually fails; only rebooting the box gives some relief.

On the console, I see some very strange reports:

- ping suddenly reports no buffer space
- or I sometimes see massive occurrences of
  "em0: TX(0) desc avail = 1024, pidx = 0" on the console
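
To put a number on "massive occurrences", the console output could be fed
through a small script. This is only a minimal sketch: the function name
count_tx_reports is made up for illustration, and the regular expression
assumes the message appears on one line exactly as quoted above.

```python
import re
from collections import Counter

# Assumed format, taken from the console message quoted above:
#   em0: TX(0) desc avail = 1024, pidx = 0
TX_LINE = re.compile(
    r"(?P<nic>\w+): TX\((?P<queue>\d+)\) desc avail\s*=\s*(?P<avail>\d+),"
    r"\s*pidx\s*=\s*(?P<pidx>\d+)"
)

def count_tx_reports(lines):
    """Count matching TX descriptor reports per (interface, queue)."""
    counts = Counter()
    for line in lines:
        match = TX_LINE.search(line)
        if match:
            counts[(match.group("nic"), int(match.group("queue")))] += 1
    return counts

if __name__ == "__main__":
    # Illustrative input only; real input would be saved console output.
    sample = [
        "em0: TX(0) desc avail = 1024, pidx = 0",
        "some unrelated console line",
        "em0: TX(0) desc avail = 1024, pidx = 0",
    ]
    for (nic, queue), n in count_tx_reports(sample).items():
        print(f"{nic} queue {queue}: {n} report(s)")
```

Grouping by interface and queue keeps the counts separate if more than one
NIC or TX queue shows the message.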

Either way, sending or receiving large files over an established GBit network
link, which can be saturated at approx. 100 MBytes/s, tends to make the NIC
fail.

Since yesterday, it has been nearly impossible to transfer larger files in a
burst: the NIC dies rapidly and cannot be revived except via a reboot.

Kind regards,

O. Hartmann
Received on Fri Mar 17 2017 - 10:20:33 UTC
