CURRENT: em0 NIC freezes under heavy I/O on net

From: O. Hartmann <ohartmann_at_walstatt.org>
Date: Wed, 11 Jan 2017 09:27:30 +0100
Running recent CURRENT (FreeBSD 12.0-CURRENT #5 r311919: Wed Jan 11 08:24:28
CET 2017 amd64), the system freezes when running rsync over an automounted
(autofs) NFSv4 filesystem exported by another CURRENT server (same revision,
but with BCM NICs).

The host in question is a Fujitsu Celsius M740 equipped with an Intel NIC:

[...]
em0: <Intel(R) PRO/1000 Network Connection> port 0xf020-0xf03f mem 0xfb300000-0xfb31ffff,0xfb339000-0xfb339fff at device 25.0 numa-domain 0 on pci1
em0: attach_pre capping queues at 1
em0: using 1024 tx descriptors and 1024 rx descriptors
em0: msix_init qsets capped at 1
em0: Unable to map MSIX table
em0: Using an MSI interrupt
em0: allocated for 1 tx_queues
em0: allocated for 1 rx_queues
em0: netmap queues/slots: TX 1/1024, RX 1/1024
[...]

The pciconf output reveals:

em0@pci0:0:25:0:        class=0x020000 card=0x11ed1734 chip=0x153a8086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection I217-LM'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xfb300000, size 131072, enabled
    bar   [14] = type Memory, range 32, base 0xfb339000, size 4096, enabled
    bar   [18] = type I/O Port, range 32, base 0xf020, size 32, enabled
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 13[e0] = PCI Advanced Features: FLR TP

I have a customized kernel. The NIC has always attached as an "emX" device
(never as igbX). The kernel includes "device netmap" (in case that is
relevant).

The phenomenon:

Syncing a poudriere repository between two remote hosts, I use rsync on an
NFSv4 exported filesystem mounted via autofs. Two days ago this still worked
perfectly. Since yesterday, syncing brings down the network connection: the
connection is simply dead. Terminating rsync and bringing em0 down and up
again doesn't help much; the connection comes back for short moments but
dies within seconds. Restarting all network services via "service netif
restart" has the same effect: after the disaster, it is impossible for me to
bring the NIC/connection back to normal, and I have to reboot. The same happens
under other heavy network load, but it takes some time, and even rsync does not
kill the connection within a consistent timeframe: sometimes it takes a couple
of seconds, other times only one or two seconds before the connection dies.

I checked by dd'ing a large file over that connection; it takes several
seconds for the connection to freeze (so one should be able to reproduce this
without necessarily using rsync).
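For anyone trying to reproduce this without rsync, here is a minimal Python sketch of a comparable dd-style load test, under the assumption that any sustained bulk write over the NFS mount is enough to trigger the problem. It writes a large file of zeros in fixed-size blocks (roughly what "dd if=/dev/zero bs=1m" does) and times the write. The function name and the mount path in the usage line below are hypothetical; point it at whatever autofs/NFSv4 path is mounted over em0. If the NIC locks up, the write will most likely just stall.

```python
import os
import time


def write_test_file(path, total_mib=1024, block_kib=1024):
    """Hypothetical helper: write total_mib MiB of zeros to path in
    block_kib KiB blocks (dd-style) and return (bytes_written, seconds)."""
    block = b"\0" * (block_kib * 1024)
    total_bytes = total_mib * 1024 * 1024
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            # The last block may be shorter if total is not a multiple of the block size.
            chunk = block[: total_bytes - written]
            f.write(chunk)
            written += len(chunk)
        # Force the data out to the (NFS) server instead of leaving it in the page cache.
        f.flush()
        os.fsync(f.fileno())
    return written, time.monotonic() - start
```

Example call (hypothetical path): write_test_file("/net/server/export/ddtest.bin", total_mib=4096). The fsync at the end matters on NFS: without it, most of the data could still be sitting in the local cache when the function returns, and the network load would not be reflected in the timing.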

Kind regards,

oh
Received on Wed Jan 11 2017 - 07:28:21 UTC