Re: CURRENT: em0 NIC freezes under heavy I/O on net

From: Sean Bruno <sbruno_at_freebsd.org> Date: Thu, 12 Jan 2017 10:38:32 -0700 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:09 UTC

On 01/11/17 01:27, O. Hartmann wrote:
> Running recent CURRENT (FreeBSD 12.0-CURRENT #5 r311919: Wed Jan 11 08:24:28
> CET 2017 amd64), the system freezes when doing a rsync over automounted
> (autofs) NFSv4 filesystem, mounted from another CURRENT server (same revision,
> but with BCM NICs).
> 
> The host in question is a Fujitsu Celsius M740 equipted with an Intel NIC:
> 
> [...]
> em0: <Intel(R) PRO/1000 Network Connection> port 0xf020-0xf03f mem
> 0xfb300000-0xfb31ffff,0xfb339000-0xfb339fff at device 25.0 numa-domain 0 on
> pci1 em0: attach_pre capping queues at 1 em0: using 1024 tx descriptors and
> 1024 rx descriptors em0: msix_init qsets capped at 1
> em0: Unable to map MSIX table 
> em0: Using an MSI interrupt
> em0: allocated for 1 tx_queues
> em0: allocated for 1 rx_queues
> em0: netmap queues/slots: TX 1/1024, RX 1/1024
> [...]
> 
> The pciconf output reveals:
> 
> em0_at_pci0:0:25:0:        class=0x020000 card=0x11ed1734 chip=0x153a8086 rev=0x05
> hdr=0x00 vendor     = 'Intel Corporation'
>     device     = 'Ethernet Connection I217-LM'
>     class      = network
>     subclass   = ethernet
>     bar   [10] = type Memory, range 32, base 0xfb300000, size 131072, enabled
>     bar   [14] = type Memory, range 32, base 0xfb339000, size 4096, enabled
>     bar   [18] = type I/O Port, range 32, base 0xf020, size 32, enabled
>     cap 01[c8] = powerspec 2  supports D0 D3  current D0
>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>     cap 13[e0] = PCI Advanced Features: FLR TP
> 
> I have a customized kernel. The NIC has revealed itself all the time as an
> "emX" device (never as igbX). The kernel contains device netmap (if
> relevevant).
> 
> The phenomenon:
> 
> Syncing a poudriere repository between to remote hosts, I use rsync on a NGSv4
> exported filesystem, mounted via AUTOFS. So far, this work two days ago
> perfectly. Since yesterday, syncing brings down the network connection - the
> connection is simply dead. Terminating the rsync, bringing em0 down and up
> again doesn't help much, for short moments, the connection is established, but
> dies within seconds. Restarting via "service netif restart" all network
> services have the same effect: after the desaster, it is impossible for me to
> bring back the NIC/connection to normal, I have to reboot. The same happens
> when having heavy network load, but it takes a time and even rsync isn't
> "deadly" within the same timeframe - it takes sometimes a couple of seconds,
> another takes only one or two seconds to make the connection die. 
> 
> I checked with dd'ing a large file over that connection, it takes several
> seconds then to make the connection freezing (so, someone could reproduce iy
> not ncessarily using rsync).
> 
> Kind regards,
> 
> oh

If you have the time today or tomorrow.  Can you please capture 'sysctl
dev.em.0' and post it here?

In addition, I would like to have this patch tested in your configuration:

https://people.freebsd.org/~sbruno/em_tx_limit.diff

Finally, if you have any loader.conf entries for hw.em, please post them
as well.

sean