Re: CURRENT: em0 NIC freezes under heavy I/O on net

From: O. Hartmann <ohartmann_at_walstatt.org>
Date: Wed, 11 Jan 2017 13:46:37 +0100
On Wed, 11 Jan 2017 03:06:19 -0800
Matthew Macy <mmacy_at_nextbsd.org> wrote:

Hello,

thanks for your response.

Your email looks funny in my claws-mail ;-)

You asked whether it started with the introduction of IFLIB - I do not know.
Last week (I think it was Friday, and I did at least two updates of
world/kernel that day), I had a very similar situation on this box,
but it could be solved by disabling/commenting out the officially unsupported
option "options EM_MULTIQUEUE".

Around yesterday, again after several buildworld/buildkernel runs (so I cannot
tell the exact revision number), the problem under heavy load occurred even
without EM_MULTIQUEUE.

I have no idea when the first of this code actually went into HEAD.

The problem can be solved temporarily by "ifconfig down && ifconfig up", as
long as there is no load. That way, I managed to rsync a repository, but it
took a while ...

As long as the NIC is not under pressure/heavy I/O load, there is no problem so
far. We run lots of i350 and i210 devices, and I also have some in my SoHo
setup, and I have not had these severe issues, even when putting a high load on
two servers with the same rsync of a ports repo. They took the load (i350).
The i210 has not been tested under load.

Hopefully, this naive observation is of use. I have no debug kernels at the
moment ... sorry.

Kind regards,

Oliver Hartmann

> It looks like I have the wrong msix bar value for your NIC. Will fix in the
> next day or so.
> -M
>
> ---- On Wed, 11 Jan 2017 00:27:30 -0800 O. Hartmann
> <ohartmann_at_walstatt.org> wrote ----
>
> Running recent CURRENT (FreeBSD 12.0-CURRENT #5 r311919: Wed Jan 11 08:24:28
> CET 2017 amd64), the system freezes when doing a rsync over automounted
> (autofs) NFSv4 filesystem, mounted from another CURRENT server (same
> revision, but with BCM NICs).
>
> The host in question is a Fujitsu Celsius M740 equipped with an Intel NIC:
>
> [...]
> em0: <Intel(R) PRO/1000 Network Connection> port 0xf020-0xf03f mem 0xfb300000-0xfb31ffff,0xfb339000-0xfb339fff at device 25.0 numa-domain 0 on pci1
> em0: attach_pre capping queues at 1
> em0: using 1024 tx descriptors and 1024 rx descriptors
> em0: msix_init qsets capped at 1
> em0: Unable to map MSIX table
> em0: Using an MSI interrupt
> em0: allocated for 1 tx_queues
> em0: allocated for 1 rx_queues
> em0: netmap queues/slots: TX 1/1024, RX 1/1024
> [...]
>
> The pciconf output reveals:
>
> em0@pci0:0:25:0:        class=0x020000 card=0x11ed1734 chip=0x153a8086 rev=0x05 hdr=0x00
>     vendor     = 'Intel Corporation'
>     device     = 'Ethernet Connection I217-LM'
>     class      = network
>     subclass   = ethernet
>     bar   [10] = type Memory, range 32, base 0xfb300000, size 131072, enabled
>     bar   [14] = type Memory, range 32, base 0xfb339000, size 4096, enabled
>     bar   [18] = type I/O Port, range 32, base 0xf020, size 32, enabled
>     cap 01[c8] = powerspec 2  supports D0 D3  current D0
>     cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
>     cap 13[e0] = PCI Advanced Features: FLR TP
>
> I have a customized kernel. The NIC has revealed itself all the time as an
> "emX" device (never as igbX). The kernel contains device netmap (if
> relevant).
>
> The phenomenon: Syncing a poudriere repository between two remote hosts, I
> use rsync on an NFSv4 exported filesystem, mounted via AUTOFS. So far, this
> worked perfectly two days ago. Since yesterday, syncing brings down the
> network connection - the connection is simply dead. Terminating the rsync
> and bringing em0 down and up again doesn't help much; for short moments, the
> connection is established, but it dies within seconds. Restarting all
> network services via "service netif restart" has the same effect: after the
> disaster, it is impossible for me to bring the NIC/connection back to
> normal; I have to reboot. The same happens under heavy network load in
> general, but it takes some time, and even rsync isn't "deadly" within the
> same timeframe - sometimes it takes a couple of seconds, another time only
> one or two seconds to make the connection die.
>
> I checked by dd'ing a large file over that connection; it then takes several
> seconds for the connection to freeze (so someone could reproduce it without
> necessarily using rsync).
>
> Kind regards,
>
> oh
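
A side note for anyone comparing the "Unable to map MSIX table" line in the
dmesg output with the pciconf listing quoted above: the capability lines can
be checked mechanically. The following is only a minimal sketch, assuming
Python is at hand and the standard PCI capability numbering (0x05 for MSI,
0x11 for MSI-X). The names parse_capabilities and SAMPLE are made up for
illustration and are not part of any FreeBSD tool. It only reports which
capabilities the listing advertises and says nothing about the cause of the
freeze.

```python
import re

# Matches lines such as "cap 05[d0] = MSI supports 1 message, ..."
# (capability ID and config-space offset are printed in hex by pciconf).
CAP_LINE = re.compile(r"^\s*cap\s+([0-9a-fA-F]{2})\[([0-9a-fA-F]+)\]\s*=\s*(.*)$")

MSI_CAP_ID = 0x05   # standard PCI capability ID for MSI
MSIX_CAP_ID = 0x11  # standard PCI capability ID for MSI-X


def parse_capabilities(pciconf_text):
    """Return {capability_id: description} for every 'cap' line found."""
    caps = {}
    for line in pciconf_text.splitlines():
        match = CAP_LINE.match(line)
        if match:
            caps[int(match.group(1), 16)] = match.group(3).strip()
    return caps


# Capability lines copied from the pciconf output quoted in this thread.
SAMPLE = """\
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit enabled with 1 message
    cap 13[e0] = PCI Advanced Features: FLR TP
"""

caps = parse_capabilities(SAMPLE)
print("MSI advertised:  ", MSI_CAP_ID in caps)
print("MSI-X advertised:", MSIX_CAP_ID in caps)
```

Fed with the three capability lines from the quoted listing, this finds the
MSI entry (cap 05) and no MSI-X entry (cap 11), which matches the driver
falling back to "Using an MSI interrupt".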
Received on Wed Jan 11 2017 - 11:46:55 UTC
