Quoting "O. Hartmann" <ohartmann_at_walstatt.org> (from Fri, 17 Mar 2017 12:20:18 +0100):

> Since the introduction of the IFLIB changes, I realise severe problems on
> CURRENT. I already reported something like this to sbruno_at_ and M. Macy
> (in copy).
>
> Running the most recent CURRENT (FreeBSD 12.0-CURRENT #27 r315442: Fri Mar 17
> 10:46:04 CET 2017 amd64), the problems on a workstation got severe within the
> past two days:
>
> since a couple of weeks the em0 NIC (Intel i217-LM, see below) dies on heavy
> I/O. I realised this first when "rsync"ing poudriere repositories to a remote
> NFSv4 (automounted) folder. The em0 device could be revived by ifconfig
> down/up procedure.
>
> But not only the i217-LM chip is affected. On another box equipped with a
> i350 dual port GBit NIC I observed a similar behaviour under (artificially)
> high I/O load (but I didn't investigate that further since it occurred very
> seldom).

It's not only those chipsets. It may be beneficial if you could provide the pciconf output for those devices. Mine is:

---snip---
em0@pci0:2:6:0: class=0x020000 card=0x13768086 chip=0x107c8086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82541PI Gigabit Ethernet Controller'
---snip---

> Now, since around yesterday, the i217-LM dies without being reviveable with
> ifconfig down/up: Doing so, my FreeBSD CURRENT machine (Fujitsu Celsius M740)

I don't know whether a simple down/up would help for the chip on which I see this issue (it's a headless server in a remote datacenter). For the moment I'm using the workaround of something like "ping -c 1 <gateway> || shutdown -r now" in crontab. The system in question is at r314137.

> remains with a dead em0 device, reporting "no route" in some occasions but
> stuck in the dead state. Every attempt to establish manually the route again
> fails, only rebooting the box gives some relief.
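For reference, the crontab workaround mentioned above could be sketched as a small script along the following lines. This is only a rough sketch under assumptions: the names GATEWAY, RETRIES, PROBE, ON_FAIL, gateway_alive and watchdog are hypothetical, the gateway address is a documentation placeholder, and the retry loop is an addition that is not part of the one-liner quoted above. By default it only prints what it would do instead of rebooting.

```shell
#!/bin/sh
# Hypothetical watchdog sketch; all variable and function names here are
# illustrative assumptions, not taken from the original message.

GATEWAY="${GATEWAY:-192.0.2.1}"   # placeholder address, set to the real gateway
RETRIES="${RETRIES:-3}"           # number of probes before giving up
PROBE="${PROBE:-ping -c 1}"       # command used to probe the gateway
ON_FAIL="${ON_FAIL:-echo would run: shutdown -r now}"   # dry run by default

# Return 0 as soon as one probe succeeds, 1 if all RETRIES probes fail.
gateway_alive() {
    i=0
    while [ "$i" -lt "$RETRIES" ]; do
        $PROBE "$GATEWAY" >/dev/null 2>&1 && return 0
        i=$((i + 1))
    done
    return 1
}

# Run the failure action only when every probe has failed.
watchdog() {
    gateway_alive || $ON_FAIL
}
```

To use it, one would append a call to watchdog at the end of the script and run it from cron with ON_FAIL="shutdown -r now" set in the environment. Retrying a few times before rebooting is meant to avoid a reboot on a single lost packet.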
>
> On the console, I have some very strange reports:
>
> - ping reports suddenly about no buffer space
> - or I see sometimes massive occurrences of "em0: TX(0) desc avail = 1024,
>   pidx = 0" on the console

I don't see this in messages or console log, but I see in the logs that ntpd can't resolve hostnames.

> Either way, sending/receiving large files on an established network GBit line
> which could be saturated by approx 100 MBytes/s tends to make the NIC fail.

I can report that an "svnlite update" of the FreeBSD src tree on the box is able to trigger the issue in my case.

I have to add that before the iflib changes I've seen frequent em-watchdog timeouts in the logs / dmesg. So for me we have two issues here:
 - the hardware wasn't 100% supported before the iflib changes (it seems)
 - the iflib changes have lost some watchdog functionality / auto-failure-recovery feature

Bye,
Alexander.

-- 
http://www.Leidinger.net Alexander_at_Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netchild_at_FreeBSD.org  : PGP 0x8F31830F9F2772BF