bizarre nfe(4) problem

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Fri, 10 Aug 2007 22:42:19 -0700 (PDT)
I've a rather strange nfe(4) problem that appears to be repeatable.  I
recently started running -CURRENT on an older socket 754 motherboard with
the nForce3 chipset.  Initially, I was running an SMP kernel, but I had
problems with sporadic "nfe0: watchdog timeout (missed Tx interrupts) --
recovering" errors that would intermittently cause the system to lose
network connectivity, which it would then recover from.  The kernel was very
similar to GENERIC, with just the addition of "options DEBUG_VFS_LOCKS"
and the replacement of atapicd with atapicam.

The nfe0 problem totally went away when I removed "options SMP" and
"device apic" from the kernel configuration, except under the following
very specific circumstances:

	A vncserver session using the GNOME desktop was started on the
	system.
	
	There was no keyboard or mouse activity on the console for an
	extended period of time, allowing the GNOME screen saver to kick
	in and lock the screen.

The system would run fine in this state for many hours, and would accept
incoming SMTP connections, etc.

	A remote vncclient made a connection to the vncserver session
	and the password was entered on the client.

At this point the nfe0 interface would appear to go deaf.  This might
happen before or slightly after the password dialog box appeared for the
vnc session.  For a short while, the system would be able to transmit
TCP packets, ntp queries, etc., but it would not respond to any incoming
packets (ping, TCP connection requests, etc.). Eventually, the ARP cache
would time out and the only packets being transmitted would be ARP
requests and the occasional UDP broadcast from the samba server running
on the machine.

Pressing any key on the (PS/2) keyboard would instantly bring the
network interface back to life.  Examination of /var/log/messages showed
lots of "nfe0: watchdog timeout" messages for the entire time that nfe0
was not listening to the network.

I've had this problem happen twice.  Both times were after an extended
period of console inactivity.   An incoming vnc connection is not
sufficient to trigger the problem if the console was recently active,
and even waiting for the GNOME screensaver to put the monitor in DPMS
power save mode before initiating the vnc connection does not appear to
be sufficient to trigger the problem.

I believe that nfe0 was sharing an interrupt with one of the USB ports
when the kernel was compiled with "device apic", but it is not sharing
an interrupt without "device apic".

Any thoughts on how to debug this problem?


# vmstat -i
interrupt                          total       rate
irq0: clk                       41903449       1000
irq1: atkbd0                       39034          0
irq3: ohci0                            5          0
irq7: ppc0                             2          0
irq8: rtc                        5362802        127
irq9: ohci1 ahc0+                1963559         46
irq10: nfe0+                      225593          5
irq11: drm0                      2511908         59
irq12: psm0                       332931          7
irq14: ata0                           48          0
Total                           52339331       1249

Here's the dmesg info:


Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-CURRENT #18: Thu Aug  9 17:35:15 PDT 2007
    dl_at_mousie.catspoiler.org:/usr/obj/usr/src/sys/GENERICDDB
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 Processor 3000+ (2009.79-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0x20fc2  Stepping = 2
  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  Features2=0x1<SSE3>
  AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
  AMD Features2=0x1<LAHF>
real memory  = 1073479680 (1023 MB)
avail memory = 1037099008 (989 MB)
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: <A M I OEMRSDT> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, 3ff00000 (3) failed
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
cpu0: <ACPI CPU> on acpi0
powernow0: <Cool`n'Quiet K8> on cpu0
device_attach: powernow0 attach returned 6
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <NVIDIA nForce3-250 AGP Controller> on hostb0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
pci0: <serial bus, SMBus> at device 1.1 (no driver attached)
ohci0: <OHCI (generic) USB controller> mem 0xfebfd000-0xfebfdfff irq 3 at device 2.0 on pci0
ohci0: [GIANT-LOCKED]
ohci0: [ITHREAD]
usb0: OHCI version 1.0, legacy support
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: <nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 4 ports with 4 removable, self powered
ohci1: <OHCI (generic) USB controller> mem 0xfebfe000-0xfebfefff irq 9 at device 2.1 on pci0
ohci1: [GIANT-LOCKED]
ohci1: [ITHREAD]
usb1: OHCI version 1.0, legacy support
usb1: <OHCI (generic) USB controller> on ohci1
usb1: USB revision 1.0
uhub1: <nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb1
uhub1: 4 ports with 4 removable, self powered
pci0: <serial bus, USB> at device 2.2 (no driver attached)
nfe0: <NVIDIA nForce3 MCP7 Networking Adapter> port 0xec00-0xec07 mem 0xfebfc000-0xfebfcfff irq 10 at device 5.0 on pci0
miibus0: <MII bus> on nfe0
e1000phy0: <Marvell 88E1111 Gigabit PHY> PHY 1 on miibus0
e1000phy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
nfe0: Ethernet address: 00:15:f2:6a:bf:a6
nfe0: [FILTER]
pci0: <multimedia, audio> at device 6.0 (no driver attached)
atapci0: <nVidia nForce3 Pro UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 8.0 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci0
ata1: [ITHREAD]
atapci1: <nVidia nForce3 Pro SATA150 controller> port 0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xc800-0xc80f,0xc400-0xc47f irq 10 at device 10.0 on pci0
atapci1: [ITHREAD]
ata2: <ATA channel 0> on atapci1
ata2: [ITHREAD]
ata3: <ATA channel 1> on atapci1
ata3: [ITHREAD]
pcib1: <ACPI PCI-PCI bridge> at device 11.0 on pci0
pci1: <ACPI PCI bus> on pcib1
vgapci0: <VGA-compatible display> mem 0xea000000-0xebffffff,0xfe9fc000-0xfe9fffff,0xfe000000-0xfe7fffff irq 11 at device 0.0 on pci1
pcib2: <ACPI PCI-PCI bridge> at device 14.0 on pci0
pci2: <ACPI PCI bus> on pcib2
ahc0: <Adaptec 29160N Ultra160 SCSI adapter> port 0xb800-0xb8ff mem 0xfeaff000-0xfeafffff irq 9 at device 10.0 on pci2
ahc0: [ITHREAD]
aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
acpi_button0: <Power Button> on acpi0
fdc0: <floppy drive controller (FDE)> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model IntelliMouse Explorer, device ID 4
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
pmtimer0 on isa0
orm0: <ISA Option ROM> at iomem 0xc0000-0xc8fff pnpid ORM0000 on isa0
ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/16 bytes threshold
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ulpt0: <EPSON USB Printer, class 0/0, rev 1.10/1.00, addr 2> on uhub0
ulpt0: using bi-directional mode
Timecounter "TSC" frequency 2009791960 Hz quality 800
Timecounters tick every 1.000 msec
Waiting 5 seconds for SCSI devices to settle
unknown: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00 
unknown: FAILURE - INQUIRY ILLEGAL REQUEST asc=0x24 ascq=0x00 
sa0 at ahc0 bus 0 target 4 lun 0
sa0: <TANDBERG SLR5 4/8GB =09:> Removable Sequential Access SCSI-2 device 
sa0: 3.300MB/s transfers
sa1 at ahc0 bus 0 target 6 lun 0
sa1: <SONY TSL-11000 L2u3> Removable Sequential Access SCSI-2 device 
sa1: 40.000MB/s transfers (20.000MHz, offset 15, 16bit)
cd0 at ata0 bus 0 target 0 lun 0
cd0: <PLEXTOR DVDR   PX-716A 1.04> Removable CD-ROM SCSI-0 device 
cd0: 3.300MB/s transfers
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray closed
ch0 at ahc0 bus 0 target 6 lun 1
ch0: <SONY TSL-11000 L2u3> Removable Changer SCSI-2 device 
ch0: 40.000MB/s transfers (20.000MHz, offset 15, 16bit)
ch0: 8 slots, 1 drive, 1 picker, 0 portals
da0 at ahc0 bus 0 target 0 lun 0
da0: <SEAGATE ST373207LW 0005> Fixed Direct Access SCSI-3 device 
da0: 160.000MB/s transfers (80.000MHz DT, offset 63, 16bit)
da0: Command Queueing Enabled
da0: 70007MB (143374744 512 byte sectors: 255H 63S/T 8924C)
WARNING: WITNESS option enabled, expect reduced performance.
Trying to mount root from ufs:/dev/da0s1a
nfe0: link state changed to UP
drm0: <Matrox G550 (AGP)> on vgapci0
info: [drm] AGP at 0xf0000000 128MB
info: [drm] Initialized mga 3.2.2 20060319
info: [drm] Initialized card for AGP DMA.
drm0: [ITHREAD]
Received on Sat Aug 11 2007 - 03:42:26 UTC