Re: recent bge(4) changes causing problems

From: Pyun YongHyeon <pyunyh_at_gmail.com>
Date: Mon, 11 Oct 2010 16:16:04 -0700
On Mon, Oct 11, 2010 at 03:53:31PM -0700, Steve Kargl wrote:
> It seems recent changes to the bge driver are causing
> some problems with my hardware where the watchdog is
> now timing out.
> 
> /var/log/messages contains
> 
> 14:23:14 kernel: SMP: AP CPU #1 Launched!
> 14:23:14 kernel: Trying to mount root from ufs:/dev/ad6s1a
> 14:23:15 kernel: bge1: link state changed to UP
> 14:23:15 lpd[1190]: lpd startup: logging=0
> 14:23:15 ntpd[1224]: ntpd 4.2.4p5-a (1)
> 14:23:15 kernel: bge0: link state changed to UP
> 14:23:24 ntpd[1225]: time reset -0.677316 s
> 14:23:24 ntpd[1225]: kernel time sync status change 2001
> 14:31:01 kernel: bge0: watchdog timeout -- resetting
> 14:31:01 kernel: bge0: link state changed to DOWN
> 14:31:02 kernel: Limiting icmp unreach response from 613 to 200 packets/sec
> 14:31:04 ntpd[1225]: sendto(140.142.2.8) (fd=22): No route to host
> 14:31:04 kernel: bge0: link state changed to UP
> 14:31:30 kernel: Limiting icmp unreach response from 205 to 200 packets/sec
> 14:31:31 kernel: Limiting icmp unreach response from 203 to 200 packets/sec
> 15:40:11 su: kargl to root on /dev/pts/0
> 15:40:35 kernel: bge0: link state changed to DOWN
> 15:40:38 kernel: bge0: link state changed to UP
> 
> The last 2 bge messages are from me manually using
> ifconfig to bring my network connection back to life.
> 
> troutmask:kargl[206] sysctl -a | grep bge.0
> dev.bge.0.%desc: Broadcom Gigabit Ethernet Controller, ASIC rev. 0x002100
> dev.bge.0.%driver: bge
> dev.bge.0.%location: slot=9 function=0 handle=\_SB_.PCI0.GOLA.GLAN
> dev.bge.0.%pnpinfo: vendor=0x14e4 device=0x1648 subvendor=0x14e4 subdevice=0x1644 class=0x020000
> dev.bge.0.%parent: pci2
> dev.bge.0.forced_collapse: 0
> dev.bge.0.forced_udpcsum: 0
> dev.bge.0.stats.FramesDroppedDueToFilters: 0
> dev.bge.0.stats.DmaWriteQueueFull: 0
> dev.bge.0.stats.DmaWriteHighPriQueueFull: 0
> dev.bge.0.stats.NoMoreRxBDs: 0
> dev.bge.0.stats.InputDiscards: 0
> dev.bge.0.stats.InputErrors: 0
> dev.bge.0.stats.RecvThresholdHit: 325
> dev.bge.0.stats.DmaReadQueueFull: 0
> dev.bge.0.stats.DmaReadHighPriQueueFull: 0
> dev.bge.0.stats.SendDataCompQueueFull: 0
> dev.bge.0.stats.RingSetSendProdIndex: 469
> dev.bge.0.stats.RingStatusUpdate: 330
> dev.bge.0.stats.Interrupts: 330
> dev.bge.0.stats.AvoidedInterrupts: 0
> dev.bge.0.stats.SendThresholdHit: 0
> dev.bge.0.stats.rx.ifHCInOctets: 569452
> dev.bge.0.stats.rx.Fragments: 0
> dev.bge.0.stats.rx.UnicastPkts: 497
> dev.bge.0.stats.rx.MulticastPkts: 1
> dev.bge.0.stats.rx.FCSErrors: 0
> dev.bge.0.stats.rx.AlignmentErrors: 0
> dev.bge.0.stats.rx.xonPauseFramesReceived: 0
> dev.bge.0.stats.rx.xoffPauseFramesReceived: 0
> dev.bge.0.stats.rx.ControlFramesReceived: 0
> dev.bge.0.stats.rx.xoffStateEntered: 0
> dev.bge.0.stats.rx.FramesTooLong: 0
> dev.bge.0.stats.rx.Jabbers: 0
> dev.bge.0.stats.rx.UndersizePkts: 0
> dev.bge.0.stats.rx.inRangeLengthError: 0
> dev.bge.0.stats.rx.outRangeLengthError: 0
> dev.bge.0.stats.tx.ifHCOutOctets: 39056
> dev.bge.0.stats.tx.Collisions: 0
> dev.bge.0.stats.tx.XonSent: 0
> dev.bge.0.stats.tx.XoffSent: 0
> dev.bge.0.stats.tx.flowControlDone: 0
> dev.bge.0.stats.tx.InternalMacTransmitErrors: 0
> dev.bge.0.stats.tx.SingleCollisionFrames: 0
> dev.bge.0.stats.tx.MultipleCollisionFrames: 0
> dev.bge.0.stats.tx.DeferredTransmissions: 0
> dev.bge.0.stats.tx.ExcessiveCollisions: 0
> dev.bge.0.stats.tx.LateCollisions: 0
> dev.bge.0.stats.tx.UnicastPkts: 468
> dev.bge.0.stats.tx.MulticastPkts: 0
> dev.bge.0.stats.tx.BroadcastPkts: 1
> dev.bge.0.stats.tx.CarrierSenseErrors: 0
> dev.bge.0.stats.tx.Discards: 0
> dev.bge.0.stats.tx.Errors: 0
> dev.bge.0.wake: 0
> 
> In the time that it's taken me to compose this message
> the timeout has fired again.
> 
> 15:47:01 kernel: Limiting icmp unreach response from 662 to 200 packets/sec
> 15:47:02 kernel: Limiting icmp unreach response from 446 to 200 packets/sec
> 15:47:03 kernel: Limiting icmp unreach response from 436 to 200 packets/sec
> 15:47:04 kernel: Limiting icmp unreach response from 464 to 200 packets/sec
> 15:47:05 kernel: Limiting icmp unreach response from 438 to 200 packets/sec
> 15:47:06 kernel: Limiting icmp unreach response from 445 to 200 packets/sec
> 15:47:07 kernel: bge0: watchdog timeout -- resetting
> 15:47:07 kernel: bge0: link state changed to DOWN
> 15:47:07 kernel: Limiting icmp unreach response from 439 to 200 packets/sec
> 15:47:08 kernel: Limiting icmp unreach response from 330 to 200 packets/sec
> 15:47:11 kernel: bge0: link state changed to UP
> 15:47:12 kernel: Limiting icmp unreach response from 214 to 200 packets/sec
> 15:47:13 kernel: Limiting icmp unreach response from 202 to 200 packets/sec
> 15:47:14 kernel: Limiting icmp unreach response from 238 to 200 packets/sec
> 15:49:42 kernel: bge0: link state changed to DOWN
> 15:49:44 kernel: bge0: link state changed to UP
> 
> I have not seen these icmp unreach response messages before.
> 

The icmp unreach messages have nothing to do with bge(4). Check
whether a server that listens on a UDP port is still alive on your
box. What worries me is the bge(4) watchdog timeouts. It looks like
your controller is a BCM5704. I also have a bge(4) regression report
from marius on sparc64. He said r213945 seemed to cause the issue,
and I'm working on it. Could you also try the attached patch?
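For context on that advice: the "Limiting icmp unreach response" lines typically mean UDP datagrams are arriving for a local port that has no socket bound to it, and the kernel is rate-limiting its ICMP port-unreachable replies (the 200 packets/sec cap is the net.inet.icmp.icmplim sysctl on FreeBSD). sockstat(1) shows which UDP ports have listeners. The sketch below does a similar check programmatically under one stated assumption: binding a fresh UDP socket to the port, without SO_REUSEADDR, fails with EADDRINUSE exactly when another socket already holds it. The helper name udp_port_in_use is hypothetical, and a True result only shows that something has the port bound, not that the daemon behind it is still answering.

```python
import errno
import socket
import sys


def udp_port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if another socket already holds the local UDP port.

    Heuristic (an assumption, not a driver or kernel API): try to bind a
    fresh UDP socket without SO_REUSEADDR; EADDRINUSE means the port is taken.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # e.g. PermissionError for ports below 1024 when not root
    finally:
        sock.close()  # release the probe socket in every case
    return False


if __name__ == "__main__":
    # Usage: python3 <script> PORT [PORT ...]
    for arg in sys.argv[1:]:
        port = int(arg)
        state = "in use" if udp_port_in_use(port) else "free"
        print(f"udp/{port}: {state}")
```

Run it on the affected host with the port of the UDP service in question. Probing a privileged port as a non-root user raises PermissionError instead of giving an answer, because the probe bind itself is restricted.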

Received on Mon Oct 11 2010 - 21:17:49 UTC
