Re: Regular bge watchdog timeouts on 7.0-PRERELEASE

From: Steve Kargl <sgk_at_troutmask.apl.washington.edu>
Date: Fri, 18 Jan 2008 13:13:27 -0800

On Fri, Jan 18, 2008 at 07:57:04PM +0100, Andre Oppermann wrote:
> Steve Kargl wrote:
> >On Thu, Jan 10, 2008 at 12:00:37PM +0000, Tom Evans wrote:
> >
> >>I am encountering regular watchdog timeouts on bge:
> >>
> >>Jan  9 08:36:11 zoot kernel: bge0: watchdog timeout -- resetting
> >>Jan  9 08:36:11 zoot kernel: bge0: link state changed to DOWN
> >>Jan  9 08:36:13 zoot kernel: bge0: link state changed to UP
> >
> >Add the following to /etc/sysctl.conf
> >
> >net.inet.tcp.sendspace=131072
> >net.inet.tcp.recvspace=131072
> 
> In 7.0 these are automatically tuning and can be left at the default
> settings.

I started using the above before automatic tuning was available,
and I haven't revisited whether these are still needed.  "If it
works, why fix it?" motto.

> >net.inet.tcp.path_mtu_discovery=0
> 
> You should not disable path MTU discovery.  It'll most likely break the
> internet for you when you encounter for example PPPoE links.

This is on an intranet: a small cluster used for MPI computations.
I won't run into PPPoE issues, but it's good to know that problems
can occur.

> >net.inet.udp.recvspace=65536
> >net.inet.raw.recvspace=16384
> >kern.ipc.nmbclusters=50000
> >kern.ipc.shm_use_phys=1
> >net.inet.tcp.rexmit_min=30
> 
> These changes do not really have much influence on the bge problem
> (at least theoretically).

The first 3 are needed to make NFS happy on my cluster.  The shm
change is needed for MPICH2's nemesis device.  I don't remember
why I set rexmit_min.  See motto above.
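
For reference, here is the same list annotated for /etc/sysctl.conf.
The comments only summarize what was said in this thread.  Treat it
as a sketch of how the file might be organized, not as a tested
recommendation or a fix for the bge timeouts.

# TCP socket buffers.  Auto-tuned in 7.0 per Andre, so probably
# safe to drop; kept here only because they predate auto-tuning.
net.inet.tcp.sendspace=131072
net.inet.tcp.recvspace=131072

# Path MTU discovery off.  Only tolerable on an isolated intranet;
# do not copy this to a host that talks to the Internet.
net.inet.tcp.path_mtu_discovery=0

# Needed to keep NFS happy on the cluster.
net.inet.udp.recvspace=65536
net.inet.raw.recvspace=16384
kern.ipc.nmbclusters=50000

# Needed for MPICH2's nemesis device.
kern.ipc.shm_use_phys=1

# Reason forgotten.
net.inet.tcp.rexmit_min=30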

-- 
Steve
Received on Fri Jan 18 2008 - 20:13:24 UTC