Re: What's up with the IP stack?

From: Kevin Oberman <oberman_at_es.net>
Date: Mon, 13 Oct 2003 09:28:38 -0700
> From: Sam Leffler <sam_at_errno.com>
> Date: Sun, 12 Oct 2003 11:56:53 -0700
> Sender: owner-freebsd-current_at_freebsd.org
> 
> On Sunday 12 October 2003 11:03 am, Andre Guibert de Bruet wrote:
> > On Sun, 12 Oct 2003, Josef Karthauser wrote:
> > > On Sun, Oct 12, 2003 at 02:48:01PM +0200, Soren Schmidt wrote:
> > > > It seems Josef Karthauser wrote:
> > > > > I've just built and installed a new kernel, the first since Aug 6th.
> > > > > There appears to be a problem with the IP stack.  What happens is
> > > > > that everything is fine for a few hours, and then the IP stack stops
> > > > > working. I can no longer ping anything on the local network, my
> > > > > default route drops out (which is probably dhclient's doing). 
> > > > > Perhaps it is ARP that is broken, it's hard to tell.  All I know is
> > > > > that I need to reboot to make it work again.
> > > > >
> > > > > Is anyone else experiencing this kind of problem?
> > > >
> > > > Do you have dummynet included in the kernel ?
> > > > That has been broken for me since sam's latest commit as a backout
> > > > of ip_dummynet.c fixes the problem for me...
> > >
> > > No, I've not got dummynet in there.  My current kernel config is:
> >
> > I experienced this a week ago. I found that ifconfig'ing the interface
> > down and back up again "fixed" the problem. I've since reverted to a
> > kernel compiled on September 25th.
> 
> It would be good to know more details; I still don't have much to go on.  Try 
> to identify, for example, if the problem is specific to a particular 
> device/interface or feature you're using (e.g. dummynet).  If you have ddb in 
> your system, then when the system gets into a bad state break into the 
> debugger and look for threads that are blocked on locks.  If you have witness 
> in your kernel then show locks would also be useful.  If you don't have 
> witness in your system then rebuild your kernel with it.
> 
> The most recent round of changes were to lock the routing table.  These went 
> in 10/3 and were extensive. They could easily be the problem but w/o more 
> info I can't really help.

Just a few more data points. I am seeing the problem on my ThinkPad
T30 only on the wireless interface. I have never seen it when
connected by 10/100 via fxp0.

When I see this I can reach some LAN hosts, but not others. I can
always seem to reach the access point. I can usually, but not always,
reach most other systems on the LAN, but not the gateway router, a
Sonic Wall firewall. I have logged onto another system and then
connected to the firewall, so it looks like the physical path is OK.

The problem is intermittent and I have only scattered data. I've been
seeing it since about the beginning of October. I was blaming it on
hardware, but now that I see these reports, maybe it's not. (I just
replaced my Apple Airport AP with a D-Link, so there is something to
suspect.)

In my case things just start working again. The pause can vary from a
few seconds to about 10 minutes. netstat -rnf inet and arp -a output
both look to be fine.
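
For anyone who wants timed data instead of scattered observations, the
checks above (ping a few LAN hosts, then look at netstat -rnf inet and
arp -a) can be automated. What follows is only a minimal sketch in
Python, not something used in this thread. The function names
(is_reachable, outages, snapshot, watch) and the default interval are
made up for illustration, and it assumes a ping that accepts -c 1.

```python
import subprocess
import time


def is_reachable(host, timeout_s=2):
    """Send one echo request with the system ping; True if it got a reply."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,  # enforced here, since ping timeout flags vary
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def outages(samples):
    """Turn [(timestamp, reachable), ...] into [(start, end), ...] spans.

    An outage still in progress at the last sample gets end = None.
    """
    spans = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            spans.append((start, ts))
            start = None
    if start is not None:
        spans.append((start, None))
    return spans


def snapshot():
    """Capture the routing and ARP tables at the moment a host is lost."""
    out = {}
    for cmd in (["netstat", "-rnf", "inet"], ["arp", "-a"]):
        key = " ".join(cmd)
        try:
            out[key] = subprocess.run(
                cmd, capture_output=True, text=True, timeout=10
            ).stdout
        except (OSError, subprocess.TimeoutExpired) as exc:
            out[key] = f"failed: {exc}"
    return out


def watch(hosts, interval_s=10, rounds=6):
    """Poll each host, dump the tables on every loss, return outage spans."""
    samples = {h: [] for h in hosts}
    was_ok = {h: True for h in hosts}
    for _ in range(rounds):
        now = time.time()
        for h in hosts:
            ok = is_reachable(h)
            samples[h].append((now, ok))
            if was_ok[h] and not ok:
                print(f"{time.ctime(now)}: lost {h}")
                for cmd, text in snapshot().items():
                    print(f"--- {cmd}\n{text}")
            was_ok[h] = ok
        time.sleep(interval_s)
    return {h: outages(s) for h, s in samples.items()}
```

Calling watch() with the addresses of the access point, the gateway,
and one or two other LAN hosts would show whether they drop out
together and for how long. The addresses are whatever applies on the
local network; none are given in this message.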
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman_at_es.net			Phone: +1 510 486-8634
Received on Mon Oct 13 2003 - 07:28:42 UTC
