Sockets stuck in SYN_RCVD (re(4), RELENG_7, i386)

From: Oliver Fromme <olli_at_lurza.secnetix.de>
Date: Tue, 20 Nov 2007 11:16:25 +0100 (CET)

Hello,

I'm watching a very strange problem here.  There are two
machines with almost identical hardware (see dmesg and
pciconf output at the bottom).  They also run identical
sources: RELENG_7 (i386) as of October-18.  I know it's
a few weeks old, but I haven't seen any changes in the
CVS that might be related to the following problem.

On the first machine, I see a slow, but constant increase
of the number of sockets in state SYN_RCVD in "netstat -an"
output.  The number of those sockets is the same as sysctl
net.inet.tcp.syncache.count.  This does not happen on the
second machine at all (count is zero).
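
A minimal sketch of how the two numbers could be compared.  It
assumes the FreeBSD "netstat -an" layout, in which the connection
state is the last column; the function name count_syn_rcvd is made
up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: read "netstat -an" output on stdin and print
# the number of lines whose last field is SYN_RCVD.
count_syn_rcvd() {
    awk '$NF == "SYN_RCVD" { n++ } END { print n + 0 }'
}
```

On the affected machine, "netstat -an | count_syn_rcvd" and
"sysctl -n net.inet.tcp.syncache.count" would then be expected to
print the same number.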

At the moment, the count on the first machine is 702.  We
first noticed it three days ago when the count was 330,
which leads to the assumption that the problem started
about six days ago. However, the machine has an uptime of
32 days. So something must have triggered the problem
after about 26 days of uptime.
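
The estimate can be reproduced with a short calculation.  This is
only a sketch under the assumption that the count grew strictly
linearly from zero; the function name days_since_onset is
hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: given the current count, an earlier count and
# the number of days between the two observations, print how many
# days ago the count would have been zero under linear growth.
days_since_onset() {
    awk -v now="$1" -v earlier="$2" -v days="$3" 'BEGIN {
        rate = (now - earlier) / days        # new entries per day
        if (rate <= 0) exit 1                # no growth, no estimate
        printf "%.1f\n", now / rate
    }'
}

days_since_onset 702 330 3    # prints 5.7
```

With 702 entries now and 330 three days earlier, the rate is 124
entries per day, which puts the start about 5.7 days back, that is,
at roughly 26 days of the 32 days of uptime.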

The port numbers and remote IPs of the SYN_RCVD sockets
seem to be completely random.  Most of the local ports
are port 25, but a few are also port 80 or port 53.
These are the ports most often used on the machine; all
other ports are blocked in IPFW.  In very rare cases a
socket leaves the SYN_RCVD state.  For example, yesterday
I watched a socket with local destination port 80 that
was in state SYN_RCVD for about 40 minutes and then
disappeared.
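
A rough way to get the per-port distribution described above.  The
sketch assumes the FreeBSD "netstat -an" layout (local address in
the fourth column, written as address.port, state in the last
column); the function name syn_rcvd_by_port is made up.

```sh
#!/bin/sh
# Hypothetical helper: read "netstat -an" output on stdin and print
# "<count> <local port>" for every local port that has SYN_RCVD
# sockets, most frequent port first.
syn_rcvd_by_port() {
    awk '$NF == "SYN_RCVD" {
        n = split($4, part, ".")    # local address looks like 10.0.0.1.25
        count[part[n]]++
    }
    END { for (p in count) print count[p], p }' | sort -rn
}
```

It would be run as "netstat -an | syn_rcvd_by_port".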

Both machines are only very lightly loaded.  In fact they
are pretty much 100% idle most of the time.  They run
sendmail, apache, BIND and a few minor things, but they
really don't do much.

There's nothing in the logs.  Both machines have an re(4)
interface.  However, one interesting difference is that
the first machine runs in GigE mode, while the second
runs only at 100 Mbps.  I don't know if
the speed changed; the machines are colocated and I have
no idea what kind of switch ports they are connected to.
It could well be that the first machine's port was changed
from 100M to GigE six days ago.  I'm reluctant to change
the speed manually to 100M, because I might lose the link
if the switch is fixed at GigE.  I would have to initiate
a remote reboot in that case.
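
The currently negotiated mode can at least be read without touching
the link.  A small sketch, assuming the usual "media:" line in
ifconfig output; the function name link_media is hypothetical, and
re0 in the usage note is an assumed interface name.

```sh
#!/bin/sh
# Hypothetical helper: read "ifconfig <interface>" output on stdin
# and print the text after "media:", which names the link mode.
link_media() {
    awk '$1 == "media:" { sub(/^[ \t]*media:[ \t]*/, ""); print }'
}
```

Running "ifconfig re0 | link_media" on both machines would show the
present speed and duplex, although not whether it changed in the
past.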

Another thing worth noting is the fact that the second
machine only has an uptime of 21 days.  I'm curious if
it will start collecting SYN_RCVD sockets when it reaches
26 days, too.  :-)

By the way, the problem does not seem to affect normal
operation, so I'm not too worried at the moment.  I can
connect to the machine's services (ssh, http, smtp, dns)
without any problems.

Some data:

$ sysctl net.inet.tcp.syncache
net.inet.tcp.syncache.rst_on_sock_fail: 1
net.inet.tcp.syncache.rexmtlimit: 3
net.inet.tcp.syncache.hashsize: 512
net.inet.tcp.syncache.count: 702
net.inet.tcp.syncache.cachelimit: 15360
net.inet.tcp.syncache.bucketlimit: 30

$ netstat -s | sed -n '/sync/,/rec/p'
        395637 syncache entries added
                12023 retransmitted
                8719 dupsyn
                0 dropped
                391666 completed
                0 bucket overflow
                0 cache overflow
                1926 reset
                1404 stale
                1 aborted
                0 badack
                21 unreach
                0 zone failures
        395637 cookies sent
        175 cookies received

Output from dmesg and pciconf of the first machine is here:
http://www.secnetix.de/~olli/dmesg/box/7.0-PRE-20071018.dmesg.txt
http://www.secnetix.de/~olli/dmesg/box/7.0-PRE-20071018.pciconf.txt

For comparison, this is the second machine which does _not_
exhibit the problem:
http://www.secnetix.de/~olli/dmesg/pluto/7.0-PRE-20071018.dmesg.txt
http://www.secnetix.de/~olli/dmesg/pluto/7.0-PRE-20071018.pciconf.txt

Please let me know if I should provide more information.
The next thing I would try is to reboot the machine, so
I can see whether the problem occurs immediately or only
after some uptime.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd

"Python tricks" is a tough one, cuz the language is so clean. E.g.,
C makes an art of confusing pointers with arrays and strings, which
leads to lotsa neat pointer tricks; APL mistakes everything for an
array, leading to neat one-liners; and Perl confuses everything
period, making each line a joyous adventure <wink>.
        -- Tim Peters
Received on Tue Nov 20 2007 - 09:16:37 UTC
