Re: Sockets stuck in SYN_RCVD (re(4), RELENG_7, i386)

From: Oliver Fromme <olli_at_lurza.secnetix.de> Date: Mon, 26 Nov 2007 20:37:08 +0100 (CET)

Hello,

Now I have an additional piece of information for this bug.

Today I noticed that the second system -- which did not
exhibit the problem so far -- also started collecting
sockets in the SYN_RCVD state ("netstat -n").

Extrapolating from the current count and growth rate, it must
have started on Saturday.  The machine then had an uptime
of 25 days -- about the same uptime as the first machine
when it started to show this problem.

Whatever triggers the bug, it seems to be uptime-related.
Both machines are running with HZ=1000 (the default).
A signed (32-bit) int variable counting at HZ speed would
overflow after 2^31 ticks, i.e. 2^31/HZ seconds, which
happens to be about 24.9 days ...
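
As a quick sanity check of that figure, here is a minimal sketch
in C of the arithmetic: a signed 32-bit counter that is incremented
HZ times per second runs out of positive range after 2^31 ticks,
that is 2^31/HZ seconds.  The function names are made up for this
illustration; they are not kernel functions.

```c
#include <stdint.h>

/*
 * Seconds until a signed 32-bit tick counter, incremented `hz` times
 * per second and starting at 0, passes INT32_MAX (i.e. after 2^31 ticks).
 * Hypothetical helper for illustration only.
 */
static double seconds_until_wrap(int hz)
{
    return ((double)INT32_MAX + 1.0) / (double)hz;
}

/* The same interval expressed in days (86400 seconds per day). */
static double days_until_wrap(int hz)
{
    return seconds_until_wrap(hz) / 86400.0;
}
```

With hz = 1000 this gives roughly 24.86 days, matching the uptime
at which both machines started to misbehave.  With hz = 100 the
same counter would last about ten times longer.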

So it seems this is what's happening:  Somewhere in the
kernel (probably the TCP syncache code) there's a piece
of code using uptime information in HZ resolution for
timing purposes or whatever.  However, it uses a signed
int, maybe just for intermediate results, causing an
overflow after 2^31/HZ seconds, which leads to wrong
results and finally hanging sockets in the SYN_RCVD
state.
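
To illustrate the kind of mistake I have in mind (this is only a
sketch under my own assumptions, not code taken from the kernel
and not a claim about where the bug actually is): suppose a
deadline is stored as a tick value and later compared against the
current tick count.  Comparing the two values directly as signed
ints gives wrong answers around the 2^31 boundary, while comparing
their difference against zero does not.  The type and function
names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tick counter type; unsigned so that wrap-around is
 * well defined (modulo 2^32) in C. */
typedef uint32_t ticks_t;

/*
 * Naive check: reinterpret both values as signed and compare them
 * directly.  Once one value is below 2^31 and the other at or above
 * it, the ordering flips.  (The unsigned-to-signed cast assumes the
 * usual two's complement behaviour.)
 */
static bool expired_naive(ticks_t now, ticks_t deadline)
{
    return (int32_t)now >= (int32_t)deadline;
}

/*
 * Wrap-safe check: subtract first (modulo 2^32), then look at the
 * sign of the difference.  Correct as long as the two values are
 * less than 2^31 ticks apart.
 */
static bool expired_wrapsafe(ticks_t now, ticks_t deadline)
{
    return (int32_t)(now - deadline) >= 0;
}
```

With a deadline of 0x7FFFFFFF and a current tick count of
0x80000010, the naive version sees a negative "now" and a large
positive deadline, so the timeout appears not to have expired and
stays that way for a very long time.  The wrap-safe version
reports it as expired.  An entry whose timer never fires would
look a lot like a socket stuck in SYN_RCVD.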

Could anyone familiar with that part of the kernel help me
locate the bug?  I'm pretty sure that my analysis isn't far
from the truth.  I'm also pretty sure that a type cast
at the right place will fix it.  The problem is to find
the right place.

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
chen, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more:  http://www.secnetix.de/bsd

"C is quirky, flawed, and an enormous success."
        -- Dennis M. Ritchie.