Re: weird network problems on current since 10/28/2012

From: Andreas Tobler <andreast-list_at_fgznet.ch>
Date: Sun, 04 Nov 2012 21:15:26 +0100

On 04.11.12 14:57, Andre Oppermann wrote:
> On 04.11.2012 13:11, Kim Culhan wrote:
>> On Sun, November 4, 2012 6:21 am, Dimitry Andric wrote:
>>> On 2012-11-04 02:13, Manfred Antar wrote:
>>>> At 03:29 PM 11/3/2012, Adrian Chadd wrote:
>>>>> On 3 November 2012 10:40, Manfred Antar <null_at_pozo.com> wrote:
>>>>>> i have problem connecting to freebsd box on local network since last sunday.
>>>>>> the last kernel that works:
>>>>>>    FreeBSD 10.0-CURRENT #0: Sun Oct 28 12:14:38 PDT 2012
>>>>>> anything after that, sometimes i can connect, other times just hangs.
>>>>>> any network connection hangs ===== pop httpd ssh etc etc.
>>>>>> anyone have any ideas ?
>>>>>> i can checkout different sources and see if i can locate the changes that cause
>>>>>> this.
>>>>>
>>>>> Please do!
>>> ...
>>>> Here is what I found by doing:
>>>> setenv CVSROOT /usr/home/ncvs
>>>>
>>>> cvs co -D"October 28, 2012 12:14:38 PDT" sys
>>>>
>>>> A kernel from that time works fine.
>>>>
>>>> doing:
>>>>
>>>> cvs up -D"October 28, 2012 13:14:38 PDT" sys    (1 hour later)
>>>> the following files were changed:
>>>> sys/netinet/tcp_input.c
>>>> sys/netinet/tcp_timer.c
>>>> sys/netinet/tcp_var.h
>>>>
>>>> Building a kernel from these new files is when the problem starts.
>>>
>>> So, your problems seem to have been introduced by this commit by Andre:
>>>
>>>     http://svn.freebsd.org/changeset/base/242266
>>>
>>>     Increase the initial CWND to 10 segments as defined in IETF TCPM
>>>     draft-ietf-tcpm-initcwnd-05. It explains why the increased initial
>>>     window improves the overall performance of many web services without
>>>     risking congestion collapse.
>>>
>>>     As long as it remains a draft it is placed under a sysctl marking it
>>>     as experimental:
>>>      net.inet.tcp.experimental.initcwnd10 = 1
>>>     When it becomes an official RFC soon the sysctl will be changed to
>>>     the RFC number and moved to net.inet.tcp.
>>>
>>>     This implementation differs from the RFC draft in that it is a bit
>>>     more conservative in the case of packet loss on SYN or SYN|ACK because
>>>     we haven't reduced the default RTO to 1 second yet.  Also the restart
>>>     window isn't yet increased as allowed.  Both will be adjusted with
>>>     upcoming changes.
>>>
>>>     It is enabled by default.  In Linux it is enabled since kernel 3.0.
>>>
>>> After the commit, there was a small discussion thread on svn-src-head_at_
>>> about the possible problems with the approach.  Maybe you are
>>> experiencing those?
>>>
>>> As the commit message says, you should be able to turn the feature off
>>> using:
>>>
>>>     sysctl net.inet.tcp.experimental.initcwnd10=0
>>>
>>> Can you please try that, and see if the problems go away?
>>
>> FWIW this did not make the problem go away on 2 machines.
> 
> Yes, this very much looks like the same problem as in PR/173309.
> 
> Please try the attached patch.  It fixes the connection hang issue.
> There may be a second issue, which I am currently debugging based on
> the feedback from Fabian Keil.

I'm jumping into this thread because I have a similar network issue.

My scenario:

'make installkernel DESTDIR=/netboot/test' to an NFS-mounted drive.
The NFS drive on the server is a UFS file system, not ZFS.

Up to r242261 I can install the kernel (or world) smoothly to the
NFS destination.

From r242262 on it no longer works smoothly. I get stalls, and sometimes
I run out of patience and kill the process.

I tried r242266 with the above-mentioned patch. No real success.

How can I help/test?
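
One way the stalls could be put into numbers is to time the same install
on a good and a bad kernel. The following is only a minimal sketch: the
helper name timed_run and its output format are hypothetical and not part
of the thread. It also prints the initcwnd10 sysctl when the running
kernel has it (it only exists from r242266 on) and "n/a" otherwise.

```shell
#!/bin/sh
# Sketch only: time a command and print one summary line with a label.
# "timed_run" and the summary format are hypothetical helpers.

OID=net.inet.tcp.experimental.initcwnd10

timed_run() {
    label=$1; shift
    # The sysctl is missing on kernels before r242266; report n/a then.
    cwnd=$(sysctl -n "$OID" 2>/dev/null) || cwnd="n/a"
    start=$(date +%s)
    "$@"                      # run the command being measured
    status=$?
    end=$(date +%s)
    echo "$label initcwnd10=$cwnd exit=$status elapsed=$((end - start))s"
    return "$status"
}
```

Usage, after booting each kernel in turn (the label is free text):

    timed_run r242261 make installkernel DESTDIR=/netboot/test
    timed_run r242262 make installkernel DESTDIR=/netboot/test

Whole-second resolution is coarse, but it should be enough to tell a
smooth install from one that stalls for minutes.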

TIA,
Andreas
Received on Sun Nov 04 2012 - 19:41:18 UTC
