Re: 9.0-RC1 panic in tcp_input: negative winow.

From: Lawrence Stewart <lstewart_at_freebsd.org>
Date: Fri, 28 Oct 2011 11:29:34 +1100
On 10/26/11 22:53, John Baldwin wrote:
> On Wednesday, October 26, 2011 3:54:31 am Pawel Jakub Dawidek wrote:
>> On Mon, Oct 24, 2011 at 08:14:22AM -0400, John Baldwin wrote:
>>> On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote:
>>>> On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote:
>>>>> On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote:
>>>>>> My suggestion would be that if we won't be able to fix it before 9.0,
>>>>>> we should turn this assertion off, as the system seems to be able to
>>>>>> recover.
>>>>>
>>>>> Shipped kernels have all assertions turned off.
>>>>
>>>> Yes, I'm aware of that, but many people compile their production kernels
>>>> with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg.
>>>> corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing
>>>> it into a printf, so it will be visible.
>>>
>>> No, the kernel is corrupting things in other places when this is true, so
>>> if you are running with INVARIANTS, we want to know about it.   Specifically,
>>> in several places in TCP we assume that rcv_adv>= rcv_nxt, and depend on
>>> being able to do 'rcv_adv - rcv_nxt'.
>>>
>>> In this case, it looks like the difference is consistently less than one
>>> frame.  I suspect the other end of the connection is sending just beyond the
>>> end of the advertised window (it probably assumes it is better to send a full
>>> frame if it has that much pending data even though part of it is beyond the
>>> window edge vs sending a truncated packet that just fills the window) and that
>>> that frame is accepted ok in the header prediction case and it's ACK is
>>> delayed, but the next packet to arrive then trips over this assumption.
>>>
>>> Since 'win' is guaranteed to be non-negative and we explicitly cast
>>> 'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking
>>> for:
>>>
>>> 	tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));
>>>
>>> I think we already handle this case ok and perhaps the assertion can just be
>>> removed?  Not sure if others feel that it warrants a comment to note that this
>>> is the case being handled.
>>
>> I added debug to the places where rcv_adv and rcv_nxt are modified. Here
>> is what happens before the panic occurs:
>>
>> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022361548 rcv_adv 4022360100 diff -1448
>> tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022362298 rcv_adv 4022361548 diff -750
>> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022363746 rcv_adv 4022362298 diff -1448
>> tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022364836 rcv_adv 4022363746 diff -1090
>> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022366284 rcv_adv 4022364836 diff -1448
>> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022370628 rcv_adv 4022369690 diff -938
>> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022379140 rcv_adv 4022377692 diff -1448
>> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022387792 rcv_adv 4022386344 diff -1448
>> tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022388890 rcv_adv 4022387792 diff -1098
>> tcp_do_segment:1722 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022390338 rcv_adv 4022388890 diff -1448
>> tcp_do_segment:2847 negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 diff -221
>> panic: tcp_input negative window: tp 0xfffffe000dab1b70 rcv_nxt 4022394563 rcv_adv 4022394342 win=0 diff -221
>>
>> I can send you the full log if you want, I've plenty of messages where
>> rcv_adv<  rcv_nxt, not all of them trigger this assertion.
>
> The assertion would be triggered when the next packet arrives (as I said
> above).  Try modifying your debugging output to also log if the ACK is
> delayed.  I suspect it is not delayed until the last one.  (Pushing out an
> ACK will reset rcv_adv to be beyond rcv_nxt in tcp_output(), so in the case
> of an immediate ACK, rcv_nxt>  rcv_adv is only a transient condition all
> under a single lock invocation so never visible to other consumers of the
> protocol control block.)  If that is what you see, then that confirms what
> I guessed above and I will likely just remove the assertion in tcp_input()
> and patch the timewait code to handle this case.
>

Pawel, have you been able to confirm John's hypothesis? What I don't 
quite get is why we haven't had a lot more reports of this issue...

Cheers,
Lawrence
Received on Thu Oct 27 2011 - 22:29:37 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:19 UTC