Re: 9.0-RC1 panic in tcp_input: negative winow.

From: John Baldwin <jhb_at_freebsd.org>
Date: Mon, 24 Oct 2011 08:14:22 -0400
On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote:
> On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote:
> > On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote:
> > > My suggestion would be that if we won't be able to fix it before 9.0,
> > > we should turn this assertion off, as the system seems to be able to
> > > recover.
> > 
> > Shipped kernels have all assertions turned off.
> 
> Yes, I'm aware of that, but many people compile their production kernels
> with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg.
> corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing
> it into a printf, so it will be visible.

No, the kernel is corrupting things in other places when this is true, so
if you are running with INVARIANTS, we want to know about it.   Specifically,
in several places in TCP we assume that rcv_adv >= rcv_nxt, and depend on
being able to do 'rcv_adv - rcv_nxt'.

In this case, it looks like the difference is consistently less than one 
frame.  I suspect the other end of the connection is sending just beyond the 
end of the advertised window (it probably assumes it is better to send a full 
frame if it has that much pending data even though part of it is beyond the 
window edge vs sending a truncated packet that just fills the window) and that
that frame is accepted ok in the header prediction case and it's ACK is 
delayed, but the next packet to arrive then trips over this assumption.

Since 'win' is guaranteed to be non-negative and we explicitly cast
'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking
for:

	tp->rcv_wnd = imax(win, (int)(tp->rcv_adv - tp->rcv_nxt));

I think we already handle this case ok and perhaps the assertion can just be
removed?  Not sure if others feel that it warrants a comment to note that this
is the case being handled.

Also, I'm not sure if this case can "leak" into the timewait code?  If so, we 
will need to fix this case:

	/*
	 * Recover last window size sent.
	 */
	KASSERT(SEQ_GEQ(tp->rcv_adv, tp->rcv_nxt),
	    ("tcp_twstart negative window: tp %p rcv_nxt %u rcv_adv %u", tp,
	    tp->rcv_nxt, tp->rcv_adv));
	tw->last_win = (tp->rcv_adv - tp->rcv_nxt) >> tp->rcv_scale;

So that it sets last_win to 0 instead of some really huge value.

-- 
John Baldwin
Received on Mon Oct 24 2011 - 10:27:54 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:19 UTC