Re: tcp over slow links broken?

From: Bakul Shah <bakul_at_bitblocks.com>
Date: Sun, 11 May 2008 23:56:25 -0700
On Sun, 11 May 2008 12:07:34 PDT Matthew Dillon <dillon_at_apollo.backplane.com>  wrote:
>     Hmm.  It looks like C has gone deaf, not B.  B is retransmitting from
>     sequence 4744 which is the last sequence that C acked.  C is then not
>     acking any further packets.

Yes indeed.

> 14:22:42.411144 IP B.55535 > C.ssh: . 7664:9124(1460) ack 2016 win 65535
> 14:22:42.411259 IP B.55535 > C.ssh: . 9124:10584(1460) ack 2016 win 65535
> 14:22:42.468350 IP C.ssh > B.55535: . ack 4744 win 65535
> 14:22:42.490556 IP C.ssh > B.55535: . ack 4744 win 65535
> 14:22:42.830171 IP B.55535 > C.ssh: . 4744:6204(1460) ack 2016 win 65535
	...
> 
>     This sounds like a packet filter state issue.  My guess is that
>     PF running on B is getting confused.  Either PF is getting confused,
>     or the packet is getting munged somehow to the point where PF refuses
>     to bridge it.

I had already tried this.

>     The A->C path (the one that is working) is going through PF's NAT rules.
>     The B->C path is probably going through a different set of PF rules.
> 
>     I suggest capturing a trace on C to see if C is actually receiving 
>     B's retransmissions.

Finally this evening, thanks to my friend Rob Warnock's help,
this got narrowed down quite a bit.  We captured a trace on C
and saw that it was not seeing the [4744:6204) data range
packet or any of its retransmits.  But this was a perfectly
valid packet on B (verified with tcpdump -v + manual header
checksumming).  Then Rob recalled having run across mbuf
alignment issues in the past, so to check for that I swapped
NICs around, and the problem stayed with the NIC: an old DEC
21140 card!
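For reference, the "manual header checksumming" used to confirm the
packet is the RFC 1071 Internet checksum: add up the header as 16-bit
big-endian words in one's-complement arithmetic and complement the
result.  Run over a header whose checksum field is intact, it comes out
to 0.  The Python sketch below uses an illustrative IPv4 header, not
one from the trace above, and the name inet_checksum is just for
illustration.  The TCP checksum is computed the same way but also
covers a pseudo-header (source and destination addresses, protocol,
TCP length), which this sketch leaves out.

```python
def inet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: complement of the one's-complement sum
    of the data taken as 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length buffer with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Illustrative 20-byte IPv4 header; bytes 10-11 hold its checksum (0xb861).
hdr = bytes.fromhex("4500 0073 0000 4000 4011 b861 c0a8 0001 c0a8 00c7")

# Sender's view: checksum computed with the checksum field zeroed.
zeroed = hdr[:10] + b"\x00\x00" + hdr[12:]
print(hex(inet_checksum(zeroed)))  # 0xb861

# Receiver's view: summing the whole header, checksum included, gives 0.
print(inet_checksum(hdr))  # 0
```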

So this was not related to pf or a slow link but was most
likely due to mbuf misalignment (IIRC de requires aligned
mbufs).  There was just one commit to if_de.c during the past
April.  Perhaps this is a side effect of that change (bpf is
not given a packet during device attach), or perhaps of some
change elsewhere.
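The alignment suspicion can be made concrete with a little arithmetic.
The Ethernet header is 14 bytes, so when a frame starts on a 4-byte
boundary the IP header behind it does not, and the usual 2-byte pad
that realigns the IP header misaligns the frame start instead.  The
sketch below assumes, purely for illustration, that the NIC wants
4-byte-aligned buffer addresses; the mail itself only says "IIRC de
requires aligned mbufs".  DMA_ALIGN, layout() and the buffer address
0x1000 are hypothetical.

```python
# Hypothetical sketch: DMA_ALIGN is an illustrative assumption, not a
# documented de(4)/21140 constraint.
ETHER_HDR_LEN = 14  # dst MAC (6) + src MAC (6) + EtherType (2)
DMA_ALIGN = 4       # assumed buffer-alignment requirement of the NIC


def is_aligned(addr: int, boundary: int) -> bool:
    """True if addr is a multiple of boundary."""
    return addr % boundary == 0


def layout(buf_addr: int, pad: int):
    """Place a frame pad bytes into a buffer at buf_addr and report whether
    the frame start and the IP header after it are DMA_ALIGN-aligned."""
    frame = buf_addr + pad
    ip_hdr = frame + ETHER_HDR_LEN
    return is_aligned(frame, DMA_ALIGN), is_aligned(ip_hdr, DMA_ALIGN)


for pad in (0, 2):
    frame_ok, ip_ok = layout(0x1000, pad)
    print(f"pad={pad}: frame aligned={frame_ok}, IP header aligned={ip_ok}")
```

Neither padding choice aligns both addresses, so a driver with such a
requirement may have to copy or otherwise fix up mbufs; whether that is
what went wrong here is not settled by this thread.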

Thanks for your & Julian's help!
Received on Mon May 12 2008 - 04:56:27 UTC
