Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour

From: Sepherosa Ziehau <sepherosa_at_gmail.com>
Date: Mon, 2 Dec 2013 19:36:45 +0800
On Mon, Dec 2, 2013 at 12:29 PM, Oleg Moskalenko <mom040267_at_gmail.com> wrote:

> Sepherosa, while reading your description I noticed another long-standing
> problem for UDP application developers: UDP sockets are always hashed
> with a 2-tuple. But UDP sockets can be "connected", too, to a remote address,
> with the connect(...)
>

Connected UDP sockets are placed in the connect hash, which is keyed on
faddr/laddr/fport/lport.  SO_REUSEPORT only affects wildcard sockets.
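The two-level lookup described above can be illustrated with a toy model: connected sockets live in a table keyed on the full 4-tuple, and only unconnected (wildcard) sockets go through the per-port lookup where SO_REUSEPORT applies. This is a simplified Python sketch, not kernel code; the class and field names are hypothetical, it ignores the kernel's preference for a specific laddr over INADDR_ANY, and CRC32 is only an assumed stand-in for whatever flow hash a real stack uses to pick a SO_REUSEPORT group member.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Addr = Tuple[str, int]  # (IPv4 address, port)
WILDCARD = "0.0.0.0"


@dataclass
class Pcb:
    """Toy protocol control block; faddr/fport are set once connect() is called."""
    name: str
    laddr: str
    lport: int
    faddr: Optional[str] = None
    fport: Optional[int] = None


class UdpPcbTable:
    """Hypothetical two-level UDP lookup: connect hash first, then wildcard hash."""

    def __init__(self) -> None:
        # Connected sockets, keyed on faddr/laddr/fport/lport.
        self.connect_hash: Dict[Tuple[str, str, int, int], Pcb] = {}
        # Unconnected sockets, grouped by local port; with SO_REUSEPORT
        # several of them may share the same port.
        self.wildcard_hash: Dict[int, List[Pcb]] = {}

    def insert(self, pcb: Pcb) -> None:
        if pcb.faddr is not None and pcb.fport is not None:
            self.connect_hash[(pcb.faddr, pcb.laddr, pcb.fport, pcb.lport)] = pcb
        else:
            self.wildcard_hash.setdefault(pcb.lport, []).append(pcb)

    def lookup(self, src: Addr, dst: Addr) -> Optional[Pcb]:
        """Find the socket for a datagram from src (remote) to dst (local)."""
        # 1. Exact 4-tuple match: one dictionary probe, no matter how many
        #    other connected sockets share the local port.
        pcb = self.connect_hash.get((src[0], dst[0], src[1], dst[1]))
        if pcb is not None:
            return pcb
        # 2. Wildcard sockets bound to this port (specific laddr or INADDR_ANY).
        group = [p for p in self.wildcard_hash.get(dst[1], [])
                 if p.laddr in (dst[0], WILDCARD)]
        if not group:
            return None
        # SO_REUSEPORT-style spreading: the same flow always maps to the same
        # member. CRC32 is an arbitrary stand-in for a real flow hash.
        flow = f"{src[0]}:{src[1]}>{dst[0]}:{dst[1]}".encode()
        return group[zlib.crc32(flow) % len(group)]
```

In this model a datagram from a peer that has its own connected socket never reaches the SO_REUSEPORT group; only unmatched flows are spread across the wildcard sockets.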


> function. Unfortunately, with 2-tuple hashing, that pattern is useless for
> large-scale applications: if a large number of UDP sockets on the same
> local port are "connected" to remote addresses, the kernel has to go
> through the long list of UDP sockets with the same hash value.
>
> If connected UDP sockets used 4-tuples, it would be very
> helpful for the new generation of UDP-based media applications. For
> example, servers that use the DTLS protocol would become simpler and more
> efficient.
>
>
If you are talking about RSS, then igb, ixgbe and mxge (and maybe other
drivers) support the RSS extension that includes UDP fport/lport in the
Toeplitz hash calculation (mxge does not use RSS, but still computes a
4-tuple hash).  However, for a fragmented UDP datagram only the leading
fragment carries the UDP header, so if the ports are taken into
consideration, the RSS hash of the leading fragment will differ from that
of the remaining fragments; I think that's why Microsoft didn't include
ports for UDP.
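To make the fragment argument concrete, here is a minimal Python sketch of the Toeplitz hash used by RSS. It assumes the 40-byte example key and the IPv4 input layout (source address, destination address, source port, destination port) from Microsoft's RSS hash verification documentation; the function names are illustrative and are not any driver's API.

```python
import ipaddress
import struct
from typing import Optional, Tuple

# Example 40-byte RSS key published in Microsoft's RSS verification docs.
MS_RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0d0ca2bcb"
    "ae7b30b477cb2da38030f20c6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash of `data`; needs len(key) >= len(data) + 4."""
    if len(key) < len(data) + 4:
        raise ValueError("RSS key too short for this input")
    result = 0
    window = int.from_bytes(key[:4], "big")  # key bits [i, i+32) for input bit i
    key_bit = 32                              # next key bit to shift in
    for byte in data:
        for shift in range(7, -1, -1):
            if (byte >> shift) & 1:
                result ^= window
            # Slide the 32-bit window one bit further along the key.
            next_bit = (key[key_bit // 8] >> (7 - key_bit % 8)) & 1
            window = ((window << 1) & 0xFFFFFFFF) | next_bit
            key_bit += 1
    return result


def rss_input(src: str, dst: str,
              sport: Optional[int] = None, dport: Optional[int] = None) -> bytes:
    """IPv4 RSS input: src addr, dst addr, then (optionally) src and dst port."""
    data = ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
    if sport is not None and dport is not None:
        data += struct.pack("!HH", sport, dport)
    return data


def fragment_hashes(src: str, dst: str, sport: int, dport: int) -> Tuple[int, int]:
    """Hashes a port-aware NIC would compute for a fragmented UDP datagram:
    the leading fragment carries the UDP header (4-tuple); the rest do not."""
    first = toeplitz_hash(MS_RSS_KEY, rss_input(src, dst, sport, dport))
    rest = toeplitz_hash(MS_RSS_KEY, rss_input(src, dst))
    return first, rest
```

For the first entry in Microsoft's verification table (66.9.149.187:2794 to 161.142.100.80:1766), the 2-tuple hash is 0x323e8fc2 and the 4-tuple hash is 0x51ccc178, so the leading fragment and the trailing fragments can be steered to different receive queues.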

Best Regards,
sephe


> Thanks
> Oleg
>
>
>
> On Sun, Dec 1, 2013 at 8:17 PM, Sepherosa Ziehau <sepherosa_at_gmail.com> wrote:
>
>>
>>
>>
>> On Sat, Nov 30, 2013 at 2:42 AM, Ermal Luçi <eri_at_freebsd.org> wrote:
>>
>>> Well, it seems DragonFly already has some version of it, from commit [1].
>>>
>>>
>> The distribution algorithm was changed a little after the initial commit
>> to gain more idle time (bnx(4) output has already been maxed out):
>>
>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/c275f18d832361be28b150d3f4fd518914bdeba6
>>
>> I also addressed a reasonable concern from the nginx folks (I am not
>> quite sure about Linux's position on it; the original Linux implementation
>> of SO_REUSEPORT from Google had this drawback, which I mentioned in the
>> commit message):
>>
>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/02ad2f0b874fb0a45eb69750219f79f5e8982272
>>
>> As for nginx, the SO_REUSEPORT patch for nginx (both 1.4.x and 1.5.x) is
>> in dports; it should be easy to backport to FreeBSD's ports.  I failed to
>> convince the nginx folks to merge it into mainline, and I am currently
>> working on other things; I will come back to it later.  If FreeBSD
>> implements Linux-style SO_REUSEPORT, pushing the patch into nginx
>> mainline will be easier.
>>
>> I also put up a brief description of SO_REUSEPORT in dfly; it may be
>> useful to you:
>> http://leaf.dragonflybsd.org/~sephe/netisr_so_reuseport.txt
>>
>> Best Regards,
>> sephe
>>
>>
>>> In FreeBSD, the framework for this already exists; it is enabled by
>>> defining PCBGROUP.  It is explained at [2] and [3].
>>> It can provide approximately the same features as Linux's SO_REUSEPORT.
>>> The only things missing are the marketing behind it and, I think, better
>>> RSS support.
>>> Judging by the dates, the support was there before Linux's, so all of you
>>> looking for it can experiment with it.
>>>
>>> What I was trying to accomplish was something other than a performance
>>> improvement, and maybe to put a sysctl behind it to make it more
>>> acceptable.
>>>
>>> [1] http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/740d1d9f7b7bf9c9c021abb8197718d7a2d441c9
>>> [2] http://fxr.watson.org/fxr/source/netinet/in_pcbgroup.c?im=bigexcerpts#L51
>>> [3] http://lists.freebsd.org/pipermail/svn-src-head/2011-June/028190.html
>>>
>>>
>>> On Fri, Nov 29, 2013 at 7:03 PM, Oleg Moskalenko <mom040267_at_gmail.com> wrote:
>>>
>>> > Tim, you are wrong. Read the definition of "multicast", and read how UDP
>>> > and TCP sockets work in Linux 3.9+ kernels.
>>> >
>>> > Oleg
>>> >
>>> >
>>> > On Fri, Nov 29, 2013 at 9:59 AM, Tim Kientzle <kientzle_at_freebsd.org> wrote:
>>> >
>>> >>
>>> >> On Nov 29, 2013, at 4:04 AM, Ermal Luçi <eri_at_freebsd.org> wrote:
>>> >>
>>> >> > Hello,
>>> >> >
>>> >> > since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two daemons to
>>> >> > share the same port and possibly the same listening IP …
>>> >>
>>> >> These flags are used with TCP-based servers.
>>> >>
>>> >> I’ve used them to make software upgrades go more smoothly.
>>> >> Without them, the following often happens:
>>> >>
>>> >> * Old server stops.  In the process, all of its TCP connections are
>>> >> closed.
>>> >>
>>> >> * Connections to the old server remain in the TCP connection table
>>> >> until the remote end acknowledges the close.
>>> >>
>>> >> * New server starts.
>>> >>
>>> >> * New server tries to open the port but fails because that port is
>>> >> “still in use” by connections in the TCP connection table.
>>> >>
>>> >> With these flags, the new server can open the port even though
>>> >> it is “still in use” by existing connections.
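For the upgrade scenario described above, the usual pattern is to set SO_REUSEADDR on the listening socket before bind(). The following is a minimal Python sketch using the standard socket module, which wraps the same BSD socket calls; the helper name open_listener is hypothetical, and the exact rebinding rules for lingering connections differ somewhat between operating systems.

```python
import socket


def open_listener(host: str, port: int, backlog: int = 128) -> socket.socket:
    """Open a TCP listening socket that can rebind a port still held by
    connections left over from a previous server instance."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # SO_REUSEADDR must be set before bind(); it is what lets the new
        # server bind while old connections on the port linger in the
        # TCP connection table (e.g. in TIME_WAIT).
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(backlog)
    except OSError:
        # Do not leak the descriptor if bind() or listen() fails.
        sock.close()
        raise
    return sock
```

A restarted server simply calls open_listener() again with the same port; without the setsockopt() call, bind() would typically fail with EADDRINUSE while the old connections are still in the table.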
>>> >>
>>> >>
>>> >> > This is not the case today.
>>> >> > Only multicast sockets seem to have the behaviour of broadcasting the
>>> >> > data to all sockets sharing the same properties through these options!
>>> >>
>>> >> That is what multicast is for.
>>> >>
>>> >> If you want the same data sent to all listeners, then
>>> >> that is multicast behavior and you should be using
>>> >> a multicast socket.
>>> >>
>>> >> > The patch at [1] implements/corrects the behaviour for UDP sockets.
>>> >>
>>> >> You’re trying to turn all UDP sockets with those options
>>> >> into multicast sockets.
>>> >>
>>> >> If you want a multicast socket, you should ask for one.
>>> >>
>>> >> Tim
>>> >>
>>> >> _______________________________________________
>>> >> freebsd-net_at_freebsd.org mailing list
>>> >> http://lists.freebsd.org/mailman/listinfo/freebsd-net
>>> >> To unsubscribe, send any mail to "freebsd-net-unsubscribe_at_freebsd.org"
>>> >>
>>> >
>>> >
>>>
>>>
>>> --
>>> Ermal
>>> _______________________________________________
>>> freebsd-current_at_freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>>> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
>>>
>>
>>
>>
>> --
>> Tomorrow Will Never Die
>>
>
>


-- 
Tomorrow Will Never Die
Received on Mon Dec 02 2013 - 10:36:48 UTC
