On 07/08/14 10:46, Hans Petter Selasky wrote:
> Hi,
>
> I'm working on a new feature which will allow TCP connections to be timing controlled by the ethernet hardware driver, specifically the mlxen driver. The main missing piece in the kernel is to allow the mbuf's flowid value to be overwritten in "struct inpcb" once the connection is established, and to have a callback once the TCP connection is gone so that the assigned "flowid" can be freed by the ethernet hardware driver.
>
> The "flowid" will be used to assign the outgoing data traffic of a specific TCP connection to a hardware controlled queue, which has been configured in advance with certain parameters about the timing of the transmitted packets.
>
> To be able to set the flowid I'm using existing functions in the kernel TCP code to look up the "inpcb" structure based on the 4-tuple, via the "ifp->if_ioctl()" callback of the network adapter. I'm also registering a function method table so that I get a callback when the TCP connection is gone.
>
> At this point of development I would like to get some feedback from the FreeBSD network guys about my attached patch proposal.
>
> The motivation for this work is more reliable TCP transmission, typically for fixed-rate media content going some distance. To illustrate this I will give you an example from the world of VoIP, which uses UDP. When doing long-distance VoIP calls through various unknown networks and routers it makes a very big difference whether you are sending data 20ms apart or 40ms apart, even at the exact same rate. In the one case you might experience a bunch of packet drops, and in the other case everything is fine. Why? Because the number of packets you send per second, and their timing, matters. The goal is to apply some timing rules to TCP, to increase the chance of successful transmission and to reduce the amount of data loss. For high throughput applications we want to do this by means of hardware.
>
> While at it I would like to "typedef" the flowid used by mbufs, "struct inpcb" and many more places. Where would the right place be to put such a definition? In "sys/mbuf.h"?
>
> Comments are appreciated!

I think we need to design this to be as generic as possible. I have quite a bit of code that does this stuff but I haven't pushed it upstream or even offered it for review (yet).

cxgbe(4) hardware does throttling and traffic pacing too, but it's not limited to TCP, and it can do it per queue or per "flow" -- you can limit a tx queue or an individual flow to a packet-per-second limit or a bandwidth ceiling. This works both for plain NIC traffic (TCP, UDP, whatever) and for stateful TCP offload. For TCP (NIC or TOE) the chip can even rewrite the TCP timestamp to account for the extra time that the chip/driver held the packet because it was asked to slow down a flow.

The per-queue stuff is handled via a driver-specific tool (cxgbetool). For per-flow throttling my implementation adds a new sockopt (SO_TX_THROTTLE) that lets an application specify a throttle rate for a socket. The kernel allocates a "flow identifier" for each such socket and tcp_output (or udp_output, ...) will attach an mbuf tag containing this identifier and throttling parameters to each mbuf that it pushes out. Drivers for hardware that can throttle traffic look for this tag; the rest ignore it.
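To make the mbuf tag hand-off concrete, here is a rough sketch of what the sending side and the driver side could look like. The tag cookie, the tx_throttle_info layout and the helper names below are made-up placeholders, not the actual patch; only m_tag_alloc(), m_tag_prepend() and m_tag_locate() are the stock mbuf tag API:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>

#define MTAG_TX_THROTTLE      0x54585448    /* placeholder cookie, not registered */
#define MTAG_TX_THROTTLE_INFO 0             /* placeholder tag type */

struct tx_throttle_info {
    uint32_t flow_id;       /* identifier allocated at setsockopt time */
    uint32_t rate_kbps;     /* requested transmit ceiling */
};

/* tcp_output()/udp_output() side: attach the metadata to an outgoing mbuf. */
static int
tx_throttle_tag_mbuf(struct mbuf *m, uint32_t flow_id, uint32_t rate_kbps)
{
    struct m_tag *mtag;
    struct tx_throttle_info *tti;

    mtag = m_tag_alloc(MTAG_TX_THROTTLE, MTAG_TX_THROTTLE_INFO,
        sizeof(*tti), M_NOWAIT);
    if (mtag == NULL)
        return (ENOMEM);
    tti = (struct tx_throttle_info *)(mtag + 1);
    tti->flow_id = flow_id;
    tti->rate_kbps = rate_kbps;
    m_tag_prepend(m, mtag);
    return (0);
}

/* Driver side: hardware that can throttle acts on the tag, everyone else ignores it. */
static void
driver_tx_check_throttle(struct mbuf *m)
{
    struct m_tag *mtag;
    struct tx_throttle_info *tti;

    mtag = m_tag_locate(m, MTAG_TX_THROTTLE, MTAG_TX_THROTTLE_INFO, NULL);
    if (mtag == NULL)
        return;     /* no throttling requested for this frame */
    tti = (struct tx_throttle_info *)(mtag + 1);
    /* ... program the hw queue/flow using tti->flow_id and tti->rate_kbps ... */
}
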
- cxgbe(4) registers itself as a "flow throttling provider" with the kernel when it attaches to the chip. It tells the kernel how many flows it can handle and the range of rates it can handle.

- setsockopt(SO_TX_THROTTLE, rate) makes the kernel allocate a unique identifier for the socket (see the sketch after this list for the application-facing side). This is *not* related to the RSS flowid at all. If a listening socket has SO_TX_THROTTLE, all its children will inherit the rate limiting parameters but will each get its own unique identifier. The setsockopt fails if there aren't any flow throttling providers registered.

- tcp_output (and other proto_output) routines look for SO_TX_THROTTLE and attach extra metadata, in the form of a tag, to the outgoing frames.

- cxgbe(4) reads this metadata and acts on it.
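For reference, the application-facing side of the proposed sockopt could look roughly like the snippet below. SO_TX_THROTTLE is not in stock FreeBSD, so the placeholder option number, the SOL_SOCKET level and the kbit/s rate encoding used here are all assumptions about the interface, not its actual definition:

#include <sys/types.h>
#include <sys/socket.h>

#include <err.h>
#include <stdint.h>

#ifndef SO_TX_THROTTLE
#define SO_TX_THROTTLE 0x9000   /* placeholder option number */
#endif

/* Ask the kernel to throttle this socket's transmit path to rate_kbps. */
int
request_tx_throttle(int sock, uint32_t rate_kbps)
{
    if (setsockopt(sock, SOL_SOCKET, SO_TX_THROTTLE, &rate_kbps,
        sizeof(rate_kbps)) == -1) {
        /* Fails if no flow throttling provider (e.g. cxgbe) is registered. */
        warn("setsockopt(SO_TX_THROTTLE)");
        return (-1);
    }
    return (0);
}

A listening socket with this option set would pass the rate on to each accepted connection, with each child getting its own flow identifier, as described in the list above.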
Regards,
Navdeep