I want to put these mbuf changes/updates/adjustments up for objections, if any, before committing them.

This is a moderate overhaul of the mbuf headers and fields to take us into the next 5 years and two releases. The mbuf headers, in particular the pkthdr, have seen a number of new uses and abuses over the years. Some other uses have fallen by the wayside in the same time. The goal of the changes presented here is to better accommodate additional upcoming offload features and to allow full backporting from HEAD to the announced 10-stable branch while preserving API and ABI compatibility.

The individual changes and their rationale are described below. They are presented as one big patch to show the big picture. For the actual commits it will be broken into functional units as usual. Except for two limited changes the API in current HEAD remains stable, with only a recompile necessary.

Improved alignment and overall size of mbuf headers:

The m_hdr, pkthdr and m_ext structures are adjusted to improve alignment and packing on 32 and 64 bit architectures. The mbuf structures have grown/changed considerably and currently are at 88/144 bytes (32/64 bit), leaving less space for data and, more importantly, at 144 bytes exceeding two 64 byte cache lines on typical CPUs. The latter is relevant when m_ext is accessed.

m_hdr is compacted from 24/40 to 20/32 bytes by packing the type and flags fields into one uint32. The type is an enum with only a handful of types in use and is thus reduced from an int to only 8 bits, allowing for 255 types to be specified. The most we ever had was around a dozen; since then it has shrunk to only 5 and stayed there for a long time. The flags field gets the remaining 24 bits, with 12 bits for global persistent flags, of which 9 (possibly 10) are in use, and 12 bits for protocol/layer specific overlays. Out of the global flags some could be moved to csum/offload bits in the pkthdr.
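
To make the m_hdr packing concrete, here is a rough sketch of how an 8 bit type and 24 bits of flags (12 global, 12 protocol/layer specific) could share one 32 bit word. This is not taken from the patch: the mx_* names are hypothetical, and the bit positions (type in the low byte, global flags below the protocol flags) are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the one from the patch:
 *   bits  0..7   type
 *   bits  8..19  global persistent flags (12 bits)
 *   bits 20..31  protocol/layer specific flags (12 bits)
 */
#define MX_TYPE_MASK    0x000000ffu
#define MX_FLAGS_SHIFT  8u
#define MX_FLAGS_MAX    0x00ffffffu   /* 24 bits of flags in total */
#define MX_GLOBAL_MASK  0x00000fffu   /* low 12 bits of the flags */
#define MX_PROTO_SHIFT  12u

/* Combine an 8 bit type and a 24 bit flags value into one word. */
static inline uint32_t
mx_pack(uint8_t type, uint32_t flags)
{
	assert(flags <= MX_FLAGS_MAX);
	return ((flags << MX_FLAGS_SHIFT) | type);
}

/* Extract the 8 bit type. */
static inline uint8_t
mx_type(uint32_t word)
{
	return ((uint8_t)(word & MX_TYPE_MASK));
}

/* Extract all 24 flag bits. */
static inline uint32_t
mx_flags(uint32_t word)
{
	return (word >> MX_FLAGS_SHIFT);
}

/* Extract only the 12 global flag bits. */
static inline uint32_t
mx_global_flags(uint32_t word)
{
	return (mx_flags(word) & MX_GLOBAL_MASK);
}

/* Extract only the 12 protocol/layer specific flag bits. */
static inline uint32_t
mx_proto_flags(uint32_t word)
{
	return (mx_flags(word) >> MX_PROTO_SHIFT);
}
```

Whether the real header uses C bit-fields or masks like these is a separate choice; the sketch only shows that both fields fit in 4 bytes where two ints needed 8.
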
No further growth in the number of global flags is foreseen, as new uses are either layer or protocol specific or belong to offload capabilities, which have their own flags.

pkthdr size stays the same at 48/56 but a number of fields change to adapt to predominant current and future uses. In particular the "header" field sees only little use and is moved into a 64 bit protocol/layer specific union for local use. Its primary users are IP reassembly, IGMP/MLD and ATM, which store information there while the packet is being worked on. "header" was never used across layers.

"csum_flags" is extended to 64 bits to allow additional future offload information to be carried (for example IPsec offload and others).

The definition of the RSS hash type is moved from the hackish global m_flags to its own 8 bit enum in the pkthdr.

An addition is cosqos, to store Class of Service / Quality of Service information with the packet. Depending on the transport mechanism it may get reduced in width during encapsulation (vlan header). These capabilities are currently not supported in any drivers, but they allow us to get on par with Cisco/Juniper in routing applications (plus MPLS QoS).

Four 8 bit fields l[2-5]hlen are added to store the relative and cumulative header offsets from the start of the packet. This is important for various offload capabilities and relieves the drivers from having to parse the packet headers to find out or verify the header location for checksums. Parsing in drivers is a lot of copy-paste and unhandled corner cases, which we want to avoid. The surrounding infrastructure in the stack and drivers is part of a current FreeBSD Foundation grant in progress.

Another flexible 64 bit union serves to map various additional persistent packet information, like ether_vtag, tso_segsz and csum fields. Depending on the csum_flags settings some fields may have different usage, making it very flexible and adaptable to future capabilities.
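
As an illustration of what the l[2-5]hlen fields buy a driver, the sketch below derives the start of each header from stored lengths instead of parsing the packet. It assumes one possible reading of the description above, namely that each field holds the length of its own layer's header and that cumulative offsets are computed by summing; the struct and function names are hypothetical and not from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the four new 8 bit pkthdr fields. */
struct hdr_lens {
	uint8_t l2hlen;	/* e.g. Ethernet header length */
	uint8_t l3hlen;	/* e.g. IP header length */
	uint8_t l4hlen;	/* e.g. TCP/UDP header length */
	uint8_t l5hlen;	/* anything above that, 0 if unused */
};

/* Offset of the layer 3 header from the start of the packet. */
static inline unsigned
l3_offset(const struct hdr_lens *h)
{
	return (h->l2hlen);
}

/* Offset of the layer 4 header from the start of the packet. */
static inline unsigned
l4_offset(const struct hdr_lens *h)
{
	return (l3_offset(h) + h->l3hlen);
}

/* Offset of the layer 5 header (or payload if l5hlen is 0). */
static inline unsigned
l5_offset(const struct hdr_lens *h)
{
	return (l4_offset(h) + h->l4hlen);
}

/* Offset of the first byte after all recorded headers. */
static inline unsigned
payload_offset(const struct hdr_lens *h)
{
	return (l5_offset(h) + h->l5hlen);
}
```

For an untagged Ethernet frame carrying IPv4 and TCP without options (14, 20 and 20 bytes), a driver setting up checksum offload would read the TCP header position as l4_offset() without touching the packet data.
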

m_ext is compacted from 28/56 to 28/48 simply by rearranging the field ordering to allow for better packing. Again the type is an enum with only a few values but used to have a full int to waste. It is split into an 8 bit type and 24 bit flags. With more special uses in high performance network interfaces and more specialized external memory attached to mbufs it makes sense to add a specific flags field. It can for example convey information about externally managed reference counts without having to invent an ext_type each time and having to special-case it.

The biggest change is an argument extension to the *ext_free function pointer, adding a pointer to the mbuf itself. It was always a bit painful not having direct access to the mbuf we're freeing the external storage from. One could use one of the args for it but that would be a waste. All uses in the tree are mechanically adjusted.

-       void    (*ext_free)(void *, void *, void *);
+       void    (*ext_free)(struct mbuf *, void *, void *);

The header portion of struct mbuf thus changes from 88/144 to 96/136. The last 8 bytes to push it down to 128 are only reachable with intrusive changes, like removing the second argument from m_ext.

CSUM flags:

The current CSUM flags are a bit chaotic and rather poorly documented, especially in that their use on the outbound (down the stack) and inbound (up the stack) paths is rather different. The latter in particular are handled partially incorrectly in almost all drivers. To bring clarity into this mess the CSUM flags are named and arranged more appropriately, with compatibility mappings. The drivers can then be corrected one by one as the work progresses in the new 11-HEAD and MFCd without issue to the then 10-stable.

The l[3-5]hlen fields provide the means to remove all packet header parsing from the drivers for offload setup.

Others:

Mbuf initialization is unified through m_init() and m_pkthdr_init() to avoid duplication. m_free_fast() is removed for lack of usage.
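
To show what the extended ext_free signature means for a consumer, here is a small userland sketch. The struct mbuf below is a drastically reduced stand-in, not the kernel structure, and pool_ext_free() and mbuf_release_ext() are hypothetical names; only the callback prototype itself is taken from the diff above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct mbuf;

/* New style callback: the mbuf comes first, then the two opaque args. */
typedef void (*ext_free_t)(struct mbuf *, void *, void *);

/* Reduced stand-in holding only what this sketch needs. */
struct mbuf {
	void		*ext_buf;	/* external storage */
	ext_free_t	 ext_free;	/* how to release it */
	void		*ext_arg1;
	void		*ext_arg2;
};

/*
 * Example free routine. Because the mbuf is passed in directly,
 * neither argument has to be spent on carrying it; arg1 is free
 * to point at a per-pool counter instead.
 */
static void
pool_ext_free(struct mbuf *m, void *arg1, void *arg2)
{
	unsigned *freed = arg1;

	(void)arg2;		/* unused in this sketch */
	free(m->ext_buf);
	m->ext_buf = NULL;
	(*freed)++;
}

/* Hypothetical caller: invoke the callback with the mbuf itself. */
static void
mbuf_release_ext(struct mbuf *m)
{
	if (m->ext_free != NULL)
		m->ext_free(m, m->ext_arg1, m->ext_arg2);
}
```

With the old three-argument form the same routine would have had to receive the mbuf (or its ext_buf) through one of the opaque arguments, which is the waste mentioned above.
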

Patch is available here:

 http://people.freebsd.org/~andre/mbuf-adjustments-20130821.diff

This work is sponsored by the FreeBSD Foundation.

-- 
Andre