Further mbuf adjustments and changes

From: Andre Oppermann <andre_at_freebsd.org>
Date: Wed, 21 Aug 2013 15:40:53 +0200

I want to put these mbuf changes/updates/adjustments up for objections, if any,
before committing them.

This is a moderate overhaul of the mbuf headers and fields to take us into the
next 5 years and two releases.  The mbuf headers, in particular the pkthdr, have
seen a number of new uses and abuses over the years.  Some other uses have fallen
by the wayside in the same time.

The goal of the changes presented here is to better accommodate additional upcoming
offload features and to allow full backporting from HEAD to the announced 10-stable
branch while preserving the API and ABI compatibility.

The individual changes and their rationale are described below.  They are presented
as one big patch to show the big picture.  For commit they will be broken into
functional units as usual.  Except for two limited changes, the API in current HEAD
remains stable, with only a recompile necessary.

Improved alignment and overall size of mbuf headers:

The m_hdr, pkthdr and m_ext structures are adjusted to improve alignment and packing
on 32 and 64 bit architectures.  The mbuf structures have grown/changed considerably
and currently are at 88/144 bytes (32/64bit), leaving less space for data and, more
importantly, exceeding two 64 byte cache lines on typical CPUs.  The latter is
relevant when m_ext is accessed.

m_hdr is compacted from 24/40 to 20/32 bytes by packing the type and flags fields
into one uint32.  The type is an enum with only a handful of types in use and thus
reduced from int to only 8 bits allowing for 255 types to be specified.  The most
we ever had was around a dozen, and it has been down to only 5 for a long time.
The flags field gets the remaining 24 bits with 12 bits for global persistent flags,
of which 9 (possibly 10) are in use, and 12 bits for protocol/layer specific overlays.
Out of the global flags some could be moved to csum/offload bits in the pkthdr.  No
further growth in the number of global flags is foreseen as new uses either are layer
or protocol specific or belong to offload capabilities which have their own flags.
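
The packing described above can be sketched roughly as follows.  This is only
an illustration: the macro and function names are hypothetical, and placing
the type in the low 8 bits, the global flags in the next 12 bits and the
protocol/layer overlay flags in the top 12 bits is an assumption, not
necessarily the layout used in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of the combined 32 bit word (assumed bit order). */
#define	DEMO_TYPE_MASK		0x000000ffu	/* 8 bit type */
#define	DEMO_FLAGS_SHIFT	8		/* 24 bit flags sit above the type */
#define	DEMO_FLAGS_MAX		0x00ffffffu
#define	DEMO_GLOBAL_MASK	0x00000fffu	/* 12 global persistent flag bits */
#define	DEMO_PROTO_SHIFT	12		/* 12 protocol/layer overlay bits */

/* Combine an 8 bit type and a 24 bit flags value into one uint32_t. */
static inline uint32_t
demo_pack(uint8_t type, uint32_t flags)
{
	assert(flags <= DEMO_FLAGS_MAX);
	return ((flags << DEMO_FLAGS_SHIFT) | type);
}

/* Extract the 8 bit type. */
static inline uint8_t
demo_type(uint32_t word)
{
	return ((uint8_t)(word & DEMO_TYPE_MASK));
}

/* Extract all 24 flag bits. */
static inline uint32_t
demo_flags(uint32_t word)
{
	return (word >> DEMO_FLAGS_SHIFT);
}

/* Extract only the 12 global flag bits. */
static inline uint32_t
demo_global_flags(uint32_t word)
{
	return (demo_flags(word) & DEMO_GLOBAL_MASK);
}

/* Extract only the 12 protocol/layer specific flag bits. */
static inline uint32_t
demo_proto_flags(uint32_t word)
{
	return (demo_flags(word) >> DEMO_PROTO_SHIFT);
}
```

A C bit-field (uint32_t type:8, flags:24) would express the same split with
less code; explicit masks are used here only to make the bit positions visible.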

pkthdr size stays the same at 48/56 but a number of fields are changed to adapt to
predominant current and future uses.  In particular the "header" field sees only
little use and is moved into a 64bit protocol/layer specific union for local use.
Primary users are IP reassembly, IGMP/MLD and ATM storing information while the
packet is being worked on.  "header" was never used across layers.  "csum_flags"
is extended to 64 bits to allow additional future offload information to be
carried (for example IPsec offload and others).  Definition of the RSS hash type
is moved from the hackish global m_flags to its own 8 bit enum in the pkthdr.
An addition is cosqos to store Class of Service / Quality of Service information
with the packet.  Depending on the transport mechanism it may get reduced in
width during encapsulation (vlan header).  These capabilities are currently not
supported in any drivers but allow us to get on par with Cisco/Juniper in routing
applications (plus MPLS QoS).  Four 8 bit fields l[2-5]hlen are added to store
the relative and cumulative header offsets from the start of the packet.  This is
important for various offload capabilities and to relieve the drivers from having
to parse the packet headers to find out or verify the header location for checksums.
Parsing in drivers is a lot of copy-paste and unhandled corner cases which we want
to avoid.  The surrounding infrastructure in the stack and drivers is part of a
FreeBSD Foundation grant currently in progress.  Another flexible 64 bit union
serves to map various additional persistent packet information, like ether_vtag,
tso_segsz and csum fields.  Depending on the csum_flags settings some fields may
have different usage making it very flexible and adaptable to future capabilities.
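
As a rough illustration of how a driver could use the l[2-5]hlen fields
instead of parsing headers, consider the sketch below.  It assumes each field
holds the length of that layer's header, so the offset of a layer is the sum
of the lengths below it; the struct and function names are hypothetical
stand-ins and not the definitions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the four 8 bit fields; NOT the real pkthdr. */
struct demo_hdrlens {
	uint8_t	l2hlen;		/* e.g. 14 for Ethernet without a vlan tag */
	uint8_t	l3hlen;		/* e.g. 20 for IPv4 without options */
	uint8_t	l4hlen;		/* e.g. 20 for TCP without options */
	uint8_t	l5hlen;		/* upper layer header, if any */
};

/*
 * Offset of the layer 'layer' header (2..5) from the start of the packet,
 * computed as the cumulative length of all lower layer headers.
 */
static inline unsigned int
demo_hdr_offset(const struct demo_hdrlens *h, int layer)
{
	unsigned int off = 0;

	assert(layer >= 2 && layer <= 5);
	if (layer > 2)
		off += h->l2hlen;
	if (layer > 3)
		off += h->l3hlen;
	if (layer > 4)
		off += h->l4hlen;
	return (off);
}
```

With the values from the comments, the L4 checksum would be computed starting
at demo_hdr_offset(&h, 4), i.e. 34 bytes into the packet, without the driver
looking at the packet contents.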

m_ext is compacted from 28/56 to 28/48 simply by rearranging the field ordering
to allow for better packing.  Again the type is an enum with only a few values but
used to have a full int to waste.  It is split into an 8 bit type and 24 bit flags.
With more special uses in high performance network interfaces and more specialized
external memory attached to mbufs it makes sense to add a specific flags field.
It can for example convey information about externally managed reference counts
without having to invent an ext_type each time and having to special-case it.
The biggest change is an argument extension to the *ext_free function pointer adding
a pointer to the mbuf itself.  It was always a bit painful not having direct access
to the mbuf we're freeing the external storage from.  One could use one of the args
for it but that would be a waste.  All uses in the tree are mechanically adjusted.
- void (*ext_free)(void *, void *, void *);
+ void (*ext_free)(struct mbuf *, void *, void *);
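
To show what the extra argument buys, here is a minimal sketch of a free
callback using the new signature.  The struct mbuf below is a cut-down
stand-in for illustration only, and demo_pool_ext_free(), demo_free_ext() and
the counter passed as the first argument are hypothetical; only the callback
signature is taken from the change above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-in for illustration only; NOT the real struct mbuf. */
struct mbuf {
	void	*ext_buf;	/* externally allocated storage */
	void	(*ext_free)(struct mbuf *, void *, void *);
	void	*ext_arg1;
	void	*ext_arg2;
};

/*
 * Free callback with the new signature.  The mbuf pointer gives direct
 * access to the external buffer, so both optional arguments stay available
 * for other uses (here: a hypothetical per-pool counter).
 */
static void
demo_pool_ext_free(struct mbuf *m, void *arg1, void *arg2)
{
	unsigned int *nfreed = arg1;

	(void)arg2;		/* unused in this sketch */
	assert(m != NULL);
	free(m->ext_buf);
	m->ext_buf = NULL;
	if (nfreed != NULL)
		(*nfreed)++;
}

/* Hypothetical caller: release the external storage attached to 'm'. */
static void
demo_free_ext(struct mbuf *m)
{
	if (m->ext_free != NULL)
		m->ext_free(m, m->ext_arg1, m->ext_arg2);
}
```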

The header portion of struct mbuf thus changes from 88/144 to 96/136.  The last
8 bytes to push it down to 128 are only reachable with intrusive changes, like
removing the second argument from m_ext.

CSUM flags:

The current CSUM flags are a bit chaotic and rather poorly documented, especially
since their use on the outbound (down the stack) and inbound (up the stack) paths
is rather different.  The latter in particular are handled partially incorrectly in
almost all drivers.  To bring clarity into this mess the CSUM flags are named
and arranged more appropriately, with compatibility mappings.  The drivers can
then be corrected one by one as the work progresses in the new 11-HEAD and MFCed
without issue to the then 10-stable.  The l[3-5]hlen fields provide the means to
remove all packet header parsing from the drivers for offload setup.
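
The idea of renamed flags with compatibility mappings can be sketched as
below.  All names and bit values here are hypothetical (prefixed DEMO_ to
make that obvious) and are not the ones in the patch; the point is only that
old names become aliases, so unconverted drivers keep compiling while
outbound and inbound flags get clearly separated bits in the 64 bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical new-style flags in a 64 bit csum_flags field. */
/* Outbound (down the stack): what the hardware is asked to compute. */
#define	DEMO_CSUM_L3_CALC	0x0000000000000001ULL
#define	DEMO_CSUM_L4_CALC	0x0000000000000002ULL
/* Inbound (up the stack): what the hardware has verified. */
#define	DEMO_CSUM_L3_VALID	0x0000000000000100ULL
#define	DEMO_CSUM_L4_VALID	0x0000000000000200ULL

#define	DEMO_CSUM_OUT_MASK	(DEMO_CSUM_L3_CALC | DEMO_CSUM_L4_CALC)
#define	DEMO_CSUM_IN_MASK	(DEMO_CSUM_L3_VALID | DEMO_CSUM_L4_VALID)

/* Outbound and inbound bits must never overlap. */
static_assert((DEMO_CSUM_OUT_MASK & DEMO_CSUM_IN_MASK) == 0,
    "outbound and inbound csum flags overlap");

/* Hypothetical compatibility mappings for not yet converted drivers. */
#define	DEMO_OLD_CSUM_IP	DEMO_CSUM_L3_CALC
#define	DEMO_OLD_CSUM_TCP	DEMO_CSUM_L4_CALC

/* Does the stack ask the driver to compute the L4 checksum? */
static inline int
demo_needs_l4_csum(uint64_t csum_flags)
{
	return ((csum_flags & DEMO_CSUM_L4_CALC) != 0);
}
```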

Others:

Mbuf initialization is unified through m_init() and m_pkthdr_init() to avoid
duplication.  m_free_fast() is removed for lack of usage.

Patch is available here:

  http://people.freebsd.org/~andre/mbuf-adjustments-20130821.diff

This work is sponsored by the FreeBSD Foundation.

-- 
Andre

Received on Wed Aug 21 2013 - 11:41:05 UTC
