Re: Some performance measurements on the FreeBSD network stack

From: Luigi Rizzo <rizzo_at_iet.unipi.it> Date: Fri, 20 Apr 2012 08:35:30 +0200

On Fri, Apr 20, 2012 at 12:37:21AM +0200, Andre Oppermann wrote:
> On 20.04.2012 00:03, Luigi Rizzo wrote:
> >On Thu, Apr 19, 2012 at 11:20:00PM +0200, Andre Oppermann wrote:
> >>On 19.04.2012 22:46, Luigi Rizzo wrote:
> >>>The allocation happens while the code has already an exclusive
> >>>lock on so->snd_buf so a pool of fresh buffers could be attached
> >>>there.
> >>
> >>Ah, there it is not necessary to hold the snd_buf lock while
> >>doing the allocate+copyin.  With soreceive_stream() (which is
> >
> >it is not held in the tx path either -- but there is a short section
> >before m_uiotombuf() which does
> >
> >	...
> >	SOCKBUF_LOCK(&so->so_snd);
> >	// check for pending errors, sbspace, so_state
> >	SOCKBUF_UNLOCK(&so->so_snd);
> >	...
> >
> >(some of this is slightly dubious, but that's another story)
> 
> Indeed the lock isn't held across the m_uiotombuf().  You're talking
> about filling a sockbuf mbuf cache while holding the lock?

all i am thinking is that when we have a serialization point we
could use it for multiple related purposes. In this case, yes, we
could keep a small mbuf cache attached to so_snd. When the cache
is empty, get a new batch (say 10-20 bufs) from the zone
allocator, possibly dropping and regaining the lock if the so_snd
lock must be a leaf.  Besides, for protocols like TCP (does it use the
same path ?) the mbufs are already there (released by incoming acks)
in the steady state, so it is not even necessary to refill the
cache.
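
A minimal userspace sketch of the cache described above, to make the
batching concrete. It is not kernel code: malloc() stands in for the
zone allocator, and struct buf_cache, cache_refill(), cache_get(),
cache_put(), CACHE_BATCH and BUF_SIZE are all hypothetical names. The
caller is assumed to already hold whatever lock protects so_snd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_BATCH 16		/* hypothetical; "say 10-20 bufs" */
#define BUF_SIZE    2048	/* hypothetical buffer size */

/* Hypothetical per-socket cache; would hang off so_snd in a kernel. */
struct buf_cache {
	void	*slot[CACHE_BATCH];
	int	 count;		/* valid entries in slot[] */
	int	 refills;	/* trips to the backing allocator */
};

/*
 * Fill the cache in one batch. malloc() stands in for the zone
 * allocator; a kernel version might have to drop and regain the
 * so_snd lock around this call if that lock must be a leaf.
 */
static int
cache_refill(struct buf_cache *c)
{
	while (c->count < CACHE_BATCH) {
		void *b = malloc(BUF_SIZE);
		if (b == NULL)
			break;
		c->slot[c->count++] = b;
	}
	c->refills++;
	return (c->count);
}

/* Take one buffer; go to the allocator only when the cache is empty. */
static void *
cache_get(struct buf_cache *c)
{
	if (c->count == 0 && cache_refill(c) == 0)
		return (NULL);
	return (c->slot[--c->count]);
}

/* Give a buffer back (e.g. one released by an incoming ack). */
static void
cache_put(struct buf_cache *c, void *b)
{
	if (c->count < CACHE_BATCH)
		c->slot[c->count++] = b;
	else
		free(b);	/* cache full, return it to the allocator */
}
```

In the steady state, where every cache_get() is matched by a
cache_put(), the refills counter stops growing after the first batch,
which is the TCP case mentioned above.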

This said, i am not 100% sure that the 100ns I am seeing are all
spent in the zone allocator.  As i said the chain of indirect calls
and other ops is rather long on both acquire and release.

> >>>But the other consideration is that one could defer the mbuf allocation
> >>>to a later time when the packet is actually built (or anyways
> >>>right before the thread returns).
> >>>What i envision (and this would fit nicely with netmap) is the following:
> >>>- have a (possibly readonly) template for the headers (MAC+IP+UDP)
> >>>   attached to the socket, built on demand, and cached and managed
> >>>   with similar invalidation rules as used by fastforward;
> >>
> >>That would require to cross-pointer the rtentry and whatnot again.
> >
> >i was planning to keep a copy, not a reference. If the copy becomes
> >temporarily stale, no big deal, as long as you can detect it reasonably
> >quickly -- routes are not guaranteed to be correct, anyways.
> 
> Be wary of disappearing interface pointers...

(this reminds me, what prevents a route grabbed from the flowtable
from disappearing and releasing the ifp reference ?)

In any case, it seems better to keep a more persistent ifp reference
in the socket rather than grab and release one on every single
packet transmission.
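
One way to picture the "keep a copy and detect staleness" idea for the
header template is a generation counter, sketched below in userspace C.
This is only an illustration under stated assumptions: route_gen,
struct hdr_template, template_build() and template_apply() are
hypothetical names, the counter is one possible staleness check and is
not claimed to be what fastforward does, and the build step just writes
a few fixed bytes where a real one would do the route and L2 lookups.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HDR_LEN 42	/* 14 Ethernet + 20 IPv4 (no options) + 8 UDP */

/* Hypothetical counter, bumped on any route or interface change. */
static unsigned int route_gen = 1;

/* Hypothetical per-socket template: a private copy, not a reference. */
struct hdr_template {
	uint8_t		bytes[HDR_LEN];
	unsigned int	gen;		/* route_gen at build time, 0 = never built */
	int		rebuilds;	/* slow-path executions */
};

/* Slow path: stand-in for route lookup plus header construction. */
static void
template_build(struct hdr_template *t)
{
	memset(t->bytes, 0, sizeof(t->bytes));
	t->bytes[12] = 0x08;	/* ethertype 0x0800 = IPv4 */
	t->bytes[13] = 0x00;
	t->bytes[14] = 0x45;	/* IPv4, 5-word header */
	t->bytes[23] = 17;	/* IP protocol = UDP */
	t->gen = route_gen;
	t->rebuilds++;
}

/* Fast path: one compare and one memcpy unless the copy is stale. */
static void
template_apply(struct hdr_template *t, uint8_t *pkt)
{
	if (t->gen != route_gen)
		template_build(t);
	memcpy(pkt, t->bytes, HDR_LEN);
}
```

The copy can be stale for a short window between a route change and
the next compare, which matches the "no big deal" assumption above. The
sketch does not address the lifetime of the ifp itself.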

> >>>- possibly extend the pru_send interface so one can pass down the uio
> >>>   instead of the mbuf;
> >>>- make an opportunistic buffer allocation in some place downstream,
> >>>   where the code already has an x-lock on some resource (could be
> >>>   the snd_buf, the interface, ...) so the allocation comes for free.
> >>
> >>ETOOCOMPLEXOVERTIME.
> >
> >maybe. But i want to investigate this.
> 
> I fail to see what passing down the uio would gain you.  The snd_buf lock
> isn't obtained again after the copyin.  Not that I want to prevent you
> from investigating other ways. ;)

maybe it can open the way to other optimizations, such as reducing
the number of places where you need to lock, saving some data
copies, reducing fragmentation, etc.

cheers
luigi