Re: FreeBSD 5.3 Bridge performance take II

From: Robert Watson <rwatson_at_freebsd.org>
Date: Thu, 9 Sep 2004 02:00:14 -0400 (EDT)

On Wed, 8 Sep 2004, Matthew Dillon wrote:

>     I would recommend against per-thread caches.  Instead, make the per-cpu
>     caches actually *be* per-cpu (that is, not require a mutex).  This is
<big snip>

One of the paragraphs you appear not to have quoted from my e-mail was
this one:

% One nice thing about using this experimental code is that I hope it will
% allow us to reason more effectively about the extent to which improving
% per-cpu data structures improves efficiency -- I can now much more
% easily say "OK, what happens if we eliminate the cost of locking for
% commonplace mbuf allocation/free".  I've also started looking at per-interface
% caches based on the same model, which has some similar limitations (but
% also some similar benefits), such as stuffing per-interface uma caches
% in struct ifnet. 

I.e., using per-thread UMA caches is a 30-60 minute hack that allows me to
explore and measure the performance benefits (and costs) of several
different approaches, including per-cpu, per-thread, and per-data
structure/object caching without doing the full implementation up front. 
Per-thread caching, for example, can simulate the effects of
non-preemption and mutex avoidance in micro-benchmarking, although from a
macro-benchmark perspective it suffers from a number of problems in the
general case (including draining, balancing, and extra storage cost
issues).  I didn't attempt to address these problems, on the assumption
that the current implementation is a tool for exploring performance, not
something to actually use.
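
To make the distinction concrete, the following is a rough user-space
sketch of what the hack lets a micro-benchmark approximate: a shared
cache that has to take a mutex on every allocation versus a per-thread
cache that needs no locking at all.  It is an illustrative model only,
not the actual UMA code; the structure names, slot count, and malloc()
fallback are all made up.

/*
 * Illustrative user-space model (not the FreeBSD UMA code) contrasting a
 * shared cache that needs a mutex -- standing in for a per-CPU cache that
 * other threads can still reach -- with a per-thread cache that needs no
 * locking at all.  Build with: cc -pthread cache_model.c
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define CACHE_SLOTS	128

struct obj_cache {
	void		*slots[CACHE_SLOTS];
	int		 count;
	pthread_mutex_t	 lock;		/* used only in the shared case */
};

/*
 * Shared-cache case: any thread may touch this cache, so every
 * allocation pays for a lock/unlock pair.
 */
static struct obj_cache shared_cache = { .lock = PTHREAD_MUTEX_INITIALIZER };

static void *
alloc_shared(void)
{
	void *p = NULL;

	pthread_mutex_lock(&shared_cache.lock);
	if (shared_cache.count > 0)
		p = shared_cache.slots[--shared_cache.count];
	pthread_mutex_unlock(&shared_cache.lock);
	return (p != NULL ? p : malloc(64));
}

/*
 * Per-thread case: the cache lives in thread-local storage, so only the
 * owning thread ever touches it and no mutex is required.  The cost is
 * that objects freed here are invisible to every other thread, and each
 * thread carries its own copy of the storage.
 */
static __thread struct obj_cache perthread_cache;

static void *
alloc_perthread(void)
{
	if (perthread_cache.count > 0)
		return (perthread_cache.slots[--perthread_cache.count]);
	return (malloc(64));
}

static void
free_perthread(void *p)
{
	if (perthread_cache.count < CACHE_SLOTS)
		perthread_cache.slots[perthread_cache.count++] = p;
	else
		free(p);	/* cache full: fall back to the allocator */
}

int
main(void)
{
	void *p;

	p = alloc_perthread();
	free_perthread(p);			/* lands in this thread's cache */
	p = alloc_shared();
	free(p);
	printf("per-thread cached objects: %d\n", perthread_cache.count);
	return (0);
}

The per-thread version wins in a micro-benchmark precisely because the
lock/unlock pair disappears, but objects freed into one thread's cache
are never visible to any other thread, which is where the draining,
balancing, and extra storage costs mentioned above come from.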

In doing so, my hope was to identify which areas would offer the most
immediate performance benefits, be it simply cutting down on costly
operations (such as the entropy harvesting code for Yarrow, which appears
to have found its way into our interrupt path), rethinking locking
strategies, optimizing out/coalescing locking, optimizing out excess
memory allocation, optimizing synchronization primitives with the same
semantics, changing synchronization assumptions to offer weaker/stronger
semantics, etc.

Right now, though, the greatest obstacle in my immediate path appears to
be a bug in the current version of the if_em driver that causes the
interfaces on my test box to wedge under even moderate load.  The if_em
cards I have on other machines seem not to do this, which suggests a
driver weirdness with this particular version of the chipset/card.  Go
figure...

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Principal Research Scientist, McAfee Research
Received on Thu Sep 09 2004 - 04:00:49 UTC
