Re: mbuf count negative

From: Robert Watson <rwatson_at_freebsd.org>
Date: Sun, 5 Dec 2004 19:08:35 +0000 (GMT)

On Sun, 5 Dec 2004, Sean McNeil wrote:

> > It replaces non-atomic maintenance of the counters with atomic
> > maintenance.  However, this adds measurably to the cost of allocation, so
> > I've been reluctant to commit it.  The counters maintained by UMA are
> > likely sufficient to generate the desired mbuf output now that we have
> > mbuma, but I haven't had an opportunity to walk through the details of it. 
> > I hope to do so once I get closer to merging patches to use critical
> > sections to protect UMA per-cpu caches, since I need to redo parts of the
> > sysctl code then anyway.  You might want to give this patch, or one much
> > like it, a spin to confirm that the race is the one I think it is.  The
> > race in updating mbuf allocator statistics is one I hope to get fixed
> > prior to 5.4.
> 
> Since they appear to not be required for actual system use (by the fact
> that it being negative doesn't cause problems), could the counts be
> computed for display instead? 

This is pretty much what UMA does with its per-CPU caches.  It pulls and
pushes statistics from the caches in a couple of situations:

- When moving a bucket into or out of the cache, it has to acquire
  the zone mutex, so it also pushes statistics.
- When a timer fires every few seconds, all the caches are checked to
  update the global zone statistics.
- When the sysctl runs, it replicates the logic in the timer code to also
  update the zone statistics for display.
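
As a rough illustration of the three cases above, here is a minimal
userland sketch, not the actual UMA code.  All names in it (struct zone,
struct cache_stats, cache_swap_bucket(), zone_sync_all(), NCPU) are
made up for the example, a pthread mutex stands in for the zone mutex, and
whatever protects the per-CPU cache itself is left out:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NCPU 4	/* hypothetical CPU count for this sketch */

/* Hypothetical stand-ins; the real UMA structures look different. */
struct cache_stats {
	uint64_t allocs;
	uint64_t frees;
};

struct zone {
	pthread_mutex_t lock;		/* stands in for the zone mutex */
	uint64_t allocs;		/* global totals, valid under lock */
	uint64_t frees;
	struct cache_stats cpu[NCPU];	/* per-CPU deltas not yet pushed */
};

static void
zone_init(struct zone *z)
{
	pthread_mutex_init(&z->lock, NULL);
	z->allocs = z->frees = 0;
	for (int i = 0; i < NCPU; i++)
		z->cpu[i].allocs = z->cpu[i].frees = 0;
}

/* Fast path: only the per-CPU counter is touched, no zone mutex. */
static void
cache_count_alloc(struct zone *z, int cpu)
{
	assert(cpu >= 0 && cpu < NCPU);
	z->cpu[cpu].allocs++;
}

static void
cache_count_free(struct zone *z, int cpu)
{
	assert(cpu >= 0 && cpu < NCPU);
	z->cpu[cpu].frees++;
}

/* Push one cache's counters into the zone totals.  Caller holds z->lock. */
static void
zone_fold_cache_locked(struct zone *z, int cpu)
{
	z->allocs += z->cpu[cpu].allocs;
	z->frees += z->cpu[cpu].frees;
	z->cpu[cpu].allocs = z->cpu[cpu].frees = 0;
}

/* Bucket exchange: the zone mutex is needed anyway, so push stats too. */
static void
cache_swap_bucket(struct zone *z, int cpu)
{
	assert(cpu >= 0 && cpu < NCPU);
	pthread_mutex_lock(&z->lock);
	zone_fold_cache_locked(z, cpu);
	/* ... the bucket exchange itself would happen here ... */
	pthread_mutex_unlock(&z->lock);
}

/* Timer or sysctl path: walk all caches and bring the totals up to date. */
static void
zone_sync_all(struct zone *z)
{
	pthread_mutex_lock(&z->lock);
	for (int i = 0; i < NCPU; i++)
		zone_fold_cache_locked(z, i);
	pthread_mutex_unlock(&z->lock);
}
```

The point of the arrangement is that the global totals lag the per-CPU
counters until one of the three events occurs, and nothing in the
allocation fast path pays for keeping them current.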

And you can already extract pretty much all of the interesting allocation
information for mbufs from vmstat -z as the mbufs are now stored using
UMA.  In the critical section protected version of the code, I haven't yet
decided if the timers should run per-cpu, and/or how the sysctl should
coalesce the information for display.  I hope to have much of this
resolved shortly.  My current leaning is that a small amount of localized
and temporary inconsistency in the stats isn't a problem, so simply doing
a set of lockless reads across the per-cpu caches to update stats for
presentation should be fine, and that we can probably drop the timer
updates of statistics since the cache bucket balancing keeps things pretty
in sync. 
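
To make the "lockless reads" idea concrete, here is a small sketch in
portable C11, again with made-up names (struct pcpu_stats,
zone_snapshot_read(), NCPU) rather than anything from the UMA source.
It assumes each per-CPU counter has exactly one writer, so the writer can
use a plain relaxed load and store instead of an atomic read-modify-write,
and the reader just sums relaxed loads without taking any lock:

```c
#include <stdatomic.h>
#include <stdint.h>

#define NCPU 4	/* hypothetical CPU count for this sketch */

/* Hypothetical per-CPU counters; each has a single writer (its own CPU). */
struct pcpu_stats {
	_Atomic uint64_t allocs;
	_Atomic uint64_t frees;
};

struct zone_snapshot {
	uint64_t allocs;
	uint64_t frees;
};

static struct pcpu_stats pcpu[NCPU];

/* Writer side: single writer, so no atomic read-modify-write is needed. */
static void
pcpu_count_alloc(int cpu)
{
	uint64_t v;

	v = atomic_load_explicit(&pcpu[cpu].allocs, memory_order_relaxed);
	atomic_store_explicit(&pcpu[cpu].allocs, v + 1, memory_order_relaxed);
}

static void
pcpu_count_free(int cpu)
{
	uint64_t v;

	v = atomic_load_explicit(&pcpu[cpu].frees, memory_order_relaxed);
	atomic_store_explicit(&pcpu[cpu].frees, v + 1, memory_order_relaxed);
}

/*
 * Reader side: sum the counters with no lock held.  Each individual load
 * is untorn, but the set of loads is not a consistent point-in-time view,
 * so the totals can be slightly off while other CPUs are allocating.
 */
static struct zone_snapshot
zone_snapshot_read(void)
{
	struct zone_snapshot s = { 0, 0 };

	for (int i = 0; i < NCPU; i++) {
		s.allocs += atomic_load_explicit(&pcpu[i].allocs,
		    memory_order_relaxed);
		s.frees += atomic_load_explicit(&pcpu[i].frees,
		    memory_order_relaxed);
	}
	return (s);
}
```

A consumer that derives an in-use count as allocs minus frees from such a
snapshot should be prepared for it to be momentarily a little high or low,
which is the localized, temporary inconsistency I'm suggesting we accept.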

I haven't committed the move to critical sections yet because it's
currently a performance pessimization for the UP (uniprocessor) case:
entering a critical section on UP is more expensive than acquiring a
mutex.  John Baldwin has patches
that remedy this, but hasn't yet merged them (there's also an instability
with them I've seen).  I know that Stephen Uphoff has also been
investigating this issue. 

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Principal Research Scientist, McAfee Research