On Sun, 2004-12-05 at 14:08, Robert Watson wrote:
> On Sun, 5 Dec 2004, Sean McNeil wrote:
>
> > > It replaces non-atomic maintenance of the counters with atomic
> > > maintenance. However, this adds measurably to the cost of
> > > allocation, so I've been reluctant to commit it. The counters
> > > maintained by UMA are likely sufficient to generate the desired
> > > mbuf output now that we have mbuma, but I haven't had an
> > > opportunity to walk through the details of it. I hope to do so
> > > once I get closer to merging patches to use critical sections to
> > > protect UMA per-cpu caches, since I need to redo parts of the
> > > sysctl code then anyway. You might want to give this patch, or
> > > one much like it, a spin to confirm that the race is the one I
> > > think it is. The race in updating mbuf allocator statistics is
> > > one I hope to get fixed prior to 5.4.
> >
> > Since they appear not to be required for actual system use (given
> > that a negative value doesn't cause problems), could the counts be
> > computed for display instead?
>
> This is pretty much what UMA does with its per-CPU caches. It pulls
> and pushes statistics from the caches in a couple of situations:
>
> - When pulling a new bucket into or out of the cache, it has to
>   acquire the zone mutex, so it also pushes statistics.
> - When a timer fires every few seconds, all the caches are checked
>   to update the global zone statistics.
> - When the sysctl runs, it replicates the logic in the timer code to
>   also update the zone statistics for display.
>
> And you can already extract pretty much all of the interesting
> allocation information for mbufs from vmstat -z, as the mbufs are
> now stored using UMA. In the critical-section-protected version of
> the code, I haven't yet decided if the timers should run per-cpu,
> and/or how the sysctl should coalesce the information for display.
> I hope to have much of this resolved shortly. My current leaning is
> that a small amount of localized and temporary inconsistency in the
> stats isn't a problem, so simply doing a set of lockless reads
> across the per-cpu caches to update stats for presentation should be
> fine, and that we can probably drop the timer updates of statistics,
> since the cache bucket balancing keeps things pretty much in sync.
>
> I haven't committed the move to critical sections yet, as it's
> currently a performance pessimization for the UP case: entering a
> critical section on UP is more expensive than acquiring a mutex.
> John Baldwin has patches that remedy this but hasn't yet merged them
> (there's also an instability with them that I've seen). I know that
> Stephan Uphoff has also been investigating this issue.

I am wondering if critical sections are overkill for this application,
since the interrupt-blocking capability is not needed. Perhaps a
mutex_lock_local/mutex_unlock_local pair that works on per-CPU mutexes
and requires the current thread to be pinned to a CPU would be more
appropriate. The functions would have the same functionality as
mutex_lock/mutex_unlock but without using atomic operations. The
"local" mutex would be a clone of the UP sleep mutex, and the SMP
sleep mutex and the local mutex could even be used together in any
order (with working priority inheritance). I think I will try this in
my p4 mutex branch in the next few days.

A "do not preempt" section (interrupts enabled) would be another
possibility.

Stephan
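
A rough sketch of what the per-CPU "local" mutex proposed above could
look like follows. To be clear, this is an assumed illustration, not
an existing FreeBSD interface: mtx_local, mtx_lock_local(), and
mtx_unlock_local() are invented names, struct thread and curthread are
stubbed so the fragment compiles standalone, and the sleep/wakeup and
priority-inheritance machinery is elided to comments. It only
demonstrates the central invariant: with every caller pinned to the
mutex's CPU, the owner field is never touched from another CPU, so
plain loads and stores suffice, as in the UP sleep mutex it clones.

	#include <assert.h>
	#include <stddef.h>

	/* Stub for the kernel's thread structure. */
	struct thread {
		int	td_cpu;		/* CPU the thread is pinned to */
	};

	struct mtx_local {
		struct thread	*ml_owner;	/* only accessed from ml_cpu */
		int		 ml_cpu;	/* CPU this mutex belongs to */
	};

	/* Stand-in for the kernel's curthread. */
	static struct thread *curthread;

	void
	mtx_lock_local(struct mtx_local *m)
	{
		/* Precondition: the caller is pinned to the mutex's CPU. */
		assert(curthread->td_cpu == m->ml_cpu);

		/*
		 * A real implementation would also have to make the
		 * test-and-set below safe against preemption by another
		 * thread on this same CPU, e.g. via a short "do not
		 * preempt" window of the kind mentioned above.
		 */
		while (m->ml_owner != NULL) {
			/*
			 * Contention can only come from another thread on
			 * this CPU, so block until the owner releases --
			 * this is where the turnstile/priority-inheritance
			 * logic of a real sleep mutex would go.
			 */
		}
		m->ml_owner = curthread;	/* plain store, no atomics */
	}

	void
	mtx_unlock_local(struct mtx_local *m)
	{
		assert(m->ml_owner == curthread);
		m->ml_owner = NULL;		/* plain store, no atomics */
		/* Wake any same-CPU waiters here. */
	}

Unlike a critical section, this leaves interrupts enabled and relies
entirely on the pinning precondition for correctness, which is the
point of the proposal.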
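
Further up, Robert describes having the sysctl do "a set of lockless
reads across the per-cpu caches to update stats for presentation".
Below is a minimal userland sketch of that idea, with per-CPU counters
that are summed only at read time; the names (pcpu_mbstat, NCPU,
coalesce_stats) and layout are invented for illustration and are not
the actual UMA or mbuf structures.

	#include <stdint.h>
	#include <stdio.h>

	#define NCPU 4

	struct pcpu_mbstat {
		uint64_t allocs;	/* written only by its own CPU */
		uint64_t frees;		/* likewise -- no atomics needed */
		char	 pad[48];	/* pad to a 64-byte cache line */
	};

	static struct pcpu_mbstat pcpu_stats[NCPU];

	/* Allocation fast path: touch only the local CPU's counters. */
	static void
	alloc_on_cpu(int cpu)
	{
		pcpu_stats[cpu].allocs++;
	}

	static void
	free_on_cpu(int cpu)
	{
		pcpu_stats[cpu].frees++;
	}

	/*
	 * Presentation path (what the sysctl handler would do): lockless
	 * reads across all per-CPU caches.  Individual counters may be
	 * slightly stale, so the totals are only approximately
	 * consistent -- fine for statistics.
	 */
	static void
	coalesce_stats(uint64_t *allocs, uint64_t *frees)
	{
		*allocs = *frees = 0;
		for (int cpu = 0; cpu < NCPU; cpu++) {
			*allocs += pcpu_stats[cpu].allocs;
			*frees += pcpu_stats[cpu].frees;
		}
	}

	int
	main(void)
	{
		uint64_t allocs, frees;

		/* Simulate a workload spread across the CPUs. */
		for (int i = 0; i < 1000; i++)
			alloc_on_cpu(i % NCPU);
		for (int i = 0; i < 400; i++)
			free_on_cpu(i % NCPU);

		coalesce_stats(&allocs, &frees);
		printf("in use: %llu (allocs %llu, frees %llu)\n",
		    (unsigned long long)(allocs - frees),
		    (unsigned long long)allocs, (unsigned long long)frees);
		return (0);
	}

The padding keeps each CPU's counters on their own cache line so the
no-atomics fast path avoids false sharing. Since each counter has a
single writer, updates are never lost, which addresses the
negative-count symptom discussed at the top; the reader merely sees
the small, localized, temporary inconsistency deemed acceptable above.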