On Wed, 8 Sep 2004, Matthew Dillon wrote:

> I would recommend against per-thread caches. Instead, make the per-cpu
> caches actually *be* per-cpu (that is, not require a mutex). This is

<big snip>

One of the paragraphs you appear not to have quoted from my e-mail was
this one:

% One nice thing about using this experimental code is that I hope it will
% allow us to reason more effectively about the extent to which improving
% per-cpu data structures improves efficiency -- I can now much more
% easily say "OK, what happens if I eliminate the cost of locking for
% commonplace mbuf allocation/free". I've also started looking at
% per-interface caches based on the same model, which has some similar
% limitations (but also some similar benefits), such as stuffing
% per-interface uma caches in struct ifnet.

I.e., using per-thread UMA caches is a 30-60 minute hack that allows me
to explore and measure the performance benefits (and costs) of several
different approaches, including per-cpu, per-thread, and
per-data-structure/object caching, without doing the full implementation
up front. Per-thread caching, for example, can simulate the effects of
non-preemption and mutex avoidance in micro-benchmarking, although from
a macro-benchmark perspective it suffers in the general case from a
number of problems (draining, balancing, and extra storage cost among
them). I didn't attempt to address these problems, on the assumption
that the current implementation is a tool for exploring performance, not
something to actually use. (A rough sketch of the fast-path idea appears
below my signature.)

In doing so, my hope was to identify which areas offer the most
immediate performance benefits, be that simply cutting down on costly
operations (such as the entropy harvesting code for Yarrow, which
appears to have found its way into our interrupt path), rethinking
locking strategies, optimizing out or coalescing locking, optimizing out
excess memory allocation, optimizing synchronization primitives while
keeping the same semantics, changing synchronization assumptions to
offer weaker/stronger semantics, etc.

Right now, though, the greatest obstacle in my immediate path appears to
be a bug in the current version of the if_em driver that causes the
interfaces on my test box to wedge under even moderate load. The if_em
cards I have in other machines seem not to do this, which suggests
weirdness specific to this particular version of the chipset/card. Go
figure...

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Principal Research Scientist, McAfee Research
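P.S. For concreteness, here is a rough userland sketch of the per-thread
cache fast path I'm describing. It is purely illustrative: it is not the
per-thread UMA patch itself, it uses no UMA interfaces, and all of the
names in it are made up. The point is only that the common-case
alloc/free path touches thread-local state and takes no mutex, falling
back to a locked shared pool when the per-thread cache is empty or full.

	/*
	 * Hypothetical userland model of a per-thread allocation cache.
	 * Each thread keeps a small private free list in front of a
	 * mutex-protected shared pool, so the common case takes no lock.
	 */
	#include <pthread.h>
	#include <stdio.h>
	#include <stdlib.h>

	#define CACHE_SIZE 32

	struct item {
		struct item *next;
	};

	/* Shared pool, protected by a mutex: the slow path. */
	static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
	static struct item *pool_head;

	/* Per-thread cache: the lock-free fast path. */
	static __thread struct item *cache_head;
	static __thread int cache_count;

	static struct item *
	item_alloc(void)
	{
		struct item *it;

		/* Fast path: no mutex if the per-thread cache has an item. */
		if ((it = cache_head) != NULL) {
			cache_head = it->next;
			cache_count--;
			return (it);
		}

		/* Slow path: fall back to the shared, locked pool. */
		pthread_mutex_lock(&pool_lock);
		if ((it = pool_head) != NULL)
			pool_head = it->next;
		pthread_mutex_unlock(&pool_lock);

		return (it != NULL ? it : malloc(sizeof(*it)));
	}

	static void
	item_free(struct item *it)
	{
		/* Fast path: stash in the per-thread cache while there is room. */
		if (cache_count < CACHE_SIZE) {
			it->next = cache_head;
			cache_head = it;
			cache_count++;
			return;
		}

		/* Slow path: return overflow to the shared pool under the lock. */
		pthread_mutex_lock(&pool_lock);
		it->next = pool_head;
		pool_head = it;
		pthread_mutex_unlock(&pool_lock);
	}

	int
	main(void)
	{
		struct item *it = item_alloc();

		item_free(it);
		printf("per-thread cache holds %d item(s)\n", cache_count);
		return (0);
	}

The interesting part for measurement is that the fast path's cost is
entirely local to the thread, which is how the hack lets me approximate
what removing the allocator mutex from the common case would buy.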