On Thursday 05 August 2004 05:56, Robert Watson wrote:
> As you know, I'm currently profiling and tracing our inbound and outbound
> network stacks, and I've already bothered you with an e-mail about lock
> coalescing and avoiding lock thrashing in the Yarrow thread. This is more
> of the same sort of thing, and is about per-packet processing costs.
>
> I've observed in tracing that we grab and release two entropy-related spin
> mutexes for every ethernet packet processed. We do this in the ithread
> before the netisr runs, and it directly introduces latency (and cost) in
> the network path. Here's a sample trace fragment from receiving an
> ethernet packet:
>
> 12024 0 1280 ithread_schedule: setrunqueue 27
> 12025 0 1480 UNLOCK (spin mutex) sched lock r = 0 at ../../../kern/kern_intr.c:414
> 12026 0 1048 LOCK (spin mutex) entropy harvest r = 0 at ../../../dev/random/randomdev_soft.c:300
> 12027 0  788 LOCK (spin mutex) entropy harvest buffers r = 0 at ../../../dev/random/randomdev_soft.c:309
> 12028 0  856 UNLOCK (spin mutex) entropy harvest buffers r = 0 at ../../../dev/random/randomdev_soft.c:317
> 12029 0  616 UNLOCK (spin mutex) entropy harvest r = 0 at ../../../dev/random/randomdev_soft.c:338
>
> On inspecting random_harvest_internal(), it seems to be the case upfront
> that we can do an unlocked read of harvestfifo[origin].count to compare
> with RANDOM_FIFO_MAX and avoid any locks in the event that the event fifo
> is full. Obviously, you'd need to retest after acquiring the locks in the
> event there would appear to be room, but assuming that the fifo will often
> be full under load, this would save a useful amount of cost. I haven't
> attempted to measure how often the fifo fills, however, so can't currently
> reason about whether that will save work in the common case.
>
> Another observation is that we seem to be doing a lot of entropy
> gathering. That is to say -- a lot.
> On a busy system, I have to wonder
> whether we're not paying a high cost to gather more entropy than we really
> need. I'm not familiar with the Yarrow implementation nor harvesting
> bits, but I'd pose this question to you: right now, we appear to pay four
> mutex operations per packet if the fifo isn't full. Can we rate limit
> entropy gathering in entropy-rich systems to avoid doing so much work? If
> we're processing 25,000 or 100,000 packets a second, that's a lot of goop
> passing through Yarrow. Is it possible to do lockless rate limiting so
> that we gather it only once every few seconds? This might make a big
> aggregate difference when processing ethernet packets at a high rate, such
> as in bridging/forwarding scenarios, etc.

Stupid question: why do we try to make sure that *entropy* is passed
reliably? I.e., wouldn't it be enough to store it (unlocked) "somewhere"
inside a circular buffer and read from it (unlocked) to turn it into
randomness? The potential race just adds some extra entropy. But as I
said at the start, it might be a stupid question.

-- 
/"\  Best regards,                      | mlaier_at_freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier_at_EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:04 UTC