As you know, I'm currently profiling and tracing our inbound and outbound network stacks, and I've already bothered you with an e-mail about lock coalescing and avoiding lock thrashing in the Yarrow thread. This is more of the same sort of thing, and is about per-packet processing costs. I've observed in tracing that we grab and release two entropy-related spin mutexes for every ethernet packet processed. We do this in the ithread before the netisr runs, and it directly introduces latency (and cost) in the network path. Here's a sample trace fragment from receiving an ethernet packet:

  12024  0  1280 ithread_schedule: setrunqueue 27
  12025  0  1480 UNLOCK (spin mutex) sched lock r = 0 at ../../../kern/kern_intr.c:414
  12026  0  1048 LOCK (spin mutex) entropy harvest r = 0 at ../../../dev/random/randomdev_soft.c:300
  12027  0   788 LOCK (spin mutex) entropy harvest buffers r = 0 at ../../../dev/random/randomdev_soft.c:309
  12028  0   856 UNLOCK (spin mutex) entropy harvest buffers r = 0 at ../../../dev/random/randomdev_soft.c:317
  12029  0   616 UNLOCK (spin mutex) entropy harvest r = 0 at ../../../dev/random/randomdev_soft.c:338

On inspecting random_harvest_internal(), it looks like we could do an unlocked read of harvestfifo[origin].count up front, compare it against RANDOM_FIFO_MAX, and avoid taking any locks when the event fifo is full (a rough sketch of what I mean is appended below my signature). Obviously, you'd need to re-test after acquiring the locks if there appears to be room, but assuming the fifo will often be full under load, this would save a useful amount of cost. I haven't attempted to measure how often the fifo fills, however, so I can't currently reason about whether that will save work in the common case.

Another observation is that we seem to be doing a lot of entropy gathering. That is to say -- a lot. On a busy system, I have to wonder whether we're not paying a high cost to gather more entropy than we really need. I'm not familiar with the Yarrow implementation nor with harvesting bits, but I'd pose this question to you: right now, we appear to pay four mutex operations per packet if the fifo isn't full. Can we rate-limit entropy gathering in entropy-rich systems to avoid doing so much work? If we're processing 25,000 or 100,000 packets a second, that's a lot of goop passing through Yarrow. Is it possible to do lockless rate limiting so that we harvest only once every few seconds (also sketched below)? This might make a big aggregate difference when processing ethernet packets at a high rate, such as in bridging/forwarding scenarios, etc.

Thanks!

Robert N M Watson              FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org    Principal Research Scientist, McAfee Research
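To make the first suggestion concrete, here's roughly the shape I have in mind. I'm guessing at the fifo layout and lock name -- harvest_mtx, struct harvest, and HARVESTSIZE below are stand-ins, not necessarily what randomdev_soft.c actually uses -- so treat this as a sketch rather than a patch:

  static void
  random_harvest_internal(u_int64_t somecounter, const void *entropy,
      u_int count, u_int bits, u_int frac, enum esource origin)
  {
          struct harvest *event;

          /*
           * Racy, unlocked read: if the per-source fifo already looks
           * full, bail out before touching either spin mutex.  Worst
           * case we occasionally drop an event that would just have
           * fit, which is harmless for entropy accounting.
           */
          if (harvestfifo[origin].count >= RANDOM_FIFO_MAX)
                  return;

          mtx_lock_spin(&harvest_mtx);

          /* Re-test under the lock before consuming a fifo slot. */
          if (harvestfifo[origin].count < RANDOM_FIFO_MAX) {
                  event = &harvestfifo[origin].buf[harvestfifo[origin].count++];
                  event->somecounter = somecounter;
                  event->size = count;
                  event->bits = bits;
                  event->frac = frac;
                  event->source = origin;
                  memcpy(event->entropy, entropy, MIN(count, HARVESTSIZE));
          }

          mtx_unlock_spin(&harvest_mtx);
  }

The trace shows two locks ("entropy harvest" and "entropy harvest buffers"); I've collapsed them into one here purely to keep the sketch short -- the unlocked early-out works the same way regardless of how many locks the real path takes.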
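And here's the sort of lockless rate limiting I mean for the second question. harvest_last and the once-a-second policy are made up for illustration; the point is just that a single racy comparison against ticks is enough to shed most of the per-packet work:

  /* Last harvest time per source, in ticks (hypothetical array). */
  static int harvest_last[ENTROPYSOURCE];

  static __inline int
  random_harvest_ratelimited(enum esource origin)
  {
          int now = ticks;

          /*
           * Racy read and write, deliberately unlocked: if two CPUs
           * race, we harvest twice in one interval, which just costs
           * a little extra work and is otherwise harmless.
           */
          if (now - harvest_last[origin] < hz)    /* at most once a second */
                  return (1);
          harvest_last[origin] = now;
          return (0);
  }

random_harvest_internal() would call this first and return immediately when it says to skip, so in the steady state the per-packet cost becomes one array read and a subtraction rather than four mutex operations.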