As you know, I'm currently profiling and tracing our inbound and outbound network stacks, and I've already bothered you with an e-mail about lock coalescing and avoiding lock thrashing in the Yarrow thread. This is more of the same sort of thing, and is about per-packet processing costs. I've observed in tracing that we grab and release two entropy-related spin mutexes for every ethernet packet processed. We do this in the ithread before the netisr runs, and it directly introduces latency (and cost) in the network path. Here's a sample trace fragment from receiving an ethernet packet:

  12024  0  1280 ithread_schedule: setrunqueue 27
  12025  0  1480 UNLOCK (spin mutex) sched lock r = 0 at ../../../kern/kern_intr.c:414
  12026  0  1048 LOCK (spin mutex) entropy harvest r = 0 at ../../../dev/random/randomdev_soft.c:300
  12027  0   788 LOCK (spin mutex) entropy harvest buffers r = 0 at ../../../dev/random/randomdev_soft.c:309
  12028  0   856 UNLOCK (spin mutex) entropy harvest buffers r = 0 at ../../../dev/random/randomdev_soft.c:317
  12029  0   616 UNLOCK (spin mutex) entropy harvest r = 0 at ../../../dev/random/randomdev_soft.c:338

On inspecting random_harvest_internal(), it looks like we could do an unlocked read of harvestfifo[origin].count up front, compare it against RANDOM_FIFO_MAX, and avoid taking any locks when the event fifo is full (a rough sketch of what I mean is appended below my signature). Obviously, you'd need to re-test after acquiring the locks if there appears to be room, but assuming the fifo will often be full under load, this would save a useful amount of cost. I haven't attempted to measure how often the fifo fills, however, so I can't currently reason about whether that will save work in the common case.

Another observation is that we seem to be doing a lot of entropy gathering. That is to say -- a lot. On a busy system, I have to wonder whether we're not paying a high cost to gather more entropy than we really need. I'm not familiar with the Yarrow implementation nor with harvesting bits, but I'd pose this question to you: right now, we appear to pay four mutex operations per packet if the fifo isn't full. Can we rate-limit entropy gathering in entropy-rich systems to avoid doing so much work? If we're processing 25,000 or 100,000 packets a second, that's a lot of goop passing through Yarrow. Is it possible to do lockless rate limiting so that we harvest only once every few seconds (also sketched below)? This might make a big aggregate difference when processing ethernet packets at a high rate, such as in bridging/forwarding scenarios, etc.

Thanks!

Robert N M Watson              FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org    Principal Research Scientist, McAfee Research
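To make the first suggestion concrete, here's roughly the shape I have in mind. I'm guessing at the fifo layout and lock name -- harvest_mtx, struct harvest, and HARVESTSIZE below are stand-ins, not necessarily what randomdev_soft.c actually uses -- so treat this as a sketch rather than a patch:

  static void
  random_harvest_internal(u_int64_t somecounter, const void *entropy,
      u_int count, u_int bits, u_int frac, enum esource origin)
  {
          struct harvest *event;

          /*
           * Racy, unlocked read: if the per-source fifo already looks
           * full, bail out before touching either spin mutex.  Worst
           * case we occasionally drop an event that would just have
           * fit, which is harmless for entropy accounting.
           */
          if (harvestfifo[origin].count >= RANDOM_FIFO_MAX)
                  return;

          mtx_lock_spin(&harvest_mtx);

          /* Re-test under the lock before consuming a fifo slot. */
          if (harvestfifo[origin].count < RANDOM_FIFO_MAX) {
                  event = &harvestfifo[origin].buf[harvestfifo[origin].count++];
                  event->somecounter = somecounter;
                  event->size = count;
                  event->bits = bits;
                  event->frac = frac;
                  event->source = origin;
                  memcpy(event->entropy, entropy, MIN(count, HARVESTSIZE));
          }

          mtx_unlock_spin(&harvest_mtx);
  }

The trace shows two locks ("entropy harvest" and "entropy harvest buffers"); I've collapsed them into one here purely to keep the sketch short -- the unlocked early-out works the same way regardless of how many locks the real path takes.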
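And here's the sort of lockless rate limiting I mean for the second question. harvest_last and the once-a-second policy are made up for illustration; the point is just that a single racy comparison against ticks is enough to shed most of the per-packet work:

  /* Last harvest time per source, in ticks (hypothetical array). */
  static int harvest_last[ENTROPYSOURCE];

  static __inline int
  random_harvest_ratelimited(enum esource origin)
  {
          int now = ticks;

          /*
           * Racy read and write, deliberately unlocked: if two CPUs
           * race, we harvest twice in one interval, which just costs
           * a little extra work and is otherwise harmless.
           */
          if (now - harvest_last[origin] < hz)    /* at most once a second */
                  return (1);
          harvest_last[origin] = now;
          return (0);
  }

random_harvest_internal() would call this first and return immediately when it says to skip, so in the steady state the per-packet cost becomes one array read and a subtraction rather than four mutex operations.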