Re: ixgbe and fast interrupts

From: John Baldwin <jhb_at_freebsd.org> Date: Tue, 22 Nov 2011 08:43:20 -0500

On Monday, November 21, 2011 12:36:15 pm Luigi Rizzo wrote:
> On Mon, Nov 21, 2011 at 11:29:29AM -0500, John Baldwin wrote:
> > On Friday, November 18, 2011 5:04:58 pm Luigi Rizzo wrote:
> > > On Fri, Nov 18, 2011 at 11:16:00AM -0800, Doug Barton wrote:
> > > > On 11/18/2011 09:54, Luigi Rizzo wrote:
> > > > > One more thing (i am mentioning it here for archival purposes,
> > > > > as i keep forgetting to test it). Is entropy harvesting expensive ?
> > > > 
> > > > No. It was designed to be inexpensive on purpose. :)
> > > 
> > > hmmm....
> > > unfortunately I don't have a chance to test it until monday
> > > (probably one could see if the ping times change by modifying
> > > the value of kern.random.sys.harvest.* ).
> > > 
> > > But in the code i see the following:
> > > 
> > > - the harvest routine is this:
> > > 
> > >     void
> > >     random_harvest(void *entropy, u_int count, u_int bits, u_int frac,
> > > 	enum esource origin)
> > >     {
> > >         if (reap_func)
> > >                 (*reap_func)(get_cyclecount(), entropy, count, bits, frac,
> > >                     origin);
> > >     }
> > > 
> > > - the reap_func seems to be bound to
> > > 
> > >     dev/random/randomdev_soft.c::random_harvest_internal()
> > > 
> > >   which internally uses a spinlock and then moves entries between
> > >   two lists.
> > > 
> > > I am concerned that the get_cyclecount() might end up querying an
> > > expensive device (is it using kern.timecounter.hardware ?)
> > 
> > On modern x86 it just does rdtsc().
> > 
> > > So between the indirect function call, spinlock, list manipulation
> > > and the cyclecounter i wouldn't be surprised if the whole thing
> > > takes a microsecond or so.
> > 
> > I suspect it is not quite that expensive.
> > 
> > > Anyways, on monday i'll know better. in the meantime, if someone
> > > wants to give it a try... in our tests between two machines and
> > > ixgbe (10G) interfaces, an unmodified 9.0 kernel has a median ping
> > > time of 30us with "slow" pings (say -i 0.01 or larger) and 17us with
> > > a ping -f .
> > 
> > Did you time it with harvest.interrupt disabled?
> 
> yes, thanks for reminding me to post the results.
> 
> Using unmodified ping (which has 1us resolution on the reports),
> there is no measurable difference irrespective
> of the setting of kern.random.sys.harvest.ethernet,
> kern.random.sys.harvest.interrupt and kern.timecounter.hardware.
> Have tried to set hw mitigation to 0 on the NIC (ixgbe on both
> sides) but there is no visible effect either.

I had forgotten that kern.random.sys.harvest.interrupt only matters if the
interrupt handlers pass the INTR_ENTROPY flag to bus_setup_intr().  I
suspect your drivers probably aren't doing that anyway.

> However I don't trust my measurements because i cannot explain them.
> Response times have a min of 20us (about 50 out of 5000 samples)
> and a median of 27us, and i really don't understand if the low
> readings are real or the result of some races.

Hmm, 7 us does seem a bit much for a spread.

> Ping does a gettimeofday() for the initial timestamp, and relies
> on in-kernel timestamp for the response.

Hmm, gettimeofday() isn't super cheap.  What I do for measuring RTT is to
use an optimized echo server (not the one in inetd) on the remote host and
reflect packets off of that.  The sender/receiver puts a TSC timestamp into
the packet payload and computes a TSC delta when it receives the reflected
response.  I then run ministat over the TSC deltas to get RTT in TSC counts
and use machdep.tsc_freq of the sending machine to convert the TSC delta
values to microseconds.
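
A minimal sketch of the timestamp handling described above, not the actual
tool.  The function names are made up for illustration.  It assumes the
caller reads the TSC itself (rdtsc on x86) and passes the value in, that the
timestamp sits at the start of the payload, and that tsc_freq is the value
of machdep.tsc_freq on the sending machine, in Hz.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store the sender's TSC reading in the payload. */
static void
stamp_payload(void *payload, size_t len, uint64_t tsc_now)
{
	assert(len >= sizeof(tsc_now));
	memcpy(payload, &tsc_now, sizeof(tsc_now));
}

/*
 * Hypothetical helper: on receipt of the reflected packet, return the
 * elapsed TSC ticks.  Only meaningful when both readings come from the
 * same machine, which is why the echo server leaves the payload alone.
 */
static uint64_t
rtt_ticks(const void *payload, size_t len, uint64_t tsc_now)
{
	uint64_t sent;

	assert(len >= sizeof(sent));
	memcpy(&sent, payload, sizeof(sent));
	return (tsc_now - sent);
}

/* Convert a TSC tick delta to microseconds, given the frequency in Hz. */
static double
ticks_to_usec(uint64_t ticks, uint64_t tsc_freq)
{
	return ((double)ticks * 1e6 / (double)tsc_freq);
}
```

Doing the conversion to microseconds only at the end keeps the per-packet
work down to one TSC read and one subtraction.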

-- 
John Baldwin