Re: [net] protecting interfaces from races between control and data ?

From: Luigi Rizzo <rizzo_at_iet.unipi.it> Date: Wed, 7 Aug 2013 09:18:00 +0200

On Wed, Aug 7, 2013 at 5:26 AM, Mike Karels <mike_at_karels.net> wrote:

> I'm replying to one of the last messages of this thread, but in part going
> back to the beginning; then I'm following up on Andre's proposal.
>
> Luigi wrote:
> > i am slightly unclear of what mechanisms we use to prevent races
> > between interface being reconfigured (up/down/multicast setting, etc,
> > all causing reinitialization of the rx and tx rings) and
>
> > i) packets from the host stack being sent out;
> > ii) interrupts from the network card being processed.
>
> > I think in the old times IFF_DRV_RUNNING was used for this purpose,
> > but now it is not enough.
> > Acquiring the "core lock" in the NIC does not seem enough, either,
> > because newer drivers, especially multiqueue ones, have per-queue
> > rx and tx locks.
>
> > Does anyone know if there is a generic mechanism, or each driver
> > reimplements its own way ?
>
> I'm not sure I understand the question, or its motivation.  What problem(s)
> are we trying to solve here?  It seems to me that this is mostly internal
> to drivers, and I don't see the issue with races.  In particular, the only
> external guarantees that I see are that control operations will affect the
> packet stream "soon" but at some undefined place.  Not all of the cited
> operations (e.g. multicast changes) need to cause the rings to be
> reinitialized; if they do, that's a chip or driver flaw.  Clearing the UP
> flag should cause packets to stop arriving "soon", but presumably processing
> those already in memory; packets to stop being sent "soon", probably including
> some already accepted for transmission; and new attempts to transmit receiving
> an error "soon".  And, of course, the driver should not crash or misbehave.
> Other than that, I don't see what external guarantees need to be met.
>

i only want 'driver should not crash or misbehave', which is something that
(I believe) many current drivers do not guarantee because of the races
mentioned in the thread (control path reinitializes rings without waiting
for the datapath to drain).
    My specific problem was achieving this safe behaviour when moving between
netmap mode and regular mode; i hoped i could replicate whatever scheme
was implemented by the drivers in 'normal' mode, and this is when i realized
that there was no such protection in place.
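
For illustration only, here is a minimal userspace sketch of the kind of protection discussed above: the control path quiesces every per-queue lock before it touches the rings, so holding the "core lock" alone is never relied upon. It uses pthreads rather than kernel primitives, and every name in it (struct nic, struct txq, nic_reinit, txq_transmit, NQUEUES, RING_SIZE) is hypothetical; it is not taken from any existing driver or from netmap.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NQUEUES   4     /* hypothetical number of tx queues */
#define RING_SIZE 256   /* hypothetical ring size, in slots */

struct txq {
    pthread_mutex_t lock;   /* per-queue lock, taken by the datapath */
    bool running;           /* written only by the control path, under lock */
    unsigned head, tail;    /* ring indices */
    unsigned generation;    /* counts reinitializations */
};

struct nic {
    pthread_mutex_t core_lock;  /* serializes control operations */
    struct txq q[NQUEUES];
};

void nic_init(struct nic *n)
{
    pthread_mutex_init(&n->core_lock, NULL);
    for (int i = 0; i < NQUEUES; i++) {
        struct txq *q = &n->q[i];
        pthread_mutex_init(&q->lock, NULL);
        q->running = true;
        q->head = q->tail = q->generation = 0;
    }
}

/* Datapath: enqueue one packet.
 * Returns 0 on success, -1 if the queue is stopped or full. */
int txq_transmit(struct txq *q)
{
    int ret = -1;

    pthread_mutex_lock(&q->lock);
    if (q->running && (q->tail + 1) % RING_SIZE != q->head) {
        q->tail = (q->tail + 1) % RING_SIZE;
        ret = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return ret;
}

/* Set the running flag on every queue. Because each queue lock is taken
 * in turn, once this returns with run == false no datapath thread is
 * still inside txq_transmit() working on a ring, and later callers
 * will see the queue as stopped. Caller holds core_lock. */
static void nic_set_running(struct nic *n, bool run)
{
    for (int i = 0; i < NQUEUES; i++) {
        pthread_mutex_lock(&n->q[i].lock);
        n->q[i].running = run;
        pthread_mutex_unlock(&n->q[i].lock);
    }
}

void nic_stop(struct nic *n)
{
    pthread_mutex_lock(&n->core_lock);
    nic_set_running(n, false);
    pthread_mutex_unlock(&n->core_lock);
}

/* Control path: quiesce all queues, reset the rings, then resume. */
void nic_reinit(struct nic *n)
{
    pthread_mutex_lock(&n->core_lock);
    nic_set_running(n, false);          /* wait for the datapath to drain */
    for (int i = 0; i < NQUEUES; i++) {
        struct txq *q = &n->q[i];
        assert(!q->running);            /* nobody can be using the ring */
        q->head = q->tail = 0;
        q->generation++;
    }
    nic_set_running(n, true);
    pthread_mutex_unlock(&n->core_lock);
}
```

The two-pass structure (stop everything, then reset, then restart) is a deliberate choice in this sketch, on the assumption that a reinitialization resets the whole device rather than one ring at a time.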

> Jumping to (near) the end of the thread, I like most of Andre's proposal.
> Running with minimal locks at this layer is an admirable goal, and I agree
> with most of what was said.  I have a few observations on the general changes,
> or related issues:
>
> There was mention of taskqueues.  I think that with MSI-X, taskqueues
> should not be needed or used.  More specifically, having separate ithreads
> and taskqueues, with ithreads deferring to taskqueues after some limit, makes
> sense only for MSI and legacy interrupts.  With MSI-X, an interrupt thread
> should be able to process packets indefinitely with sufficient CPU resources,
> and there is no reason to context switch to a different thread periodically.
> A periodic "yield" might be reasonable, but if it is necessary, small packet
> performance will suffer.  However, most of this is internal to the driver.
>

i am not completely clear on what the difference between ithreads and
taskqueues is.

Also, Andre's proposal requires force-killing the ithread, but i am unclear
on how to do it safely (i.e. without leaving the data structures in some
inconsistent state), unless the ithread periodically yields the CPU when it
is in a safe state. While this is internal to the driver, we should probably
provide some template code so that each driver does not have to implement
its own way to shut down the ithread.
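
As a rough idea of what such template code might look like, here is a userspace sketch in which the worker is never killed: it checks a stop flag only at the safe point between batches, and the control path sets the flag and waits for the thread to exit. This uses pthreads and C11 atomics as stand-ins for kernel facilities, and all names (struct rx_worker, rx_worker_start, rx_worker_shutdown, BATCH) are hypothetical; it is not Andre's proposal and not an existing kernel API.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BATCH 32    /* packets handled between safe points (arbitrary) */

struct rx_worker {
    pthread_t thread;
    atomic_bool stop_requested; /* set by the control path */
    unsigned long head;         /* ring slots consumed by the worker */
    unsigned long refilled;     /* ring slots handed back with new buffers */
};

/* Process one batch. In the middle of an iteration head and refilled
 * disagree; they are equal again whenever this function returns. */
static void rx_process_batch(struct rx_worker *w)
{
    for (int i = 0; i < BATCH; i++) {
        w->head++;      /* consume a slot */
        w->refilled++;  /* give a fresh buffer back */
    }
}

static void *rx_worker_main(void *arg)
{
    struct rx_worker *w = arg;

    /* The flag is tested only here, between batches, which is the
     * one place where the ring state is known to be consistent. */
    while (!atomic_load(&w->stop_requested)) {
        rx_process_batch(w);
        sched_yield();  /* the periodic yield at a safe state */
    }
    return NULL;
}

/* Returns 0 on success, like pthread_create(). */
int rx_worker_start(struct rx_worker *w)
{
    atomic_init(&w->stop_requested, false);
    w->head = w->refilled = 0;
    return pthread_create(&w->thread, NULL, rx_worker_main, w);
}

/* Ask the worker to stop and wait until it has actually exited.
 * After this returns, the caller may reinitialize the ring freely. */
int rx_worker_shutdown(struct rx_worker *w)
{
    int err;

    atomic_store(&w->stop_requested, true);
    err = pthread_join(w->thread, NULL);
    if (err == 0)
        assert(w->head == w->refilled);  /* stopped at a safe point */
    return err;
}
```

The cost of this approach is that shutdown latency is bounded by the time to finish one batch, which is why BATCH would need to stay small in practice.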

cheers
luigi