Re: Interrupt storm detection

From: Bruce Evans <bde_at_zeta.org.au>
Date: Fri, 11 Jun 2004 22:37:35 +1000 (EST)
On Fri, 11 Jun 2004, Ian FREISLICH wrote:

> I have a problem printing.  The data rate through my parallel port
> to my printer makes the kernel think that lpt0 is storming at between
> 40k-49k irqs per second.  Is there a way to tell the kernel to
> ignore certain interrupt sources or to raise the per-second throttle
> value?  I've only found hw.intr_storm_threshold which I assume is
> the number of interrupts from a source before an interrupt arrives
> from another source.  I've set this to 2000 to make the printing
> work, but now I'm not sure if this will protect from a real interrupt
> storm.

The throttle value hw.intr_storm_threshold is actually the limit on the
number of interrupts from a source that arrive as fast as they can be
handled.  If interrupts arrive faster than they can be handled, then
there really is a storm.
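
To make the counting rule concrete, here is a minimal user-space sketch
of it.  It is not the kernel's code: the struct, the function name and
the back_to_back flag are all hypothetical, and treating a threshold of
0 as "detection disabled" is an assumption of the sketch.  It only
models the idea that interrupts arriving as fast as they can be handled
are counted, and that any gap resets the count.

```c
#include <assert.h>

/* Hypothetical model of per-source storm detection; not kernel code. */
struct storm_state {
	unsigned count;		/* consecutive back-to-back interrupts so far */
	unsigned threshold;	/* models hw.intr_storm_threshold */
};

/*
 * Call once per handled interrupt.  back_to_back is nonzero when another
 * interrupt from the same source was already pending as the handler
 * finished.  Returns 1 when the source should be treated as storming.
 */
static int
storm_note_interrupt(struct storm_state *s, int back_to_back)
{
	assert(s != 0);
	if (!back_to_back) {
		/* The source went quiet, so this was not a storm. */
		s->count = 0;
		return 0;
	}
	if (s->threshold == 0)
		return 0;	/* assumption: 0 disables detection in this sketch */
	if (s->count < s->threshold)
		s->count++;
	return s->count >= s->threshold;
}
```

With a threshold of 3, three back-to-back interrupts trip the detector
and a single gap clears the count again.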

The DELAY(1) that I added to interrupt handling may have broken things
for devices that interrupt too much like lpt0 :-(.  DELAY(1) takes
quite a bit longer than 1 usec (more like 5 usec).  It looks like lpt0
takes 15-20 usec per interrupt and when 5 usec is added to this the
machine is transiently overloaded and does nothing except handle lpt0
interrupts until it completes a write, taking 20-25 usec each.  A
slightly slower machine might be overloaded even without the DELAY(1).
The DELAY(1) is only done for the first (10 * interrupt_storm_threshold)
interrupts of each device (default 10 * 500).  It looks like your value
of 2000 works because the first few writes to lpt0 are less than 20000
but larger than 5000 bytes (1 byte per interrupt for this slow device),
so that the system can become non-overloaded before the threshold is
reached.
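
The arithmetic behind this can be checked with a small sketch.  The
helper names are hypothetical, and the rule "a write is safe if it ends
inside the 10 * threshold window" is only my simplified reading of the
numbers above, not something the kernel computes.  Note that 20-25 usec
per interrupt works out to 40000-50000 interrupts per second, which
matches the 40k-49k that was reported.

```c
#include <assert.h>
#include <stdint.h>

/* Interrupts per second when each interrupt costs per_intr_usec usec. */
static uint32_t
irqs_per_sec(uint32_t per_intr_usec)
{
	assert(per_intr_usec > 0);
	return 1000000u / per_intr_usec;
}

/* Interrupts that get the extra DELAY(1): 10 * threshold, as stated above. */
static uint32_t
delay_window(uint32_t storm_threshold)
{
	return 10u * storm_threshold;
}

/*
 * Nonzero if a write of write_bytes (one interrupt per byte for lpt0)
 * finishes before the window is used up.  Simplified model only.
 */
static int
write_fits_window(uint32_t write_bytes, uint32_t storm_threshold)
{
	return write_bytes < delay_window(storm_threshold);
}
```

For a hypothetical 8000-byte write, write_fits_window(8000, 500) is
false and write_fits_window(8000, 2000) is true, which is the
difference between the default and the value of 2000.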

I also changed interrupt handling so that storm prevention is fairly
sticky, since the DELAY(1) that is needed for initial detection is too
costly to use all the time.  If this is working right, then initial
misdetection of storming interrupts and contributions to misdetection
by the DELAY(1) should be recovered from eventually.  For lpt0, initial
misdetection would cause lpt0 interrupts to be throttled to about 1/HZ
hz.  It may take a long while to print files at this rate, but eventually
you will run out of things to print (or give up :-).  Then the throttle
should be removed.  However, if output to lpt0 can transiently overload
the machine, then the overload is indistinguishable from an interrupt
storm and the problem will recur.  I think it is possible for output
to lpt0 to transiently overload the machine -- it just takes a printer
that has more than interrupt_storm_threshold bytes of buffering and can ack
each character that it receives as fast as the interrupt handler can
deliver them.  (Old driver timing bugs are also relevant here.  The
lpt interrupt handler has few clues about interrupt timing.  It waits
(for possibly too long) on entry but doesn't wait or even check for
another interrupt to arrive after it sends a character to the printer.
Thus getting another interrupt as soon as it returns is the usual case
if the printer hardware does the right things.)

The interrupt storm threshold really needs to be per-interrupt source,
but your value of 2000 is probably good enough here.  Real interrupt
storms repeat endlessly, so almost any nonzero threshold can detect
them.  The threshold just shouldn't be very large, so that the storms
can be detected soon after their device is attached.

Bruce
Received on Fri Jun 11 2004 - 10:38:07 UTC
