pst driver: timeout explosion? (patch is attached)

From: Aaron Smith <aaron_at_mutex.org>
Date: Sun, 7 Sep 2003 19:51:22 -0700
Hi,

I think I may have found the cause of the pst timeout panics.  I'm using
the Promise SX6000 RAID on -CURRENT, using the pst driver.  Unfortunately,
under sufficiently high I/O load, the box starts printing:

  "pst: timeout mfa=0x00327b90 cmd=0x01"

The 'mfa' address varies. The messages come more and more rapidly, and
then eventually the machine wedges solid. Sometimes it makes it to:

  "panic: timeout table full"

Here's what I think is happening. Two timeouts are scheduled every time a
timeout fires, because pst_timeout schedules a timeout before calling
pst_rw to retry the operation, and then pst_rw schedules ANOTHER timeout.
Both of these timeouts call pst_timeout, so the number of pending timeouts
doubles every 10 seconds until so many are firing, all retrying the same
I/O operation, that the timeout table fills and the machine panics.
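
To put rough numbers on the doubling, here is a small back-of-the-envelope
model in Python. It is only arithmetic, not driver code: the function names
are made up for illustration, the 10-second period comes from the
description above, and the table size passed in is a hypothetical value,
since the real limit depends on the kernel configuration.

```python
TIMEOUT_PERIOD_S = 10  # period described in the message above


def pending_timeouts(periods):
    """Pending timeouts after `periods` timeout periods, assuming every
    timeout that fires leaves two new ones behind (one from the handler,
    one from the retry path) and the I/O never completes."""
    return 2 ** periods


def seconds_until_full(table_size):
    """Seconds from arming the first timeout until the number of pending
    timeouts reaches `table_size` (a hypothetical capacity)."""
    periods = 0
    while pending_timeouts(periods) < table_size:
        periods += 1
    return periods * TIMEOUT_PERIOD_S


if __name__ == "__main__":
    # 1024 is an illustrative capacity, not the real kernel limit.
    print(seconds_until_full(1024))
```

Under these assumptions the growth is exponential, so even a large table
would fill within a few minutes of the first timeout.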

Check out the following diff:

  http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/pst/pst-raid.c.diff?r1=1.8&r2=1.9&f=h

This is where pst_rw was changed to schedule its own timeouts, but the
timeout function didn't have its own scheduling removed.
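
The effect of dropping the extra scheduling can be seen in a toy event
queue. This is a sketch in Python, not the pst code: `simulate` and its
`handler_schedules` flag are hypothetical names, the two pushes merely stand
in for the scheduling done by pst_timeout and pst_rw, and the model assumes
the device never answers, so every retry times out again.

```python
import heapq

TIMEOUT_S = 10  # timeout period described in the message


def simulate(handler_schedules, horizon_s):
    """Count pending timeouts after processing every timeout that is due
    at or before `horizon_s` seconds.

    handler_schedules=True models the suspected bug (the handler arms a
    timeout AND the retry path arms one); False models the handler with
    its own scheduling removed."""
    pending = [TIMEOUT_S]  # the original request, armed at t=0
    while pending and pending[0] <= horizon_s:
        now = heapq.heappop(pending)  # a timeout fires
        if handler_schedules:
            # stand-in for the timeout armed inside the handler
            heapq.heappush(pending, now + TIMEOUT_S)
        # stand-in for the timeout armed by the retried I/O call
        heapq.heappush(pending, now + TIMEOUT_S)
    return len(pending)


if __name__ == "__main__":
    print("with handler scheduling:   ", simulate(True, 50))
    print("without handler scheduling:", simulate(False, 50))
```

In this model the count stays at one pending timeout per outstanding
request once the handler stops arming its own, which is the behaviour the
fix is meant to restore.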

Do you think this could be the correct explanation? It seems like once
pst_timeout is called, the machine is doomed... I'm recompiling my kernel
now to test the fix under load.

--Aaron

Received on Sun Sep 07 2003 - 17:51:25 UTC
