Re: pst driver: timeout explosion? (patch is attached)

From: Soren Schmidt <sos_at_spider.deepcore.dk>
Date: Mon, 8 Sep 2003 08:26:59 +0200 (CEST)
It seems Aaron Smith wrote:
> Hi,
> 
> I think I may have found the cause of the pst timeout panics.  I'm using
> the Promise SX6000 RAID on -CURRENT, using the pst driver.  Unfortunately,
> under sufficiently high I/O load, the box starts printing:
> 
>   "pst: timeout mfa=0x00327b90 cmd=0x01"
> 
> The 'mfa' address varies. It starts printing more and more rapidly, and
> then eventually the machine wedges solid. Sometimes it makes it to:
> 
>   "panic: timeout table full"
> 
> Here's what I think is happening. Two timeouts are being scheduled every
> time a timeout triggers, because pst_timeout schedules a timeout before
> calling pst_rw to retry the operation. Then pst_rw schedules ANOTHER
> timeout.  Both of these timeouts call pst_timeout, so they double every 10
> seconds until there are a large enough number of timeouts firing, retrying
> the same I/O operation, that the table fills and the machine panics.
> 
> Check out the following diff
> 
>   http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/pst/pst-raid.c.diff?r1=1.8&r2=1.9&f=h
> 
> This is where pst_rw was changed to schedule its own timeouts, but the
> timeout function didn't have its own timeout call removed.
> 
> Do you think this could be the correct explanation? It seems like once
> pst_timeout is called, the machine is doomed... I'm recompiling my kernel
> now to test the fix under load.

Yes, correct, there is a double timeout call in case of a timeout.
This explains why it goes down burning, but it still does not explain
why we get the first timeout, which I've been hunting for ages.

I'll commit the fix right away for the double timeout call, thanks!!

-Søren
Received on Sun Sep 07 2003 - 21:27:06 UTC
