Hi,

I think I may have found the cause of the pst timeout panics.

I'm using the Promise SX6000 RAID on -CURRENT with the pst driver. Unfortunately, under sufficiently high I/O load, the box starts printing:

  pst: timeout mfa=0x00327b90 cmd=0x01

The 'mfa' address varies. The messages appear more and more rapidly, and eventually the machine wedges solid. Sometimes it gets as far as:

  panic: timeout table full

Here's what I think is happening. Two timeouts are scheduled every time a timeout fires, because pst_timeout schedules a timeout before calling pst_rw to retry the operation, and then pst_rw schedules ANOTHER one. Both of these timeouts call pst_timeout, so the number of pending timeouts doubles every 10 seconds. Eventually so many of them are firing, all retrying the same I/O operation, that the table fills and the machine panics.

Check out the following diff:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/pst/pst-raid.c.diff?r1=1.8&r2=1.9&f=h

This is where pst_rw was changed to schedule its own timeouts, but the timeout function didn't have its own scheduling removed.

Do you think this could be the correct explanation? It seems like once pst_timeout is called, the machine is doomed...

I'm recompiling my kernel now to test the fix under load.

--Aaron