aac(4) resource FIB starvation on BUS scan revisited

From: Alexander Sack <pisymbol_at_gmail.com> Date: Mon, 7 Dec 2009 16:42:29 -0500

Folks:

I posted a similar thread on freebsd-scsi, only to realize that scottl had
already fixed my first issue, a race during resource allocation, as part of
some MP CAM cleanup in a later version of the driver than the one we are
using (I believe we did the same thing to resolve a lock issue on bootup).

However, on my RELENG_8 box with two Adaptec 5085s connected to some JBODs
(9TB each), I still have a FIB starvation issue during the LUN scan.

The number of FIBs allocated to this card is 512 (older cards get 256).  The
max_target per bus is 287.  On a six-channel controller with the bus scan
done in parallel, I see a lot of this:

...
(probe501:aacp1:0:214:0): Request Requeued
(probe501:aacp1:0:214:0): Retrying Command
(probe520:aacp1:0:233:0): Request Requeued
(probe520:aacp1:0:233:0): Retrying Command
(probe528:aacp1:0:241:0): Request Requeued
(probe528:aacp1:0:241:0): Retrying Command
(probe540:aacp1:0:253:0): Request Requeued
(probe540:aacp1:0:253:0): Retrying Command
(probe541:aacp1:0:254:0): Request Requeued
(probe541:aacp1:0:254:0): Retrying Command
....
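
A rough way to see why the pool runs dry, using only the figures quoted
above.  This is a back-of-the-envelope sketch, not driver code: the helper
names are made up for illustration, and treating max_target as the
approximate number of probes per bus (one FIB per outstanding probe) is an
assumption.

```c
#include <stddef.h>

/* Figures quoted in the message.  Treating max_target as the approximate
 * number of probe targets per bus is an assumption for illustration. */
enum { FIB_POOL = 512, TARGETS_PER_BUS = 287, BUSES = 6 };

/* Upper bound on probes in flight if every bus is scanned at once. */
static unsigned parallel_probe_demand(unsigned buses, unsigned targets_per_bus)
{
    return buses * targets_per_bus;
}

/* Probes left without a FIB when demand exceeds the pool (0 if it fits). */
static unsigned fib_shortfall(unsigned demand, unsigned pool)
{
    return demand > pool ? demand - pool : 0;
}
```

With these numbers, parallel_probe_demand(BUSES, TARGETS_PER_BUS) is 1722,
and fib_shortfall(1722, FIB_POOL) is 1210, so under this assumption most
probes in a fully parallel scan cannot get a FIB on the first attempt.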

I think the driver is much happier with the attached patch (dmesg
included).  The CAM probeXXX process is now much faster, with ZERO
retries.  Is there anything bad about adding PIM_SYNCSCAN to hba_misc?
What's the downside?  At a minimum, it ensures you don't run out of FIBs
during a scan.
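
To illustrate the difference between the two scan modes, here is a toy
model, not CAM's actual retry logic.  It assumes each probe needs one FIB,
that probes beyond the pool are requeued and retried in the next round, and
that the flag makes the scan proceed one target at a time.  The function
name and its parameters are hypothetical.

```c
/* Toy model: count how many times a probe is requeued for lack of a FIB.
 * targets       - total probes to complete
 * pool          - FIBs available per round
 * max_in_flight - probes issued concurrently (1 models a serialized scan)
 * Returns 0 for a zero pool or zero concurrency to avoid looping forever. */
static unsigned count_requeues(unsigned targets, unsigned pool,
                               unsigned max_in_flight)
{
    unsigned requeues = 0;
    unsigned pending = targets;

    if (pool == 0 || max_in_flight == 0)
        return 0;

    while (pending > 0) {
        unsigned issued = pending < max_in_flight ? pending : max_in_flight;
        unsigned served = issued < pool ? issued : pool;

        requeues += issued - served;   /* probes that found no free FIB */
        pending -= served;             /* served probes are done */
    }
    return requeues;
}
```

In this model a serialized scan (max_in_flight of 1) never requeues, while
issuing everything at once against a smaller pool requeues on every round
until the backlog fits.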

The patch also bumps the number of FIBs to the maximum, since I think it's
good to have that pool preallocated and it's not that much memory on modern
systems (this also helps if you have a controller that supports 512).  FIBs
are 2k, so there are 2 per page, and the pool is either 256 or 512 FIBs,
i.e. a maximum of 1MB.  Perhaps that is not really necessary but again, why
not?  (If I get shot down, so be it!)
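
The sizing arithmetic, spelled out as a small sketch.  The 4k page size is
an assumption inferred from "2 per page" with 2k FIBs, and the helper names
are illustrative only, not taken from the driver.

```c
#include <stddef.h>

/* FIB size is from the message; the 4k page size is an assumption. */
enum { FIB_SIZE = 2048, PAGE_SIZE_BYTES = 4096 };

/* Total bytes needed to preallocate nfibs FIBs. */
static size_t fib_pool_bytes(size_t nfibs)
{
    return nfibs * FIB_SIZE;
}

/* Pages needed for nfibs FIBs, rounding up to whole pages. */
static size_t fib_pool_pages(size_t nfibs)
{
    size_t per_page = PAGE_SIZE_BYTES / FIB_SIZE;   /* 2 FIBs per page */

    return (nfibs + per_page - 1) / per_page;
}
```

For 256 FIBs this gives 524288 bytes (512k) in 128 pages; for 512 FIBs it
gives 1048576 bytes (1MB) in 256 pages.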

Anybody?  Is this PR-worthy?

-aps