Re: aac(4) resource FIB starvation on BUS scan revisited

From: Alexander Sack <pisymbol_at_gmail.com>
Date: Mon, 7 Dec 2009 23:28:40 -0500
On Mon, Dec 7, 2009 at 11:17 PM, Alexander Sack <pisymbol_at_gmail.com> wrote:
> On Mon, Dec 7, 2009 at 11:04 PM, Scott Long <scottl_at_samsco.org> wrote:
>>
>> On Dec 7, 2009, at 9:00 PM, Alexander Sack wrote:
>>
>>> On Mon, Dec 7, 2009 at 8:14 PM, Scott Long <scottl_at_samsco.org> wrote:
>>>>
>>>> On Dec 7, 2009, at 6:05 PM, Jung-uk Kim wrote:
>>>>>
>>>>> On Monday 07 December 2009 07:47 pm, Scott Long wrote:
>>>>>>
>>>>>> On Dec 7, 2009, at 5:31 PM, Jung-uk Kim wrote:
>>>>>>>
>>>>>>> On Monday 07 December 2009 05:30 pm, Alexander Sack wrote:
>>>>>>>>
>>>>>>>> On Mon, Dec 7, 2009 at 4:42 PM, Alexander Sack <pisymbol_at_gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Folks:
>>>>>>>>>
>>>>>>>>> I posted a similar thread on freebsd-scsi only to realize that
>>>>>>>>> scottl had fixed my first issue during some MP CAM cleanup with
>>>>>>>>> respect to a race during resource allocation issues on a later
>>>>>>>>> version of the driver we are using (I believe we did the same
>>>>>>>>> thing to resolve a lock issue on bootup).
>>>>>>>>>
>>>>>>>>> However on my RELENG_8 box with (2) Adaptec 5085s connected to
>>>>>>>>> some JBODs (9TB each) I still have a FIB starvation issue
>>>>>>>>> during the LUN scan:
>>>>>>>>>
>>>>>>>>> The number of FIBs allocated to this card is 512 (older cards
>>>>>>>>> are 256).  The max_target per bus is 287.  On a six channel
>>>>>>>>> controller with a BUS scan done in parallel I see a lot of
>>>>>>>>> this:
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>> (probe501:aacp1:0:214:0): Request Requeued
>>>>>>>>> (probe501:aacp1:0:214:0): Retrying Command
>>>>>>>>> (probe520:aacp1:0:233:0): Request Requeued
>>>>>>>>> (probe520:aacp1:0:233:0): Retrying Command
>>>>>>>>> (probe528:aacp1:0:241:0): Request Requeued
>>>>>>>>> (probe528:aacp1:0:241:0): Retrying Command
>>>>>>>>> (probe540:aacp1:0:253:0): Request Requeued
>>>>>>>>> (probe540:aacp1:0:253:0): Retrying Command
>>>>>>>>> (probe541:aacp1:0:254:0): Request Requeued
>>>>>>>>> (probe541:aacp1:0:254:0): Retrying Command
>>>>>>>>> ....
>>>>>>>>>
>>>>>>>>> I think the driver is much happier with the following attached
>>>>>>>>> patch (with dmesg).
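
To put rough numbers on the starvation reported above, here is a back-of-the-envelope sketch. It assumes every target on every channel has one probe in flight at the same time and that each probe holds exactly one FIB; both are simplifications, and the helper names are made up for illustration. Only the figures 512, 287, and six come from the report.

```c
/* Figures quoted in the report above. */
#define REPORTED_MAX_FIBS	512	/* FIBs available on this card */
#define REPORTED_MAX_TARGET	287	/* max_target per bus */
#define REPORTED_CHANNELS	6	/* channels scanned in parallel */

/* Hypothetical helper: probes outstanding if every target is probed at once. */
static unsigned int
peak_probes(unsigned int channels, unsigned int targets_per_bus)
{
	return channels * targets_per_bus;
}

/* Hypothetical helper: probes that find no free FIB and must be requeued. */
static unsigned int
fib_shortfall(unsigned int probes, unsigned int fibs)
{
	return probes > fibs ? probes - fibs : 0;
}
```

Under these assumptions, six channels of 287 targets is 1722 probes competing for 512 FIBs, so a large share of them would be requeued at least once.
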
>>>>>>>>
>>>>>>>> Patch again but this time not base-64 encoded:
>>>>>>>
>>>>>>> [SNIP!]
>>>>>>>
>>>>>>> I want it to be a little conservative here, i.e., pre-allocating
>>>>>>> half of max_fibs.  Will the attached patch work for you?
>>>>>>
>>>>>> The FIB allocation scheme was written when it was common for
>>>>>> machines to only have 64MB of RAM and proportionally less KVA, so
>>>>>> 256KB or 512KB was a lot of RAM to wire down.  Those days have
>>>>>> probably passed.
>>>>>
>>>>> So, what would you do if you were hypothetically rewriting it today? :-)
>>>>>
>>>>
>>>> Most hardware has mechanisms for probing its command queue depth.  What I
>>>> typically do these days is allocate a minimum number of commands so that
>>>> this probing can be done, then do a single slab allocation based on the
>>>> results.  AAC doesn't have this capability, but the 256/512 size is pretty
>>>> well understood.  The page-by-page allocation of aac works, but adds extra
>>>> bookkeeping and complication to the driver.
>>>>
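
The probe-then-slab approach described above might look roughly like the userland sketch below. Everything in it is hypothetical: the struct layout, the function names, and the probe callback are stand-ins, and malloc-family calls replace whatever DMA-capable allocator a real driver would use. It is not aac code, since AAC has no such probe.

```c
#include <stdlib.h>

/* Hypothetical command descriptor; a real driver's would be larger. */
struct cmd {
	int	tag;
	int	busy;
};

struct cmd_pool {
	struct cmd	*slab;	/* one allocation holding every command */
	size_t		 count;
};

/* Stand-in for asking the hardware its queue depth (assumed interface). */
typedef size_t (*probe_depth_fn)(void);

/* Example probe that pretends the hardware reported a depth of 512. */
static size_t
fixed_depth_512(void)
{
	return 512;
}

/*
 * Allocate a small bootstrap set, probe the depth, then replace the
 * bootstrap set with a single slab sized from the answer.
 * Returns 0 on success, -1 on failure.
 */
static int
cmd_pool_init(struct cmd_pool *pool, size_t bootstrap, probe_depth_fn probe)
{
	struct cmd *boot, *slab;
	size_t depth, i;

	boot = calloc(bootstrap, sizeof(*boot));
	if (boot == NULL)
		return -1;

	/* In a real driver the probe would be issued using 'boot' commands. */
	depth = probe();
	free(boot);
	if (depth == 0)
		return -1;

	slab = calloc(depth, sizeof(*slab));
	if (slab == NULL)
		return -1;
	for (i = 0; i < depth; i++)
		slab[i].tag = (int)i;

	pool->slab = slab;
	pool->count = depth;
	return 0;
}
```

Calling cmd_pool_init(&pool, 8, fixed_depth_512) leaves one slab of 512 commands and no per-page bookkeeping, which is the simplification being described.
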
>>>
>>> Right Scott, that is what JK and I discussed this evening.  I figured
>>> the 128 macro was just historical cruft and your email confirms it.
>>> So are we ALL okay with the original patch as it stands for now?  JK, I
>>> am fine with the divide-by-2 change, but I think raising it to 256 is
>>> really the way to go at this point!  :D
>>
>>
>> If you're going to increase it, why not simply increase it to the max amount
>> that is appropriate for each card?
>
> Totally right!  I thought, though, that the max fibs variable was set by
> reading firmware bits.  Am I off?
>
>         /* Check for broken hardware that does a lower number of commands */
>         sc->aac_max_fibs = (sc->flags & AAC_FLAGS_256FIBS ? 256:512);
>
> So checking against sc->aac_max_fibs would yield 512 up front on
> modern controllers.
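
For comparison, the three preallocation sizes floated in this thread can be laid side by side. This is only a sketch: the flag name comes from the driver excerpt above, but its bit value here is an assumption, and the enum and function names are invented for illustration.

```c
/* Flag name is from the driver excerpt above; this bit value is an assumption. */
#define AAC_FLAGS_256FIBS	(1u << 0)

/* Hypothetical names for the options discussed in the thread. */
enum prealloc_policy {
	PREALLOC_LEGACY_128,	/* the historical 128 constant */
	PREALLOC_HALF_MAX,	/* the conservative half-of-max suggestion */
	PREALLOC_FULL_MAX	/* allocate the card's maximum up front */
};

/* Mirrors the quoted check: 256 on flagged hardware, otherwise 512. */
static unsigned int
max_fibs_for(unsigned int flags)
{
	return (flags & AAC_FLAGS_256FIBS) ? 256 : 512;
}

/* Number of FIBs to allocate at attach time under each policy. */
static unsigned int
prealloc_fibs(unsigned int flags, enum prealloc_policy policy)
{
	unsigned int max = max_fibs_for(flags);

	switch (policy) {
	case PREALLOC_LEGACY_128:
		return max < 128 ? max : 128;
	case PREALLOC_HALF_MAX:
		return max / 2;
	case PREALLOC_FULL_MAX:
		return max;
	}
	return max;
}
```

On a controller without the 256-FIB flag this gives 128, 256, and 512 respectively, so only the last policy avoids growing the pool in the middle of a bus scan.
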
>
>> One other thing I forgot to mention was contiguous memory.  The page-by-page
>> allocation in aac has another benefit, and that's to not tax contigmalloc
>> with finding 256KB of contiguous memory. That's not a big deal at boot, but
>> is a problem if you load the driver after the system has been running for a
>> while.  It's immensely useful during development, but it's never been clear
>> to me how useful it is in real life.
>
> True.  I can't imagine that even today, loading it late would be THAT
> much of an issue (besides, it's a RAID controller; do you really think
> you are going to load it so late in the game?).
>
> I am filing a PR as we speak just to track it!


http://www.freebsd.org/cgi/query-pr.cgi?pr=141269

I botched the category, though; it should probably be "scsi", please...

-aps
Received on Tue Dec 08 2009 - 03:28:42 UTC
