Re: aac(4) resource FIB starvation on BUS scan revisited

From: Alexander Sack <pisymbol_at_gmail.com>
Date: Tue, 8 Dec 2009 12:11:19 -0500
On Tue, Dec 8, 2009 at 11:22 AM, Jung-uk Kim <jkim_at_freebsd.org> wrote:
> On Monday 07 December 2009 11:04 pm, Scott Long wrote:
>> On Dec 7, 2009, at 9:00 PM, Alexander Sack wrote:
>> > On Mon, Dec 7, 2009 at 8:14 PM, Scott Long <scottl_at_samsco.org> wrote:
>> >> On Dec 7, 2009, at 6:05 PM, Jung-uk Kim wrote:
>> >>> On Monday 07 December 2009 07:47 pm, Scott Long wrote:
>> >>>> On Dec 7, 2009, at 5:31 PM, Jung-uk Kim wrote:
>> >>>>> On Monday 07 December 2009 05:30 pm, Alexander Sack wrote:
>> >>>>>> On Mon, Dec 7, 2009 at 4:42 PM, Alexander Sack
>> >>>>>> <pisymbol_at_gmail.com> wrote:
>> >>>>>>> Folks:
>> >>>>>>>
>> >>>>>>> I posted a similar thread on freebsd-scsi only to realize
>> >>>>>>> that scottl had fixed my first issue during some MP CAM
>> >>>>>>> cleanup with respect to a race during resource allocation
>> >>>>>>> issues on a later version of the driver we are using (I
>> >>>>>>> believe we did the same thing to resolve a lock issue on
>> >>>>>>> bootup).
>> >>>>>>>
>> >>>>>>> However on my RELENG_8 box with (2) Adaptec 5085s connected
>> >>>>>>> to some JBODs (9TB each) I still have a FIB starvation
>> >>>>>>> issue during the LUN scan:
>> >>>>>>>
>> >>>>>>> The number of FIBs allocated to this card is 512 (older
>> >>>>>>> cards are 256).  The max_target per bus is 287.  On a six
>> >>>>>>> channel controller with a BUS scan done in parallel I see a
>> >>>>>>> lot of this:
>> >>>>>>>
>> >>>>>>> ...
>> >>>>>>> (probe501:aacp1:0:214:0): Request Requeued
>> >>>>>>> (probe501:aacp1:0:214:0): Retrying Command
>> >>>>>>> (probe520:aacp1:0:233:0): Request Requeued
>> >>>>>>> (probe520:aacp1:0:233:0): Retrying Command
>> >>>>>>> (probe528:aacp1:0:241:0): Request Requeued
>> >>>>>>> (probe528:aacp1:0:241:0): Retrying Command
>> >>>>>>> (probe540:aacp1:0:253:0): Request Requeued
>> >>>>>>> (probe540:aacp1:0:253:0): Retrying Command
>> >>>>>>> (probe541:aacp1:0:254:0): Request Requeued
>> >>>>>>> (probe541:aacp1:0:254:0): Retrying Command
>> >>>>>>> ....
>> >>>>>>>
>> >>>>>>> I think the driver is much happier with the following
>> >>>>>>> attached patch (with dmesg).
>> >>>>>>
>> >>>>>> Patch again but this time not base-64 encoded:
>> >>>>>
>> >>>>> [SNIP!]
>> >>>>>
>> >>>>> I want it to be a little conservative here, i.e.,
>> >>>>> pre-allocating half of max_fibs.  Will the attached patch
>> >>>>> work for you?
>> >>>>
>> >>>> The FIB allocation scheme was written when it was common for
>> >>>> machines to only have 64MB of RAM and proportionally less KVA,
>> >>>> so 256KB or 512KB was a lot of RAM to wire down.  Those days
>> >>>> have probably passed.
>> >>>
>> >>> So, what would you do if you were hypothetically rewriting it
>> >>> today? :-)
>> >>
>> >> Most hardware has mechanisms for probing its command queue
>> >> depth.  What I typically do these days is allocate a minimum
>> >> number of commands so that this probing can be done, then do a
>> >> single slab allocation based on the results.  AAC doesn't have
>> >> this capability, but the 256/512 size is pretty well understood.
>> >> The page-by-page allocation of aac works, but adds extra
>> >> bookkeeping and complication to the driver.
>> >
>> > Right Scott, that is what JK and I discussed this evening.  I
>> > figured the 128 macro was just historical cruft and your email
>> > confirms it. So are we ALL okay with the original patch as it
>> > stands for now?  JK I am fine with the divide 2 change but I
>> > think raising it to 256 is really the way to go at this point!
>> > :D
>>
>> If you're going to increase it, why not simply increase it to the
>> max amount that is appropriate for each card?
>
> My intention was to keep the impact as small as possible, i.e.,
>
> old card: max fibs == 256, max fibs / 2 == 128, no change
> new card: max fibs == 512, max fibs / 2 == 256, doubled from 128
>
> Old cards are most likely to be used on old systems with very little
> RAM (if they are still in production).  Hence, no change is
> necessary.  Anyway I just committed OP's patch (with a minor comment
> tweak).

Thanks JK!
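
For anyone reading this in the archive, the three pre-allocation
policies discussed above can be compared with a rough sketch like the
one below.  It is not driver code: every identifier is hypothetical,
and the 1KB-per-FIB figure is only an assumption inferred from the
256KB/512KB numbers Scott quotes for 256/512 FIBs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the aac(4) identifiers. */
enum { OLD_CARD_MAX_FIBS = 256, NEW_CARD_MAX_FIBS = 512 };
enum { LEGACY_PREALLOC = 128, RAISED_PREALLOC = 256 };

/* Assumption: about 1KB of wired memory per FIB. */
enum { ASSUMED_FIB_BYTES = 1024 };

/* Fixed pre-allocation count, clamped to what the card supports. */
static unsigned
prealloc_fixed(unsigned max_fibs, unsigned fixed)
{
	assert(max_fibs > 0);
	return (fixed < max_fibs ? fixed : max_fibs);
}

/* The conservative variant: pre-allocate half of the card's maximum. */
static unsigned
prealloc_half(unsigned max_fibs)
{
	assert(max_fibs > 0);
	return (max_fibs / 2);
}

/* The suggestion to simply allocate the card's maximum up front. */
static unsigned
prealloc_all(unsigned max_fibs)
{
	assert(max_fibs > 0);
	return (max_fibs);
}

/* Wired memory implied by a FIB count, under the 1KB assumption. */
static size_t
wired_bytes(unsigned nfibs)
{
	return ((size_t)nfibs * ASSUMED_FIB_BYTES);
}
```

With these numbers the half-of-max policy leaves old cards at 128 and
gives new cards 256, a fixed 256 gives both generations 256, and
allocating everything wires roughly 512KB on a new card if the 1KB
assumption holds.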

>> One other thing I forgot to mention was contiguous memory.  The
>> page-by-page allocation in aac has another benefit, and that's to
>> not tax contigmalloc with finding 256KB of contiguous memory.
>> That's not a big deal at boot, but is a problem if you load the
>> driver after the system has been running for a while.  It's
>> immensely useful during development, but it's never been clear to
>> me how useful it is in real life.
>
> Thanks for your review and comments!

Ditto to everyone!  :D
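
To illustrate the trade-off Scott describes between page-by-page
growth and one contiguous slab, here is a userland sketch of a pool
that grows one page at a time.  It is not the aac(4) implementation:
the structure names, the 4KB page size, the 1KB FIB size and the use
of malloc() in place of the kernel allocators are all assumptions
made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed sizes for illustration only. */
#define PAGE_BYTES	4096u
#define FIB_BYTES	1024u
#define FIBS_PER_PAGE	(PAGE_BYTES / FIB_BYTES)

/* One bookkeeping record per page: the extra state a single slab avoids. */
struct fib_page {
	struct fib_page	*next;
	unsigned char	*mem;
};

struct fib_pool {
	struct fib_page	*pages;		/* list of pages allocated so far */
	unsigned	 total_fibs;	/* FIBs currently backed by memory */
	unsigned	 max_fibs;	/* card limit, e.g. 256 or 512 */
};

/*
 * Grow the pool by one page.  Each request is only PAGE_BYTES long,
 * so no large contiguous region is ever needed.
 * Returns 0 on success, -1 at the card limit or on allocation failure.
 */
static int
pool_grow(struct fib_pool *p)
{
	struct fib_page *pg;

	assert(p != NULL);
	if (p->total_fibs + FIBS_PER_PAGE > p->max_fibs)
		return (-1);
	pg = malloc(sizeof(*pg));
	if (pg == NULL)
		return (-1);
	pg->mem = malloc(PAGE_BYTES);
	if (pg->mem == NULL) {
		free(pg);
		return (-1);
	}
	pg->next = p->pages;
	p->pages = pg;
	p->total_fibs += FIBS_PER_PAGE;
	return (0);
}

/* Release every page and reset the counters. */
static void
pool_free(struct fib_pool *p)
{
	struct fib_page *pg;

	assert(p != NULL);
	while ((pg = p->pages) != NULL) {
		p->pages = pg->next;
		free(pg->mem);
		free(pg);
	}
	p->total_fibs = 0;
}
```

The per-page list is the bookkeeping cost; the benefit is that no
single request exceeds one page.  A slab version would replace the
list with one allocation of max_fibs * FIB_BYTES, which is simpler
but needs that much contiguous memory at load time.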

-aps
Received on Tue Dec 08 2009 - 16:11:21 UTC
