Re: Increasing MAXPHYS

From: Alexander Sack <pisymbol_at_gmail.com>
Date: Mon, 22 Mar 2010 11:52:44 -0400

On Mon, Mar 22, 2010 at 8:39 AM, John Baldwin <jhb_at_freebsd.org> wrote:
> On Monday 22 March 2010 7:40:18 am Gary Jennejohn wrote:
>> On Sun, 21 Mar 2010 19:03:56 +0200
>> Alexander Motin <mav_at_FreeBSD.org> wrote:
>>
>> > Scott Long wrote:
>> > > Are there non-CAM drivers that look at MAXPHYS, or that silently assume that
>> > > MAXPHYS will never be more than 128k?
>> >
>> > That is a question.
>> >
>>
>> I only did a quick&dirty grep looking for MAXPHYS in /sys.
>>
>> Some drivers redefine MAXPHYS to be 512KiB.  Some use their own local
>> MAXPHYS which is usually 128KiB.
>>
>> Some look at MAXPHYS to figure out other things; the details escape me.
>>
>> There's one driver which actually uses 100*MAXPHYS for something, but I
>> didn't check the details.
>>
>> Lots of them were non-CAM drivers AFAICT.
>
> The problem is the drivers that _don't_ reference MAXPHYS.  The driver author
> at the time "knew" that MAXPHYS was 128k, so he did the MAXPHYS-dependent
> calculation and just put the result in the driver (e.g. only supporting up to
> 32 segments (32 4k pages == 128k) in a bus dma tag as a magic number to
> bus_dma_tag_create() w/o documenting that the '32' was derived from 128k or
> what the actual hardware limit on nsegments is).  These cannot be found by a
> simple grep, they require manually inspecting each driver.

100% awesome comment.  On another kernel, I myself was guilty of this
crime (I did have a nice comment though above the def).
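
To make the failure mode concrete, here is a minimal user-space sketch of
the arithmetic John describes.  It is not taken from any real driver: every
SKETCH_* name and the 255-segment hardware limit are assumptions invented
for illustration, and a real driver would pass the result as the nsegments
argument to bus_dma_tag_create() rather than compute it in user space.

```c
#include <assert.h>

/* All names and values here are hypothetical, for illustration only. */
#define SKETCH_PAGE_SIZE	4096u
#define SKETCH_MAXPHYS_OLD	(128u * 1024u)
#define SKETCH_MAXPHYS_NEW	(512u * 1024u)
#define SKETCH_HW_MAX_SEGS	255u	/* assumed hardware S/G limit */

/* The undocumented pattern: 32 == 128k / 4k, but nothing says so. */
#define SKETCH_MAGIC_NSEGS	32u

/*
 * Derive the segment count from the maximum I/O size and clamp it to
 * what the hardware supports.  An unaligned buffer can span one extra
 * page; that is ignored here to match the arithmetic quoted above.
 */
static unsigned
sketch_nsegments(unsigned maxphys, unsigned page_size, unsigned hw_max_segs)
{
	unsigned wanted = (maxphys + page_size - 1) / page_size;

	return (wanted < hw_max_segs ? wanted : hw_max_segs);
}

/* Largest transfer a given segment count can cover, one page each. */
static unsigned
sketch_max_io_bytes(unsigned nsegs, unsigned page_size)
{
	return (nsegs * page_size);
}
```

With the magic number, sketch_max_io_bytes(SKETCH_MAGIC_NSEGS, SKETCH_PAGE_SIZE)
stays at 128k no matter what MAXPHYS becomes, whereas the derived form
follows MAXPHYS up to the (assumed) hardware limit.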

This has been a great thread, since our application really needs some
of the optimizations being thrown around here.  In real-world
performance testing we have found that we are almost always either
controller bound (i.e. adding more disks to spread IOPS has little to
no effect on throughput in large array configurations; we suspect we
are hitting the RAID controller's firmware limitations) or tps bound.
On the latter, I never thought going from 128k -> 256k per transaction
would have a dramatic effect on throughput (but I never verified that).
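
As a back-of-the-envelope sketch of why the transfer size could matter if a
system really is bound by transactions per second (an assumption, not
something measured here): when the transaction rate is the only ceiling,
bytes moved per second is just the rate times the size of each transaction.
The function name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes per second when the transaction rate is the only bottleneck.
 * This deliberately ignores every other limit (bus, disks, firmware).
 */
static uint64_t
sketch_tps_bound_throughput(uint64_t tps, uint64_t bytes_per_txn)
{
	return (tps * bytes_per_txn);
}
```

Under that simplification, going from 128k to 256k per transaction at the
same transaction rate doubles the result; whether a real array behaves
this way would still have to be measured.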

Back to HBAs: AFAIK, every modern iteration of the most popular HBAs
can easily do way more than a 128k scatter/gather I/O.  Do you guys
know of any *modern* HBA (from roughly the last 3-4 years) that cannot
do more than 128k at a shot?

In other words, I've always thought the limit was imposed by the kernel
rather than by what the memory controller on the card can do (I
certainly never got the impression, talking with some of the IHVs over
the years, that they were designing their hardware for a 128k limit - I
sure hope not!).

-aps