Adrian Chadd wrote this message on Wed, May 13, 2015 at 08:34 -0700:
> The reason I ask about "why is it faster?" is because for embedded-y
> things with low RAM we may not want that to happen due to memory
> constraints. However, we may actually want to do some form of
> autotuning on some platforms.

If you're already running a program, the difference between 1k and 8k
isn't significant... I'll grant you that 64k can be significant for
embedded-y platforms... But this goes back to the point that we need a
global knob saying "I want low memory usage, and I am willing to pay
for it in performance"...

> So, if it's underlying block size, maybe BUFSIZ isn't the thing to
> tweak, but based on disk io buffer size.
> If it's filling L1 or L2 cache with useful work, maybe auto-tune it
> based on that.

I'm pretty sure this is simply that syscalls+copies are expensive, and
larger block sizes reduce the number of calls; going from 1k to 64k
means 64 times fewer syscalls...

So, in my benchmark, we went from 148271 syscalls/second to 3228
syscalls/second with a 64k block size, and we got a 40% perf increase
on top of this... i.e. we were spending ~40% of the CPU time doing 145k
syscalls instead of doing real work...

> Please don't take this as bikeshedding, I'd really like to see some
> "this is why it's faster" analysis rather than just numbers thrown
> around.

I don't really see a need to analyze this any more... We are batching
work in a more efficient manner... I could list many other examples
of where we do similar optimizations...

-- 
John-Mark Gurney    Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."

Received on Wed May 13 2015 - 16:13:51 UTC
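For anyone who wants to see the batching effect for themselves, here is a rough sketch. It is not the benchmark from this thread; it only counts how many read calls it takes to consume the same file at different block sizes. The helper names (make_test_file, count_reads) and the 1 MiB file size are assumptions made up for illustration. It relies on Python's os.read(), which normally issues one read(2) per call.

```python
import os
import tempfile


def make_test_file(size):
    """Create a temporary file of `size` zero bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size)
    return path


def count_reads(path, block_size):
    """Read the whole file with os.read() and return the number of calls.

    The count includes the final call that returns b"" to signal EOF.
    """
    calls = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, block_size)
            calls += 1
            if not chunk:
                break
    finally:
        os.close(fd)
    return calls


if __name__ == "__main__":
    path = make_test_file(1024 * 1024)  # 1 MiB, an arbitrary size
    try:
        for block_size in (1024, 8192, 65536):
            print(block_size, count_reads(path, block_size))
    finally:
        os.remove(path)
```

The call count shrinks roughly in proportion to the block size. Timing the loops would be needed to say anything about the actual per-call cost on a given system.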