NFS client performance degradation when SMP enabled

From: Rick Macklem <rmacklem_at_uoguelph.ca> Date: Wed, 24 May 2017 20:40:00 +0000

Without boring you with too much detail, I have been doing development/testing
of pNFS stuff (mostly server side) on a 1 year old kernel (Apr. 12, 2016).
When I recently carried the code across to a recent kernel, everything seemed to work,
but performance was much slower.
After some fiddling around, the slowdown appears to be on the NFS client side, but nothing
in the NFS client code seems to be causing it. (RPC counts were almost exactly the same,
for example. I tried reverting r316532 and disabling vfs.nfs.use_buf_pager. Neither
made a significant difference.)

I made most of the performance degradation go away by disabling SMP on the client.
Here are some elapsed times for kernel builds with everything the same except for
which kernel was booted and whether SMP was enabled or disabled (amd64 client machine).
1 year old kernel, SMP enabled  - 100 minutes
recent kernel, SMP disabled     - 113 minutes
recent kernel, SMP enabled      - 148 minutes
(The builds were all of the same kernel sources. When I say "1 year old" vs "recent"
 I am referring to which kernel was booted for the test run.)
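
To put those numbers in relative terms, here is a small sketch that only does
arithmetic on the three elapsed times listed above. The function and variable
names are made up for illustration and nothing here was measured separately.

```python
# Elapsed kernel-build times in minutes, copied from the figures above.
timings = {
    "1 year old kernel, SMP enabled": 100,
    "recent kernel, SMP disabled": 113,
    "recent kernel, SMP enabled": 148,
}


def slowdown_percent(baseline, elapsed):
    """Return how much longer `elapsed` is than `baseline`, in percent."""
    return (elapsed - baseline) * 100.0 / baseline


# Compare every run against the 1 year old kernel with SMP enabled.
baseline = timings["1 year old kernel, SMP enabled"]
slowdowns = {name: slowdown_percent(baseline, t) for name, t in timings.items()}

# Cost of enabling SMP on the recent kernel alone.
smp_cost = slowdown_percent(
    timings["recent kernel, SMP disabled"],
    timings["recent kernel, SMP enabled"],
)

for name, pct in slowdowns.items():
    print(f"{name}: {timings[name]} min ({pct:+.0f}% vs 1 year old kernel)")
print(f"recent kernel, SMP enabled vs disabled: {smp_cost:+.0f}%")
```

So the recent kernel is about 13% slower with SMP disabled and about 48% slower
with SMP enabled, and turning SMP on costs roughly 31% on the recent kernel.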

All I can think of is that some change in the last year has resulted in an increase in
something like interrupt latency or context switch latency, and that this is causing the slowdown.

Does anyone have an idea what might be causing this, or any tunables to fool with
beyond disabling SMP (which I suspect won't be a popular answer to "how to fix
slow NFS" ;-)?

I haven't yet tried fiddling with interrupt moderation on the net interface, but
the tests all used the same settings.

rick