Re: NFS client perf. degradation when SCHED_ULE is used (was when SMP enabled)

From: Colin Percival <cperciva_at_tarsnap.com>
Date: Sat, 3 Jun 2017 05:01:42 +0000

On 05/28/17 13:16, Rick Macklem wrote:
> cperciva_at_ is running a highly parallelized buildworld and he sees
> slightly better elapsed times and much lower system CPU for SCHED_ULE.
> 
> As such, I suspect it is the single threaded, processes mostly sleeping waiting
> for I/O case that is broken.
> I suspect this is how many people use NFS, since a highly parallelized make would
> not be a typical NFS client task, I think?

Running `make buildworld -j36` on an EC2 "c4.8xlarge" instance (36 vCPUs, 60
GB RAM, 10 GbE) with GENERIC-NODEBUG, ULE has a slight edge over 4BSD:

GENERIC-NODEBUG, SCHED_4BSD:
        1h14m12.48s real        6h25m44.59s user        1h4m53.42s sys
        1h15m25.48s real        6h25m12.20s user        1h4m34.23s sys
        1h13m34.02s real        6h25m14.44s user        1h4m09.55s sys
        1h13m44.04s real        6h25m08.60s user        1h4m40.21s sys
        1h14m59.69s real        6h25m53.13s user        1h4m55.20s sys
        1h14m24.00s real        6h24m59.29s user        1h5m37.31s sys

GENERIC-NODEBUG, SCHED_ULE:
        1h13m00.61s real        6h02m47.59s user        26m45.89s sys
        1h12m30.18s real        6h01m39.97s user        26m16.45s sys
        1h13m08.43s real        6h01m46.94s user        26m39.20s sys
        1h12m18.94s real        6h02m26.80s user        27m39.71s sys
        1h13m21.38s real        6h00m46.13s user        27m14.96s sys
        1h12m01.80s real        6h02m24.48s user        27m18.37s sys

Running `make buildworld -j2` on an EC2 "m4.large" instance (2 vCPUs, 8 GB RAM,
~ 500 Mbps network), 4BSD has a slight edge over ULE on real and sys
time but is slightly worse on user time:

GENERIC-NODEBUG, SCHED_4BSD:
        6h29m25.17s real        7h2m56.02s user         14m52.63s sys
        6h29m36.82s real        7h2m58.19s user         15m14.21s sys
        6h28m27.61s real        7h1m38.24s user         14m56.91s sys
        6h27m05.42s real        7h1m38.57s user         15m04.31s sys

GENERIC-NODEBUG, SCHED_ULE:
        6h34m19.41s real        6h59m43.99s user        18m8.62s sys
        6h33m55.08s real        6h58m44.91s user        18m4.31s sys
        6h34m49.68s real        6h56m03.58s user        17m49.83s sys
        6h35m22.14s real        6h58m12.62s user        17m52.05s sys

Note that in both cases there is lots of idle time (although far more in the
-j36 case); this is partly due to a lack of parallelism in buildworld, but
largely due to having /usr/obj mounted on Amazon EFS.
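
As a rough illustration of the idle-time remark, the sketch below estimates the
idle fraction of one run as 1 - (user + sys) / (real * ncpu). The helper names
`parse_duration` and `idle_fraction` are hypothetical, and treating every vCPU
as one full CPU is an assumption. For the first SCHED_4BSD run of each
configuration this comes to about 83% idle at -j36 and about 44% idle at -j2.

```python
import re


def parse_duration(text):
    """Convert a time(1)-style string such as '1h14m12.48s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds)


def idle_fraction(real, user, sys, ncpu):
    """Estimate the share of available CPU time that was not spent in user or sys.

    Assumption: each vCPU counts as one full CPU for the whole run.
    """
    capacity = parse_duration(real) * ncpu
    busy = parse_duration(user) + parse_duration(sys)
    return 1.0 - busy / capacity


# First SCHED_4BSD run from each table above.
idle_j36 = idle_fraction("1h14m12.48s", "6h25m44.59s", "1h4m53.42s", ncpu=36)
idle_j2 = idle_fraction("6h29m25.17s", "7h2m56.02s", "14m52.63s", ncpu=2)
print(f"-j36 idle: {idle_j36:.1%}   -j2 idle: {idle_j2:.1%}")
```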

These differences all seem within the range which could result from cache
effects due to threads staying on one CPU rather than bouncing around; so
whatever Rick is tripping over, it doesn't seem to be affecting these tests.
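
To put a number on the wall-clock differences, here is a minimal sketch that
averages the "real" times from the tables above and reports the relative change
going from SCHED_4BSD to SCHED_ULE. The names `seconds`, `REAL_J36`, `REAL_J2`
and `ule_vs_4bsd` are made up for this example; only the timings come from the
runs listed earlier. It only compares means and makes no attempt at a
significance test.

```python
import re
from statistics import mean


def seconds(text):
    """Convert '6h29m25.17s' or '26m45.89s' to seconds."""
    h, m, s = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?([\d.]+)s", text).groups()
    return int(h or 0) * 3600 + int(m or 0) * 60 + float(s)


# "real" column of each table, copied from the results above.
REAL_J36 = {
    "4BSD": ["1h14m12.48s", "1h15m25.48s", "1h13m34.02s",
             "1h13m44.04s", "1h14m59.69s", "1h14m24.00s"],
    "ULE": ["1h13m00.61s", "1h12m30.18s", "1h13m08.43s",
            "1h12m18.94s", "1h13m21.38s", "1h12m01.80s"],
}
REAL_J2 = {
    "4BSD": ["6h29m25.17s", "6h29m36.82s", "6h28m27.61s", "6h27m05.42s"],
    "ULE": ["6h34m19.41s", "6h33m55.08s", "6h34m49.68s", "6h35m22.14s"],
}


def ule_vs_4bsd(runs):
    """Relative change in mean time from 4BSD to ULE (negative means ULE is faster)."""
    base = mean(seconds(t) for t in runs["4BSD"])
    ule = mean(seconds(t) for t in runs["ULE"])
    return (ule - base) / base


print(f"-j36 real: {ule_vs_4bsd(REAL_J36):+.1%}")
print(f"-j2  real: {ule_vs_4bsd(REAL_J2):+.1%}")
```

On these numbers ULE comes out a little over 2% faster at -j36 and about 1.5%
slower at -j2 in elapsed time.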

-- 
Colin Percival
Security Officer Emeritus, FreeBSD | The power to serve
Founder, Tarsnap | www.tarsnap.com | Online backups for the truly paranoid
Received on Sat Jun 03 2017 - 03:01:44 UTC