Re: NFS client perf. degradation when SCHED_ULE is used (was when SMP enabled)

From: Rick Macklem <rmacklem_at_uoguelph.ca>
Date: Sat, 3 Jun 2017 12:25:04 +0000

Colin Percival wrote:
>On 05/28/17 13:16, Rick Macklem wrote:
>> cperciva_at_ is running a highly parallelized buildworld and he sees
>> slightly better elapsed times and much lower system CPU for SCHED_ULE.
>>
>> As such, I suspect it is the single threaded, processes mostly sleeping waiting
>> for I/O case that is broken.
>> I suspect this is how many people use NFS, since a highly parallelized make would
>> not be a typical NFS client task, I think?
>
>Running `make buildworld -j36` on an EC2 "c4.8xlarge" instance (36 vCPUs, 60
>GB RAM, 10 GbE) with GENERIC-NODEBUG, ULE has a slight edge over 4BSD:
>
>GENERIC-NODEBUG, SCHED_4BSD:
>        1h14m12.48s real        6h25m44.59s user        1h4m53.42s sys
>        1h15m25.48s real        6h25m12.20s user        1h4m34.23s sys
>        1h13m34.02s real        6h25m14.44s user        1h4m09.55s sys
>        1h13m44.04s real        6h25m08.60s user        1h4m40.21s sys
>        1h14m59.69s real        6h25m53.13s user        1h4m55.20s sys
>        1h14m24.00s real        6h24m59.29s user        1h5m37.31s sys
>
>GENERIC-NODEBUG, SCHED_ULE:
>        1h13m00.61s real        6h02m47.59s user        26m45.89s sys
>        1h12m30.18s real        6h01m39.97s user        26m16.45s sys
>        1h13m08.43s real        6h01m46.94s user        26m39.20s sys
>        1h12m18.94s real        6h02m26.80s user        27m39.71s sys
>        1h13m21.38s real        6h00m46.13s user        27m14.96s sys
>        1h12m01.80s real        6h02m24.48s user        27m18.37s sys
>
>Running `make buildworld -j2` on an EC2 "m4.large" instance (2 vCPUs, 8 GB RAM,
>~ 500 Mbps network), 4BSD has a slight edge over ULE on real and sys
>time but is slightly worse on user time:
>
>GENERIC-NODEBUG, SCHED_4BSD:
>        6h29m25.17s real        7h2m56.02s user         14m52.63s sys
>        6h29m36.82s real        7h2m58.19s user         15m14.21s sys
>        6h28m27.61s real        7h1m38.24s user         14m56.91s sys
>        6h27m05.42s real        7h1m38.57s user         15m04.31s sys
>
>GENERIC-NODEBUG, SCHED_ULE:
>        6h34m19.41s real        6h59m43.99s user        18m8.62s sys
>        6h33m55.08s real        6h58m44.91s user        18m4.31s sys
>        6h34m49.68s real        6h56m03.58s user        17m49.83s sys
>        6h35m22.14s real        6h58m12.62s user        17m52.05s sys

Doing these test runs, but on the 36 vCPU system, would be closer to what I
was testing. My tests do not use "-j" and run on an 8-core chunk
of real hardware.

>Note that in both cases there is lots of idle time (although far more in the
>-j36 case); this is partly due to a lack of parallelism in buildworld, but
>largely due to having /usr/obj mounted on Amazon EFS.
>
>These differences all seem within the range which could result from cache
>effects due to threads staying on one CPU rather than bouncing around; so
>whatever Rick is tripping over, it doesn't seem to be affecting these tests.

Yep. Thanks for doing the testing, rick
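
For anyone who wants to compare the quoted runs numerically, the sys times
can be converted to seconds and averaged per scheduler. The following is
only a minimal Python sketch and was not part of the testing in this thread.
The helper names (to_seconds, mean_seconds) and the SYS_TIMES table layout
are hypothetical; the figures themselves are copied from the quoted output.

```python
import re
from statistics import mean

# Matches time(1)-style durations such as "1h4m53.42s" or "26m45.89s".
_DURATION = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?(\d+(?:\.\d+)?)s$")


def to_seconds(text):
    """Convert a duration like '1h14m12.48s' into seconds (float)."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError("unrecognized duration: %r" % text)
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds)


def mean_seconds(durations):
    """Average a list of duration strings, returning seconds."""
    return mean(to_seconds(d) for d in durations)


# sys times copied from the quoted runs, keyed by (make -j value, scheduler).
SYS_TIMES = {
    (36, "4BSD"): ["1h4m53.42s", "1h4m34.23s", "1h4m09.55s",
                   "1h4m40.21s", "1h4m55.20s", "1h5m37.31s"],
    (36, "ULE"): ["26m45.89s", "26m16.45s", "26m39.20s",
                  "27m39.71s", "27m14.96s", "27m18.37s"],
    (2, "4BSD"): ["14m52.63s", "15m14.21s", "14m56.91s", "15m04.31s"],
    (2, "ULE"): ["18m8.62s", "18m4.31s", "17m49.83s", "17m52.05s"],
}

if __name__ == "__main__":
    for (jobs, sched), runs in sorted(SYS_TIMES.items()):
        print("-j%-2d SCHED_%-4s mean sys: %8.2f s over %d runs"
              % (jobs, sched, mean_seconds(runs), len(runs)))
```

The same helpers work for the real and user columns if those values are
added to the table in the same way.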
Received on Sat Jun 03 2017 - 10:25:08 UTC