On 2012-May-30 13:27:03 +1000, Peter Jeremy <peter_at_rulingia.com> wrote:
>On 2012-May-29 02:18:25 +0400, Dmitry Marakasov <amdmi3_at_amdmi3.ru> wrote:
>>Then you should try to profile it - my script basically runs
>>delete-old delete-old-libs for every knob (131 of them), and it
>>hadn't taken more than 4 seconds even once.
>
>I've done some investigating and the problem is that "xargs -n1"
>fork()/exec()s /bin/echo on each file (and there are 5538 files for
>me). Changing this to "tr ' ' '\n'" reduces "make delete-old" runtime
>to 1.75s - which is much nicer. I've checked a variety of other
>systems running 8.x & 9.x and the 97s seems to be anomalously long so
>I'll do some more investigating.

I've tracked the problem down to excessive VM faults caused by
jemalloc. Whilst executing /bin/echo, jemalloc mmap()s two 4MiB
chunks of memory. Unless you build with MALLOC_PRODUCTION (which I
hadn't), it then proceeds to verify that both blocks are zero-filled.
This causes 2048 (unnecessary) page faults (out of a total of 2133).

When I rebuilt jemalloc with MALLOC_PRODUCTION, this dropped to 87
page faults (cf 76 on 8.x and 62 on 9.x) and the elapsed time for
"make delete-old" dropped to slightly more than on 8.x & 9.x.

"xargs -n1" is probably a worst-case scenario for jemalloc, but the
same overhead probably affects other short-lived processes (and the
shell scripts that invoke them) in a similar way. It's a pity that
this particular test is a compile-time option.

I still think that saving 5500 fork()/exec() pairs is a good reason
to switch from "xargs -n1" to "tr ' ' '\n'".

--
Peter Jeremy
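
For reference, a minimal sketch of the substitution discussed above. The
file names are made up for illustration and this is not the actual
Makefile.inc1 code. It only shows that, for a plain space-separated list,
the two pipelines print the same thing, while "tr" does it in a single
process instead of one /bin/echo per word.

```shell
# Hypothetical list standing in for the ~5500 entries "make delete-old" walks.
files="usr/bin/foo usr/lib/libbar.so.1 usr/share/man/man1/foo.1.gz"

# xargs -n1 with no utility given runs echo once per word:
# one fork()/exec() pair for every file in the list.
via_xargs=$(echo "$files" | xargs -n1)

# tr turns each space into a newline in a single process,
# no matter how many words there are.
via_tr=$(echo "$files" | tr ' ' '\n')

if [ "$via_xargs" = "$via_tr" ]; then
    echo "identical output"
fi
```

The two are only interchangeable under the assumption that the list has
single spaces between words and contains no quotes or backslashes: xargs
collapses runs of blanks and interprets quoting, tr does neither.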