On 04.04.18 at 18:45, Andriy Gapon wrote:
> On 04/04/2018 16:19, Stefan Esser wrote:
>> I have identified the cause of the extremely low I/O performance (2 to 6 read
>> operations scheduled per second).
>>
>> The default value of kern.sched.preempt_thresh=0 does not give any CPU to the
>> I/O bound process unless a (long) time slice expires (kern.sched.quantum=94488
>> on my system with HZ=1000) or one of the CPU bound processes voluntarily gives
>> up the CPU (or exits).
>>
>> Any non-zero value of preempt_thresh lets the system perform I/O in parallel
>> with the CPU bound processes, again.
>
> Let me guess... you have a custom kernel configuration and, unlike GENERIC
> (assuming x86), it does not have 'options PREEMPTION'?

Yes, thank you for pointing that out!!!

I used to have PREEMPTION and FULL_PREEMPTION in my kernel configuration, and
apparently deleted both options when only FULL_PREEMPTION was supposed to
go ...

After looking at sched_ule.c and top/machine.c, it appears that the value of
preempt_thresh corresponds to the PRI value as shown by top (or ps -l) plus
PZERO, which is calculated as (PRI_MIN_KERN=80) + 20.

What I do not understand, though, is that the decision about a preemption is
based only on the calculated new priority of the thread, and not at all on the
priority of other running threads (except the idle thread).

On my system, a "real" batch job (i.e. one that does not voluntarily give up
the CPU due to I/O) seems to have a PRI value of 80 to 100 (growing over time),
while an interactive process has a PRI of 20, and a maximally "niced"
interactive process has 52.

So, I'd expect a reasonable default value of preempt_thresh to be slightly
above 120 (e.g. 124), to prevent I/O heavy threads from stealing the CPU from
each other too often, and to prevent "niced" processes from doing the same ...

The two values configured into the kernel (80 for PREEMPTION and 255 for
FULL_PREEMPTION) seem to be extremes, but something in between (e.g. 124) is
not offered. It can only be configured via sysctl, and I have found no
documentation, besides the kernel sources, of how the threshold value
corresponds to the PRI value.

Is PRI_MIN_KERN=80 really a good default value for the preemption threshold?

Regards, Stefan

Received on Thu Apr 05 2018 - 10:40:05 UTC
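
The correspondence between the PRI column of top and preempt_thresh described in the message can be sketched in C. This is only a simplified model of the rule as the message states it, not the code in sched_ule.c: the helper names top_pri_to_kernel and wakes_with_preemption are hypothetical, the constants are the ones quoted above, and the model ignores the priority of the currently running thread and the idle-thread special case.

```c
#include <stdbool.h>

/* Constants as quoted in the message. */
#define PRI_MIN_KERN 80
#define PZERO        (PRI_MIN_KERN + 20)   /* 100 */

/*
 * Hypothetical helper: convert a PRI value as shown by top (or ps -l)
 * to the scale on which preempt_thresh is expressed.
 */
static int
top_pri_to_kernel(int top_pri)
{
	return (top_pri + PZERO);
}

/*
 * Hypothetical, simplified model of the threshold test: a thread that
 * becomes runnable preempts only if its priority is numerically at or
 * below the threshold (lower value means higher priority).  A threshold
 * of 0 is treated as "never preempt", matching the behaviour reported
 * for kern.sched.preempt_thresh=0.
 */
static bool
wakes_with_preemption(int top_pri, int preempt_thresh)
{
	if (preempt_thresh == 0)
		return (false);
	return (top_pri_to_kernel(top_pri) <= preempt_thresh);
}
```

Under these assumptions, an interactive process (PRI 20) maps to 120 and would preempt with a threshold of 124 but not with 80, while a maximally niced interactive process (PRI 52) maps to 152 and would preempt only with a threshold of 152 or more, such as the 255 used by FULL_PREEMPTION.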