I observed that SCHED_ULE doesn't give a fair amount of CPU time to processes
that use scheduler-activation-based threads when other (semi-)CPU-intensive,
non-P_SA processes are running.
# for example, browsing a complicated web page while compiling some
# amount of code with nice 0.

After spending several hours, I finally tracked it down to the following
code in sched_ule.c:

<code>
**** snip ****
void
sched_switch(struct thread *td)
{
**** snip ****
	if (TD_IS_RUNNING(td)) {
		if (td->td_proc->p_flag & P_SA) {
			kseq_load_rem(KSEQ_CPU(ke->ke_cpu), ke);
			setrunqueue(td);
		} else
			kseq_runq_add(KSEQ_SELF(), ke);
**** snip ****
void
sched_add(struct thread *td)
{
**** snip ****
	case PRI_TIMESHARE:
		if (SCHED_CURR(kg, ke))
			ke->ke_runq = kseq->ksq_curr;
		else
			ke->ke_runq = kseq->ksq_next;
		break;
**** snip ****
</code>

The problem is that setrunqueue() calls sched_add(), which resets ke_runq,
so non-interactive threads of P_SA processes are likely to be put into
ksq_next regardless of how much of their slice remains.
In contrast, threads of !P_SA processes stay in ksq_curr until their slice
has expired, since the !P_SA case bypasses the setrunqueue() => sched_add()
path.

In order to reduce the difference, I tested three different strategies.

1. preserve ke_runq in the P_SA case (ule_runq_preserve.patch)
   This became a bit hackish, but I felt the characteristics of ULE were
   well preserved.

2. set ke_runq to ksq_next in the !P_SA case if the given thread is
   considered non-interactive (ule_runq_reset.patch)
   I felt that the scheduler behaves a bit like SCHED_4BSD does,
   which I think is not good.

3. use setrunqueue() (= sched_add()) in the !P_SA case, too,
   like SCHED_4BSD does (ule_sameas_sa.patch)
   I felt that the scheduler behaves much more like SCHED_4BSD
   (read: the good characteristics of ULE seemed to fade out),
   but it might be scientifically correct.
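To make the asymmetry between the two paths concrete, here is a minimal toy model in plain C. It is only a sketch and not kernel code: every name in it (toy_kse, toy_sched_add, toy_sched_switch, toy_requeue) is hypothetical, and the single "interactive" flag is an assumed stand-in for whatever SCHED_CURR() really evaluates.

```c
#include <assert.h>

/* Toy model only; none of these names exist in sched_ule.c. */
enum toy_runq { RUNQ_CURR, RUNQ_NEXT };

struct toy_kse {
	int interactive;	/* assumed stand-in for the SCHED_CURR() test */
	int slice;		/* remaining slice in ticks (never consulted below) */
	enum toy_runq runq;	/* stand-in for ke_runq */
};

/* Mimics the PRI_TIMESHARE case of sched_add(): the queue is chosen afresh. */
static void
toy_sched_add(struct toy_kse *ke)
{
	ke->runq = ke->interactive ? RUNQ_CURR : RUNQ_NEXT;
}

/* Mimics the TD_IS_RUNNING() branch of sched_switch(). */
static void
toy_sched_switch(struct toy_kse *ke, int is_sa)
{
	if (is_sa)
		toy_sched_add(ke);	/* setrunqueue() => sched_add() path */
	/* else: kseq_runq_add() path, ke->runq is left as it was */
}

/* Requeue a thread that currently sits in the "curr" queue. */
enum toy_runq
toy_requeue(int is_sa, int interactive, int slice)
{
	struct toy_kse ke = { interactive, slice, RUNQ_CURR };

	assert(slice >= 0);
	toy_sched_switch(&ke, is_sa);
	return (ke.runq);
}
```

In this model, toy_requeue(1, 0, 5) returns RUNQ_NEXT while toy_requeue(0, 0, 5) returns RUNQ_CURR: with the same remaining slice, only the P_SA-like thread is pushed to the next queue, and the slice value plays no part in the decision.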
With any of the three patches, P_SA processes were given a reasonable amount
of CPU time relative to the !P_SA processes, while with the unmodified
scheduler most of the CPU time was eaten up by cc1plus (PRI=136..139) and
nearly zero CPU went to epiphany-bin (PRI=92 or so).
# checked with top, epiphany+libpthread and compiling a 4k-line C++ program
# with CXXFLAGS='-pipe -O3 etc...', took several minutes on a Pen2 at 300MHz

Since I am totally unfamiliar with scheduler matters, all three could be
completely wrong or irrelevant to the problem.
But I hope one of them sheds some light for the scheduler gurus.

Thank you for reading,
taku
-- 
-|-__   YAMAMOTO, Taku <taku_at_cent.saitama-u.ac.jp>
 | __ <

Post Scriptum: Sorry for no concrete statistics :)