On Sat, Apr 21, 2018 at 11:30:55PM +0000, Rick Macklem wrote:
> Konstantin Belousov wrote:
> >On Sat, Apr 21, 2018 at 07:21:58PM +0000, Rick Macklem wrote:
> >> I decided to start a new thread on current related to SCHED_ULE, since
> >> I see more than just performance degradation on a recent current kernel.
> >> (I cc'd a couple of the people discussing performance problems in
> >> freebsd-stable recently under a subject line of "Re: kern.sched.quantum:
> >> Creepy, sadistic scheduler".)
> >>
> >> When testing a pNFS server on a single-core i386 with 256Mbytes using a
> >> Dec. 2017 current/head kernel, I would see about a 30% performance
> >> degradation (elapsed run time for a kernel build over NFSv4.1) when the
> >> server kernel was built with
> >>    options SCHED_ULE
> >> instead of
> >>    options SCHED_4BSD
> >>
> >> Now, with a kernel from a couple of days ago, the
> >>    options SCHED_ULE
> >> kernel becomes unusable shortly after starting testing.
> >> I have seen two variants of this:
> >> - Became essentially hung. All I could do was ping the machine from the
> >>   network.
> >> - Reported "vm_thread_new: kstack allocation failed" and then any
> >>   attempt to do anything gets "No more processes".
> >This is strange. It usually means that you get KVA either exhausted or
> >severely fragmented.
> Yes. I reduced the number of nfsd threads from 256 to 32 and the SCHED_ULE
> kernel is working ok now. I haven't done enough to compare performance yet.
> Maybe I'll post again when I have some numbers.
>
> >Enter ddb, it should be operational since pings are replied. Try to see
> >where the threads are stuck.
> I didn't do this, since reducing the number of kernel threads seems to have
> fixed the problem. For the pNFS server, the nfsd threads will spawn
> additional kernel threads to do proxies to the mirrored DS servers.
>
> >> with the only difference being a kernel built with
> >>    options SCHED_4BSD
> >> everything works and performs the same as the Dec 2017 kernel.
> >>
> >> I can try rolling back through the revisions, but it would be nice if
> >> someone could suggest where to start, because it takes a couple of hours
> >> to build a kernel on this system.
> >>
> >> So, something has made things worse for a head/current kernel this
> >> winter, rick
> >
> >There are at least two potentially relevant changes.
> >
> >First is r326758, Dec 11, which bumped KSTACK_PAGES on i386 to 4.
> I've been running this machine with KSTACK_PAGES=4 for some time, so no
> change there.
>
> >Second is r332489, Apr 13, which introduced the 4/4G KVA/UVA split.
> Could this change have resulted in the system being able to allocate fewer
> kernel threads/stacks for some reason?
Well, it could, as anything can be buggy. But the intent of the change was
to give 4G of KVA, and it did.

> >Consequences of the first one are obvious: it is much harder to find the
> >place to map the stack. The second change, on the other hand, provides
> >almost the full 4G for KVA and should have mostly compensated for the
> >negative effects of the first.
> >
> >And, I cannot see how changing the scheduler would fix or even affect
> >that behaviour.
> My hunch is that the system was running near its limit for kernel
> threads/stacks. Then, somehow, the timing caused by SCHED_ULE resulted in
> the nfsd trying to reach a higher peak number of threads, and it hit the
> limit. SCHED_4BSD happened to result in timing such that it stayed just
> below the limit and worked.
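To make the knobs concrete, here is a minimal sketch of the settings
involved, assuming the stock nfsd(8) -n flag and the usual kernel-config
option (the actual pNFS setup may use different flags and values):

    # /etc/rc.conf: cap the number of nfsd service threads (was 256, now 32)
    nfs_server_enable="YES"
    nfs_server_flags="-u -t -n 32"

    # kernel config: size of each kernel thread's stack, in pages.
    # r326758 bumped the i386 default to 4, so every thread needs
    # four virtually contiguous pages of KVA for its stack.
    options KSTACK_PAGES=4

With 256 nfsd threads plus their mirror-DS workers, that stack demand adds
up quickly in a 32-bit address space, which fits the "near the limit" hunch
above.
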
> I can think of a couple of things that might affect this:
> 1 - If SCHED_ULE doesn't do the termination of kernel threads as quickly,
>     then they wouldn't terminate and release their resources before more
>     new ones are spawned.
The scheduler has nothing to do with thread termination. It might select
running threads in a way that causes the undesired pattern to appear, which
might create some amount of backlog for termination, but I doubt it.

> 2 - If SCHED_ULE handles the nfsd threads in a more "bursty" way, then the
>     burst could try to spawn more mirror DS worker threads at about the
>     same time (see the sketch below).
>
> Anyhow, thanks for the help, rick
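On point 2, a rough illustration of why a burst matters: every kernel
thread created for a mirrored-DS proxy needs a kstack of KSTACK_PAGES
virtually contiguous pages of KVA, so a burst of creations can fail under
fragmented KVA even while plenty of memory is free. This is a hypothetical
sketch assuming the stock kthread_add(9) interface, not the actual pNFS
code; ds_worker and spawn_ds_worker are made-up names:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/proc.h>
    #include <sys/kthread.h>

    /* Worker body: proxy the client's I/O to one mirrored DS. */
    static void
    ds_worker(void *arg)
    {
            /* ... do the proxy work ... */
            kthread_exit();
    }

    /*
     * Spawn one worker thread.  kthread_add() ends up in vm_thread_new(),
     * which must find KSTACK_PAGES of contiguous KVA for the new stack;
     * that is the allocation reported as failing in the error message
     * quoted earlier.  pages == 0 asks for the default stack size.
     */
    static int
    spawn_ds_worker(void *arg)
    {
            struct thread *td;

            return (kthread_add(ds_worker, arg, NULL, &td, 0, 0,
                "nfsd-ds-worker"));
    }

A burst of spawn_ds_worker() calls, as in scenario 2, means many such
contiguous allocations racing against each other, which is exactly when a
fragmented 32-bit KVA is most likely to come up short.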