On 22/4/18 10:36 pm, Rodney W. Grimes wrote:
>> Konstantin Belousov wrote:
>>> On Sat, Apr 21, 2018 at 11:30:55PM +0000, Rick Macklem wrote:
>>>> Konstantin Belousov wrote:
>>>>> On Sat, Apr 21, 2018 at 07:21:58PM +0000, Rick Macklem wrote:
>>>>>> I decided to start a new thread on current related to SCHED_ULE, since I see
>>>>>> more than just performance degradation on a recent current kernel.
>>>>>> (I cc'd a couple of the people discussing performance problems in freebsd-stable
>>>>>> recently under a subject line of "Re: kern.sched.quantum: Creepy, sadistic scheduler".)
>>>>>>
>>>>>> When testing a pNFS server on a single-core i386 with 256Mbytes of RAM using a
>>>>>> Dec. 2017 current/head kernel, I would see about a 30% performance degradation
>>>>>> (elapsed run time for a kernel build over NFSv4.1) when the server kernel was built with
>>>>>> options SCHED_ULE
>>>>>> instead of
>>>>>> options SCHED_4BSD
>> So, now that I have decreased the number of nfsd kernel threads to 32, it works
>> with both schedulers and with essentially the same performance. (i.e. the 30%
>> performance degradation has disappeared.)
>>
>>>>>> Now, with a kernel from a couple of days ago, the
>>>>>> options SCHED_ULE
>>>>>> kernel becomes unusable shortly after starting testing.
>>>>>> I have seen two variants of this:
>>>>>> - Became essentially hung. All I could do was ping the machine from the network.
>>>>>> - Reported "vm_thread_new: kstack allocation failed",
>>>>>>   and then any attempt to do anything gets "No more processes".
>>>>> This is strange. It usually means that you get KVA either exhausted or
>>>>> severely fragmented.
>>>> Yes. I reduced the number of nfsd threads from 256->32 and the SCHED_ULE
>>>> kernel is working ok now. I haven't done enough to compare performance yet.
>>>> Maybe I'll post again when I have some numbers.
>>>>
>>>>> Enter ddb, it should be operational since pings are replied. Try to see
>>>>> where the threads are stuck.
>>>> I didn't do this, since reducing the number of kernel threads seems to have fixed
>>>> the problem. For the pNFS server, the nfsd threads will spawn additional kernel
>>>> threads to do proxies to the mirrored DS servers.
>>>>
>>>>>> With the only difference being a kernel built with
>>>>>> options SCHED_4BSD
>>>>>> everything works and performs the same as the Dec. 2017 kernel.
>>>>>>
>>>>>> I can try rolling back through the revisions, but it would be nice if someone
>>>>>> could suggest where to start, because it takes a couple of hours to build a
>>>>>> kernel on this system.
>>>>>>
>>>>>> So, something has made things worse for a head/current kernel this winter, rick
>>>>> There are at least two potentially relevant changes.
>>>>>
>>>>> First is r326758, Dec 11, which bumped KSTACK_PAGES on i386 to 4.
>>>> I've been running this machine with KSTACK_PAGES=4 for some time, so no change.
>> W.r.t. Rodney Grimes' comments about this (which didn't end up in this message
>> in the thread):
>> I didn't see any instability when using KSTACK_PAGES=4 for this until this cropped
>> up and seemed to be scheduler related (but not really, it seems).
>> I bumped it to KSTACK_PAGES=4 because I needed that for the pNFS Metadata
>> Server code.
>>
>> Yes, NFS does use quite a bit of kernel stack. Unfortunately, it isn't one big
>> item getting allocated on the stack, but many moderate-sized ones.
>> (A part of it is multiple instances of "struct vattr", some buried in "struct nfsvattr",
>> that NFS needs to use. I don't think these are large enough to justify malloc/free,
>> but it has to use several of them.)
>>
>> One case I did try fixing was about 6 cases where "struct nfsstate" ended up on
>> the stack. I changed the code to malloc/free them and then, when testing, to
>> my surprise I had a 20% performance hit and shelved the patch.
>> Now that I know that the server was running near its limit, I might try this one
>> again, to see if the performance hit doesn't occur when the machine has adequate
>> memory. If the performance hit goes away, I could commit this, but it wouldn't
>> have that much effect on the kstack usage. (It's interesting how this patch ended
>> up related to the issue this thread discussed.)
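Roughly, the conversion Rick describes is the pattern below (a hypothetical
sketch, not the actual NFS code; the functions, the error handling, and the
stand-in struct definition are invented for illustration -- only the
malloc(9)/free(9) calls and the M_TEMP type are real kernel API):

    #include <sys/param.h>
    #include <sys/malloc.h>

    /* Stand-in for the real struct nfsstate, just so this compiles. */
    struct nfsstate {
            char    ns_pad[256];
    };

    /* Before: sizeof(struct nfsstate) bytes of kstack per call. */
    static int
    nfsrv_example_stack(void)
    {
            struct nfsstate st;     /* lives on the kernel stack */

            /* ... fill in and use &st ... */
            return (0);
    }

    /* After: the structure moves to malloc(9)'ed memory instead. */
    static int
    nfsrv_example_malloc(void)
    {
            struct nfsstate *st;
            int error = 0;

            st = malloc(sizeof(*st), M_TEMP, M_WAITOK);
            /* ... fill in and use st ... */
            free(st, M_TEMP);
            return (error);
    }

The M_WAITOK allocation can sleep when memory is tight, which may be part of
why the change cost performance on the memory-starved machine, as Rick notes.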
> Anything we can do to help relieve KSTACK usage, especially on i386,
> is helpful. There is a thread back quite some time where someone
> came up with a compile-time static "this function uses X bytes of
> local stack" warning and a bit of cleanup was done. We should pursue
> this issue further.

That was me. Use -Wframe-larger-than=<arg>
<https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-wframe-larger-than>
and set it to something like 512 bytes (obviously you have to make the
warnings non-fatal as well).
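As a concrete illustration (a made-up file, not something from the tree):

    /*
     * frame.c: compile with
     *         cc -c -Wframe-larger-than=512 frame.c
     * and the compiler warns about any function whose stack frame
     * exceeds 512 bytes, like big() below.
     */
    int     consume(char *);

    int
    big(void)
    {
            char buf[1024];         /* 1024 bytes of locals, over the limit */

            buf[0] = '\0';
            return (consume(buf));
    }

In a -Werror build like the kernel's, pair it with making the warnings
non-fatal, as noted above, so the build still completes.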
> My experience with the i386/KSTACK issues was attempting to do installs
> from snapshot .iso's. I usually had to change to a custom kernel without
> INVARIANTS and WITNESS, or reduce KSTACK to 2 and suffer the small-stack
> problem (i.e., don't use NFS during the install). Neither was very pleasant.
>
> I have found it impractical to run the 4-page KSTACK in production
> VMs using i386 due to the memory requirements. I run many very lean
> i386 VMs with 64MB of memory. I suspect our user base also has
> many people doing this, and it would be to our advantage to try
> and reduce our kernel stack needs.
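For reference, the knob being discussed is set the same way as the scheduler
options quoted above, in the kernel config file (4 is what r326758 made the
i386 default; 2 is the older value Rodney mentions falling back to):

    options         KSTACK_PAGES=4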
>>>>> Second is r332489, Apr 13, which introduced the 4/4G KVA/UVA split.
>>>> Could this change have resulted in the system being able to allocate fewer
>>>> kernel threads/stacks for some reason?
>>> Well, it could, as anything can be buggy. But the intent of the change
>>> was to give 4G KVA, and it did.
>> Righto. No concern here. I suspect the Dec. 2017 kernel was close to the limit
>> (see the performance issue that went away, noted above) and any change could
>> have pushed it across the line, I think.
>>
>>>>> Consequences of the first one are obvious: it is much harder to find
>>>>> the place to map the stack. The second change, on the other hand, provides
>>>>> almost the full 4G for KVA and should have mostly compensated for the negative
>>>>> effects of the first.
>>>>>
>>>>> And, I cannot see how changing the scheduler would fix or even affect that
>>>>> behaviour.
>>>> My hunch is that the system was running near its limit for kernel threads/stacks.
>>>> Then, somehow, the timing SCHED_ULE produced resulted in the nfsd trying to reach
>>>> a higher peak number of threads, and it hit the limit.
>>>> SCHED_4BSD happened to result in timing such that it stayed just below the
>>>> limit and worked.
>>>> I can think of a couple of things that might affect this:
>>>> 1 - If SCHED_ULE doesn't do the termination of kernel threads as quickly, then
>>>>     they wouldn't terminate and release their resources before more new ones
>>>>     are spawned.
>>> The scheduler has nothing to do with thread termination. It might
>>> select running threads in a way that causes the undesired pattern to
>>> appear, which might create some amount of backlog for termination, but
>>> I doubt it.
>>>
>>>> 2 - If SCHED_ULE handles the nfsd threads in a more "bursty" way, then the burst
>>>>     could try to spawn more mirror DS worker threads at about the same time.
>>>>
>>>> Anyhow, thanks for the help, rick
>> Have a good day, rick