Konstantin Belousov wrote:
>On Sat, Apr 21, 2018 at 11:30:55PM +0000, Rick Macklem wrote:
>> Konstantin Belousov wrote:
>> >On Sat, Apr 21, 2018 at 07:21:58PM +0000, Rick Macklem wrote:
>> >> I decided to start a new thread on current related to SCHED_ULE, since I see
>> >> more than just performance degradation and on a recent current kernel.
>> >> (I cc'd a couple of the people discussing performance problems in freebsd-stable
>> >> recently under a subject line of "Re: kern.sched.quantum: Creepy, sadistic scheduler".)
>> >>
>> >> When testing a pNFS server on a single core i386 with 256Mbytes using a Dec. 2017
>> >> current/head kernel, I would see about a 30% performance degradation (elapsed
>> >> run time for a kernel build over NFSv4.1) when the server kernel was built with
>> >> options SCHED_ULE
>> >> instead of
>> >> options SCHED_4BSD
So, now that I have decreased the number of nfsd kernel threads to 32, it works with
both schedulers and with essentially the same performance. (i.e. the 30% performance
degradation has disappeared.)

>> >>
>> >> Now, with a kernel from a couple of days ago, the
>> >> options SCHED_ULE
>> >> kernel becomes unusable shortly after starting testing.
>> >> I have seen two variants of this:
>> >> - Became essentially hung. All I could do was ping the machine from the network.
>> >> - Reported "vm_thread_new: kstack allocation failed"
>> >>   and then any attempt to do anything gets "No more processes".
>> >This is strange. It usually means that you get KVA either exhausted or
>> >severely fragmented.
>> Yes. I reduced the number of nfsd threads from 256->32 and the SCHED_ULE
>> kernel is working ok now. I haven't done enough to compare performance yet.
>> Maybe I'll post again when I have some numbers.
>>
>> >Enter ddb, it should be operational since pings are replied. Try to see
>> >where the threads are stuck.
>> I didn't do this, since reducing the number of kernel threads seems to have fixed
>> the problem. For the pNFS server, the nfsd threads will spawn additional kernel
>> threads to do proxies to the mirrored DS servers.
>>
>> >> with the only difference being a kernel built with
>> >> options SCHED_4BSD
>> >> everything works and performs the same as the Dec 2017 kernel.
>> >>
>> >> I can try rolling back through the revisions, but it would be nice if someone
>> >> could suggest where to start, because it takes a couple of hours to build a
>> >> kernel on this system.
>> >>
>> >> So, something has made things worse for a head/current kernel this winter, rick
>> >
>> >There are at least two potentially relevant changes.
>> >
>> >First is r326758 Dec 11 which bumped KSTACK_PAGES on i386 to 4.
>> I've been running this machine with KSTACK_PAGES=4 for some time, so no change.
W.r.t. Rodney Grimes' comments about this (which didn't end up in the messages in this
thread): I didn't see any instability when using KSTACK_PAGES=4 until this problem
cropped up and seemed to be scheduler-related (but not really, it seems). I bumped it to
KSTACK_PAGES=4 because I needed that for the pNFS Metadata Server code.
Yes, NFS does use quite a bit of kernel stack. Unfortunately, it isn't one big item
getting allocated on the stack, but many moderate-sized ones. (A part of it is multiple
instances of "struct vattr", some buried in "struct nfsvattr", that NFS needs to use.
I don't think these are large enough to justify malloc/free, but it has to use several
of them.)
One case I did try fixing was about 6 cases where "struct nfsstate" ended up on the stack.
I changed the code to malloc/free them and then, when testing, to my surprise I had a 20%
performance hit and shelved the patch. Now that I know that the server was running near
its limit, I might try this one again, to see if the performance hit doesn't occur when
the machine has adequate memory. If the performance hit goes away, I could commit this,
but it wouldn't have that much effect on the kstack usage.
(It's interesting how this patch ended up being related to the issue this thread discusses.)
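For anyone curious, the shelved patch was basically the pattern below. This is only a
rough sketch: the structure fields, the function name and the use of the generic M_TEMP
malloc type are made up for illustration and are not the actual code.

#include <sys/param.h>
#include <sys/malloc.h>

/* Stand-in for the real struct nfsstate; only its size matters here. */
struct nfsstate {
	char	ns_owner[64];
	int	ns_flags;
	/* ... more fields ... */
};

static int
example_nfsrv_op(void)
{
	struct nfsstate *stp;
	int error = 0;

	/*
	 * Was: "struct nfsstate st;" on the kernel stack.
	 * Allocate it from malloc(9) instead, so it no longer
	 * contributes to kstack usage.
	 */
	stp = malloc(sizeof(*stp), M_TEMP, M_WAITOK | M_ZERO);

	/* ... do the work that used to operate on the on-stack copy ... */

	free(stp, M_TEMP);
	return (error);
}

Since an M_WAITOK allocation can sleep, the extra overhead might be part of why the 20%
hit showed up on a machine that was already short of memory.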
>>
>> >Second is r332489 Apr 13, which introduced 4/4G KVA/UVA split.
>> Could this change have resulted in the system being able to allocate fewer
>> kernel threads/stacks for some reason?
>Well, it could, as anything can be buggy. But the intent of the change
>was to give 4G KVA, and it did.
Righto. No concern here. I suspect the Dec. 2017 kernel was close to the limit (see the
performance issue that went away, noted above) and any change could have pushed it
across the line.

>>
>> >Consequences of the first one are obvious, it is much harder to find
>> >the place to map the stack. Second change, on the other hand, provides
>> >almost full 4G for KVA and should have mostly compensated for the negative
>> >effects of the first.
>> >
>> >And, I cannot see how changing the scheduler would fix or even affect that
>> >behaviour.
>> My hunch is that the system was running near its limit for kernel threads/stacks.
>> Then, somehow, the timing caused by SCHED_ULE resulted in the nfsd trying to reach
>> a higher peak number of threads and hitting the limit.
>> SCHED_4BSD happened to result in timing such that it stayed just below the
>> limit and worked.
>> I can think of a couple of things that might affect this:
>> 1 - If SCHED_ULE doesn't do the termination of kernel threads as quickly, then
>>     they wouldn't terminate and release their resources before more new ones
>>     are spawned.
>The scheduler has nothing to do with thread termination. It might
>select running threads in a way that causes the undesired pattern to
>appear, which might create some amount of backlog for termination, but
>I doubt it.
>
>> 2 - If SCHED_ULE handles the nfsd threads in a more "bursty" way, then the burst
>>     could try to spawn more mirror DS worker threads at about the same time.
>>
>> Anyhow, thanks for the help, rick

Have a good day, rick
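P.S. In case anyone wants to reproduce the setup: the nfsd thread count can be capped
from /etc/rc.conf (32 is what I ended up using; the exact flags below are just an
example, adjust to taste):

nfs_server_enable="YES"
nfs_server_flags="-u -t -n 32"

and the two kernels only differed in the scheduler option:

options SCHED_4BSD      # or SCHED_ULE
options KSTACK_PAGES=4  # needed for the pNFS Metadata Server code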