Re: RELENG_7_0: vm_thread_new: kstack allocation failed

From: Kostik Belousov <kostikbel_at_gmail.com>
Date: Wed, 23 Jan 2008 12:52:24 +0200

On Tue, Jan 22, 2008 at 09:59:33PM -0800, Xin LI wrote:
> Kostik Belousov wrote:
> > On Tue, Jan 22, 2008 at 03:45:32PM -0800, Xin LI wrote:
> >> Hi,
> >>
> >> I have got a lot of this in dmesg output for RELENG_7_0 as of today:
> >>
> >> vm_thread_new: kstack allocation failed
> >> vm_thread_new: kstack allocation failed
> >> vm_thread_new: kstack allocation failed
> >> vm_thread_new: kstack allocation failed
> >> vm_thread_new: kstack allocation failed
> >> vm_thread_new: kstack allocation failed
> >>
> >> Any idea?
> > 
> > Does it cause any problems aside from printing these messages?
> 
> It causes some fork() calls to fail.
> 
> > What workload do you put on the machine?
> 
> It was an rsync from NFS to ZFS with ~15M files, and rsync consumes
> basically all physical memory.  I end up with some 2GB active and
> 4GB wired (the system has 8GB of RAM).  I also added a "make -j9
> buildworld" into the chaos to see if things got worse, and they did :-)
> 
> > The messages come from the failure of the kernel to allocate address
> > space for the kernel stack of a thread being created. Previously, the
> > system would panic on encountering this situation.
> 
> Yes, I know. Previously it just panicked and hung there, so thanks a
> lot for fixing it =-)
> 
> > This may happen due to kernel_map address space depletion, for instance
> > by having a lot of threads (~40000 on i386 machines with > 1GB of memory).
> 
> It seems that I have hit some sort of "leak" or exhaustion issue:
> when the workload was gone, the system did not recover from the
> situation, but a reboot worked fine.
> 
> The system is sort of in production and it is about 20 miles away from
> my office.  Do you want me to do some experiments for this?

Yes, I want to know what exactly leaked. Ideally, I would like to see a
series of vmstat -z and vmstat -m outputs taken over some time before
the system bogs down. But even a single snapshot of the vmstat -z/-m
output taken immediately before things stop working would be good to
look at.

The output of ps auxwwH would be helpful too.
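
One way to gather such a series unattended is a small collector that
saves the three outputs to timestamped files at a fixed interval, so
the last snapshot before the machine stops responding is already on
disk. The following is only a rough sketch, not a tested tool: the
function names, the 300-second interval and the file naming are
assumptions made for illustration. Only the three commands themselves
come from the request above.

```python
#!/usr/bin/env python3
"""Periodically save vmstat/ps output so a snapshot from just before a hang survives."""
import subprocess
import time
from datetime import datetime
from pathlib import Path

# The three commands asked for in the message; the dictionary keys are
# only used to build file names and are arbitrary.
COMMANDS = {
    "vmstat-z": ["vmstat", "-z"],
    "vmstat-m": ["vmstat", "-m"],
    "ps-auxwwH": ["ps", "auxwwH"],
}


def take_snapshot(out_dir, commands=COMMANDS):
    """Run each command once and write its output to a timestamped file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    written = []
    for name, argv in commands.items():
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True, timeout=60
            )
            body = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Starting a process may itself fail once the problem sets in;
            # record that instead of aborting the whole collector.
            body = f"failed to run {argv}: {exc}\n"
        path = out_dir / f"{stamp}-{name}.txt"
        path.write_text(body)
        written.append(path)
    return written


def collect(out_dir, interval=300, count=None):
    """Take a snapshot every `interval` seconds; run forever if count is None."""
    taken = 0
    while count is None or taken < count:
        take_snapshot(out_dir)
        taken += 1
        if count is None or taken < count:
            time.sleep(interval)
```

Calling collect() with an output directory of your choice (for example
a hypothetical /var/tmp/kstack-snapshots) before starting the workload
leaves one file per command per interval. The files written shortly
before the failures begin are the interesting ones.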