Re: Userland hangs on overloaded server

From: Dan Nelson <dnelson_at_allantgroup.com>
Date: Wed, 21 Apr 2004 15:38:56 -0500
In the last episode (Apr 21), Niklas Saers said:
> Due to a few problems we had with 5.2, we decided to take the server
> up to CURRENT until 5.3. The following problem has been from we
> installed the CURRENT of March 30th to and including yesterday's
> source.
> 
> After having been running for a while, userland stops responding. All
> the running daemons, webservers and inetd keep running. When I enter
> by SSH, sshd will accept the connection and show motd, but then it
> will not give me a shell, but simply hang. If this happens while I'm
> editing a file, I can save my file fine and exit to the shell and get
> a prompt, but if I press enter or type any command, it will not
> return until I've rebooted.

Ctrl-T is a good thing to press when you think something's hung.  It
sends SIGINFO to the foreground process, and hopefully you'll get
something like:

load: 1.22  cmd: sleep 72646 [nanslp] 0.00u 0.00s 0% 32k

That tells you two important things: the command (sleep in my
example), and where in the kernel it is waiting (the "nanslp" wait is
in the code path that handles the nanosleep syscall, which sleep
uses).  If you see that stuff is waiting on [biord], [biowr],
[wdrain], or [ufs], something is hammering your disks.  If it's
[sbwait], it could be networking problems.  If it's got "nfs" in it,
it's waiting on remote NFS services.  You can grep the kernel source
for other wait strings to see where they are.
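
If you end up collecting a lot of these status lines, the rules of
thumb above can be sketched as a small script.  This is only a rough
illustration, assuming the line looks like my example; the names
parse_status and classify_wait are made up for this sketch, and the
categories are just the hints from this mail, not an exhaustive list
of kernel wait strings.

```python
import re

# Pattern modeled on the single example line above; real output may vary,
# and a command name containing spaces would not match.
STATUS_RE = re.compile(
    r"load:\s+(?P<load>[\d.]+)\s+cmd:\s+(?P<cmd>\S+)\s+(?P<pid>\d+)\s+"
    r"\[(?P<wchan>[^\]]+)\]"
)

# Wait strings that this mail associates with heavy disk activity.
DISK_WAITS = {"biord", "biowr", "wdrain", "ufs"}


def classify_wait(wchan):
    """Map a wait string to a rough hint (hypothetical helper)."""
    if wchan in DISK_WAITS:
        return "disk"
    if wchan == "sbwait":
        return "network"
    if "nfs" in wchan:
        return "nfs"
    # Anything else: grep the kernel source for the string.
    return "other"


def parse_status(line):
    """Extract load, command, pid and wait string from a Ctrl-T line.

    Returns None if the line does not look like the example format.
    """
    m = STATUS_RE.search(line)
    if m is None:
        return None
    wchan = m.group("wchan")
    return {
        "load": float(m.group("load")),
        "cmd": m.group("cmd"),
        "pid": int(m.group("pid")),
        "wchan": wchan,
        "hint": classify_wait(wchan),
    }


if __name__ == "__main__":
    example = "load: 1.22  cmd: sleep 72646 [nanslp] 0.00u 0.00s 0% 32k"
    print(parse_status(example))
```

For my example line the hint comes out as "other", since nanslp is
just an ordinary sleep and not one of the disk, network, or NFS waits.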

Your description makes me think that you might be having disk/nfs
problems, or there's some sort of deadlock that's keeping the disk
subsystem from doing anything.

-- 
	Dan Nelson
	dnelson_at_allantgroup.com
Received on Wed Apr 21 2004 - 11:38:59 UTC
