Re: HEADS UP: UNIX domain socket locking changes merged to CVS HEAD

From: Robert Watson <rwatson_at_FreeBSD.org>
Date: Thu, 1 Mar 2007 09:11:15 +0000 (GMT)

On Wed, 28 Feb 2007, Scott Robbins wrote:

> On Wed, Feb 28, 2007 at 07:00:15PM -0500, Randall Stewart wrote:
>> Robert Watson wrote:
>>> On Wed, 28 Feb 2007, Stephane E. Potvin wrote:
>>>>
>>>> Since this commit, I've been observing frequent deadlocks on my laptop,
>>>> mostly when starting-up gnome. It usually takes less than 5 to 10 minutes for
>>>> the deadlock to happen.
>
> I too have been having unexpected lockups--like Randall, I figured it was 
> something to do with my machine.  Interestingly enough, though X will lock 
> up completely (and I can't ssh to the machine, though I can ping it) the 
> jail, which runs a small web site, running on an alias ip address continues 
> to work--I can still access the web site from outside.
>
> However, I haven't been able to apply Robert's patch yet.  As some of you 
> have noticed, there's a bunch of tinderbox failures dying in netstat.  It's 
> happening to me too, so I haven't been able to rebuild.
>
> (this is more of a me too post at this point--I haven't had a chance to do 
> any investigation).

Give uipc_usrreq.c:1.199 a try and see if it helps.

On the web server/jail vs X11 thing: yes -- deadlocks involving lock order 
reversals typically affect two classes of threads.  The first is threads that 
are directly involved in the deadlock (the two reverse lock acquisitions), and 
the second class is threads that end up waiting on any locks (or other 
resources) held by the threads in the deadly embrace.
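
As a rough illustration of what a lock order reversal is, here is a small
userspace sketch in Python (not kernel code, and not how uipc_usrreq.c is
written).  It records the order in which each thread takes its locks and
reports any pair of locks that two threads take in opposite orders.  The
thread names, lock names and the find_reversals() helper are all invented
for the example.

```python
from itertools import combinations


def ordered_pairs(acquisitions):
    """Return every (a, b) where lock a is taken before lock b."""
    return {(a, b)
            for i, a in enumerate(acquisitions)
            for b in acquisitions[i + 1:]}


def find_reversals(threads):
    """threads maps a thread name to the locks it takes, in order.

    Returns (thread_1, thread_2, lock_a, lock_b) for each pair of threads
    where one takes lock_a then lock_b and the other takes lock_b then
    lock_a.
    """
    reversals = []
    for (t1, locks1), (t2, locks2) in combinations(sorted(threads.items()), 2):
        pairs2 = ordered_pairs(locks2)
        for a, b in sorted(ordered_pairs(locks1)):
            if (b, a) in pairs2:
                reversals.append((t1, t2, a, b))
    return reversals


# Hypothetical acquisition orders, for illustration only.
threads = {
    "thread_1": ["socket_a", "socket_b"],
    "thread_2": ["socket_b", "socket_a"],  # opposite order: a reversal
    "thread_3": ["socket_a"],
}
print(find_reversals(threads))
# prints [('thread_1', 'thread_2', 'socket_a', 'socket_b')]
```

A reversal like this only turns into a deadlock when the two threads
interleave badly, which is why the hang shows up intermittently rather than
on every run.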

So X11 and a Gnome process deadlock, then other processes trying to talk to 
X11 or the Gnome process get stuck waiting on them; any process performing an 
operation that needs the global UNIX domain socket lock held for writing 
(such as a UNIX domain socket connect or bind) also hangs.  Processes that 
don't go near X11/Gnome, and possibly UNIX domain sockets generally, will be 
fine.  However, I would think that new SSH sessions into the jail might 
also hang since they will try to open new syslog sessions, which requires a 
UNIX domain socket connect operation.  The interrupt thread and netisr don't 
involve UNIX domain sockets at all, and therefore run without a problem, as 
does Apache, which has already established its UNIX domain sockets and has 
nothing further to say on the topic.
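
The two classes of affected threads can be sketched with a simple "who is
waiting on whom" table.  This is a Python toy model under stated
assumptions, not a description of the kernel's data structures: each
blocked thread waits on exactly one other thread, and the specific edges
below (for example, which thread a new SSH session ends up waiting on) are
guesses made for illustration.

```python
def classify(waits_on):
    """Split blocked threads into the two classes described above.

    waits_on maps each blocked thread to the thread holding what it needs.
    Returns (deadlocked, stuck_behind): threads that sit on a wait cycle,
    and threads whose wait chain leads into such a cycle.
    """
    deadlocked, stuck_behind = set(), set()
    for start in waits_on:
        path, node = [], start
        # Follow the wait chain until it ends or revisits a thread.
        while node in waits_on and node not in path:
            path.append(node)
            node = waits_on[node]
        if node in path:  # the chain ran into a cycle
            cycle_start = path.index(node)
            deadlocked.update(path[cycle_start:])
            stuck_behind.update(path[:cycle_start])
    return deadlocked, stuck_behind - deadlocked


# Hypothetical wait-for edges; Apache is not blocked, so it is absent.
waits_on = {
    "X11": "gnome",
    "gnome": "X11",
    "xterm": "X11",
    "new_ssh_session": "gnome",
}
deadlocked, stuck_behind = classify(waits_on)
print(sorted(deadlocked))    # prints ['X11', 'gnome']
print(sorted(stuck_behind))  # prints ['new_ssh_session', 'xterm']
```

Anything that never appears in the table, such as the already-running
Apache in this model, keeps running.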

These symptoms hold for deadlocks, but also for lock leaks, which have a 
slightly different cause (a missing unlock) but can lead to the same 
cascading failure of dependent processes.
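
For comparison, a lock leak can be sketched as follows.  Again this is a
hypothetical Python userspace example rather than kernel code: leaky()
returns early on an error path without releasing the lock, so the lock
stays held forever and every later acquirer would block.  The function
names and the "negative value is an error" rule are made up.

```python
import threading


def leaky(lock, value):
    """Doubles value under the lock, but leaks the lock on the error path."""
    lock.acquire()
    if value < 0:
        return None  # bug: early return skips lock.release()
    result = value * 2
    lock.release()
    return result


def fixed(lock, value):
    """Same logic; the with-statement releases the lock on every path."""
    with lock:
        if value < 0:
            return None
        return value * 2


leaked = threading.Lock()
leaky(leaked, -1)
# The lock is still held although no thread is inside the critical section.
print(leaked.locked())                 # prints True
print(leaked.acquire(blocking=False))  # prints False: a waiter would hang

clean = threading.Lock()
fixed(clean, -1)
print(clean.locked())                  # prints False
```

Unlike a reversal, no second thread is needed to trigger this; the first
caller to hit the error path is enough to wedge everyone who comes later.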

Robert N M Watson
Computer Laboratory
University of Cambridge
Received on Thu Mar 01 2007 - 08:11:16 UTC
