Re: [BETA7-panic] sodealloc(): so_count 1

From: Robert Watson <rwatson_at_freebsd.org>
Date: Mon, 11 Oct 2004 04:13:15 -0400 (EDT)
On Sun, 10 Oct 2004, Marc UBM Bocklet wrote:

> No, but I can revert the local patches, configure a dump device and try
> getting one tomorrow or the day after that. 

Marc,

After a couple of days of experimenting and chatting, Brian and I have
developed what we hope is a less intrusive but fully functional fix for
this problem.  I ran it through a barrage of tests yesterday; although I
couldn't reproduce the original problem, the system still appears to
run :-).  I've committed the patch to CVS HEAD (6.x) and will merge it to
5.x in a few days, assuming your testing doesn't show that it fails to fix
the problem.  I have included a copy of the committed patch (minus the
$FreeBSD$ change) below.

If you could give this a spin, it would be much appreciated.  Thanks for
your (and Vlad's) patience as we worked this out!  (And many thanks to
Brian for doing so much of the work to fix it). 

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Principal Research Scientist, McAfee Research


Index: uipc_socket.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.212
retrieving revision 1.213
diff -u -r1.212 -r1.213
--- uipc_socket.c	5 Sep 2004 14:33:21 -0000	1.212
+++ uipc_socket.c	11 Oct 2004 08:11:26 -0000	1.213
@@ -316,22 +316,34 @@
 	return (0);
 }
 
+/*
+ * Attempt to free a socket.  This should really be sotryfree().
+ *
+ * We free the socket if the protocol is no longer interested in the socket,
+ * there's no file descriptor reference, and the refcount is 0.  While the
+ * calling macro sotryfree() tests the refcount, sofree() has to test it
+ * again as it's possible to race with an accept()ing thread if the socket is
+ * in a listen queue of a listen socket, as being in the listen queue
+ * doesn't elevate the reference count.  sofree() acquires the accept mutex
+ * early for this test in order to avoid that race.
+ */
 void
 sofree(so)
 	struct socket *so;
 {
 	struct socket *head;
 
-	KASSERT(so->so_count == 0, ("socket %p so_count not 0", so));
-	SOCK_LOCK_ASSERT(so);
+	SOCK_UNLOCK(so);
+	ACCEPT_LOCK();
+	SOCK_LOCK(so);
 
-	if (so->so_pcb != NULL || (so->so_state & SS_NOFDREF) == 0) {
+	if (so->so_pcb != NULL || (so->so_state & SS_NOFDREF) == 0 ||
+	    so->so_count != 0) {
 		SOCK_UNLOCK(so);
+		ACCEPT_UNLOCK();
 		return;
 	}
 
-	SOCK_UNLOCK(so);
-	ACCEPT_LOCK();
 	head = so->so_head;
 	if (head != NULL) {
 		KASSERT((so->so_qstate & SQ_COMP) != 0 ||
@@ -353,6 +365,7 @@
 		 * the listening socket is closed.
 		 */
 		if ((so->so_qstate & SQ_COMP) != 0) {
+			SOCK_UNLOCK(so);
 			ACCEPT_UNLOCK();
 			return;
 		}
@@ -365,6 +378,7 @@
 	    (so->so_qstate & SQ_INCOMP) == 0,
 	    ("sofree: so_head == NULL, but still SQ_COMP(%d) or SQ_INCOMP(%d)",
 	    so->so_qstate & SQ_COMP, so->so_qstate & SQ_INCOMP));
+	SOCK_UNLOCK(so);
 	ACCEPT_UNLOCK();
 	SOCKBUF_LOCK(&so->so_snd);
 	so->so_snd.sb_flags |= SB_NOINTR;
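The locking change described in the new sofree() comment can be modeled outside the kernel. What follows is a minimal userland sketch, assuming POSIX threads; struct model_socket, model_socket_init(), model_sofree(), model_accept_ref() and accept_mtx are hypothetical stand-ins for the socket, sofree(), an accept()ing thread, and the accept mutex, not FreeBSD kernel interfaces. It shows the pattern the patch adopts: drop the socket lock, take the accept lock first, re-lock the socket, and only then re-test so_count along with the other free conditions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userland model of the sofree() locking pattern.
 * Lock order (as in the patch): accept_mtx before the socket mutex.
 */
static pthread_mutex_t accept_mtx = PTHREAD_MUTEX_INITIALIZER;

struct model_socket {
	pthread_mutex_t mtx;	/* stands in for SOCK_LOCK()/SOCK_UNLOCK() */
	void *pcb;		/* stands in for so_pcb */
	bool nofdref;		/* stands in for SS_NOFDREF */
	int count;		/* stands in for so_count */
	bool freed;		/* set when the model "frees" the socket */
};

static void
model_socket_init(struct model_socket *so)
{
	pthread_mutex_init(&so->mtx, NULL);
	so->pcb = NULL;
	so->nofdref = true;
	so->count = 0;
	so->freed = false;
}

/*
 * Models an accept()ing thread taking a reference on a queued socket.
 * It follows the same lock order: accept_mtx, then the socket mutex.
 */
static void
model_accept_ref(struct model_socket *so)
{
	pthread_mutex_lock(&accept_mtx);
	pthread_mutex_lock(&so->mtx);
	so->count++;
	pthread_mutex_unlock(&so->mtx);
	pthread_mutex_unlock(&accept_mtx);
}

/*
 * Called with so->mtx held, like sofree().  Always returns with both
 * locks released; returns true if the socket was "freed".
 */
static bool
model_sofree(struct model_socket *so)
{
	/* Taking accept_mtx while holding so->mtx would invert the lock
	 * order, so drop the socket lock and re-acquire in order. */
	pthread_mutex_unlock(&so->mtx);
	pthread_mutex_lock(&accept_mtx);
	pthread_mutex_lock(&so->mtx);

	/* The reference count must never go negative. */
	assert(so->count >= 0);

	/* Anything may have changed while so->mtx was dropped, so every
	 * condition, including the refcount, is re-tested here. */
	if (so->pcb != NULL || !so->nofdref || so->count != 0) {
		pthread_mutex_unlock(&so->mtx);
		pthread_mutex_unlock(&accept_mtx);
		return false;
	}

	so->freed = true;	/* real code would tear down buffers here */
	pthread_mutex_unlock(&so->mtx);
	pthread_mutex_unlock(&accept_mtx);
	return true;
}
```

The re-test after re-locking is the essential step: while the socket lock is dropped, a thread following the accept-lock-first order can take a new reference, so any check made before the lock juggling may already be stale.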
Received on Mon Oct 11 2004 - 06:14:58 UTC
