Re: panic: tcp_input: TCPS_LISTEN in netinet/tcp_input.c:1016

From: Robert Watson <rwatson_at_FreeBSD.org>
Date: Thu, 17 Feb 2005 10:12:49 +0000 (GMT)
(harti CC'd because the attach patch touches netatm, even though this is a
 bit of ATM I'm not sure he's using; likewise bms)

On Wed, 16 Feb 2005, Peter Holm wrote:

> > This would appear to be an inter-layer race between the socket code and
> > the TCP code.  In particular, it looks like a SYN has come in during the
> > call to listen() on another CPU (or perhaps a preempted thread), after the
> > TCP state has been set up for the listening tcpcb, but before the
> > SO_ACCEPTCONN flag is set in the socket state.  The TCP code panics
> > because it expects that if a tcpcb is in TCPS_LISTEN, the matching socket
> > should be in SO_ACCEPTCONN.  I'm working on a patch and hope to put it
> > together for you today.  However, this patch substantially tears up the
> > current listen code for several protocols, so it will need a fair amount
> > of testing.
<snip>
> You're welcome. It is always great to get feedback on the testing
> I'm doing.

Here's the patch I mentioned before.  It requires some amount of testing,
as I also mentioned :-).  It does a couple of things:

- Pushes the socket state transition from the socket layer solisten() to
  to socket "library"  routines called from the protocol.  This permits
  the socket routines to be called while holding the protocol mutexes,
  preventing a race exposing the incomplete socket state transition to TCP
  after the TCP state transition has completed.

- Holds the socket lock for the duration of the socket state test and set,
  and over the protocol layer state transition, which is now possible as
  the socket lock is acquired by the protocol layer, rather than vice
  versa.  This prevents additional state related races in the socket
  layer.

I believe this will prevent the following scenario that occurred with the
previous code:

(1) The socket layer tests and finds that the socket can transition to
    listening state.

(2) The socket layer calls the protocol layer.

(3) The protocol layer transitions the TCP pcb to listening stat.e

(4) The netisr preempts the user thread that called listen() in order to
    deliver a SYN, which finds the protocol layer in the listen state, but
    the socket layer in the pre-listen state.  Panic.

(5) Theoretical return to the socket layer and completion of the socket
    state transition.

Unfortunately, this changes the implementation of pru_listen for all
protocols, so affects UNIX domain sockets, TCP/IP, IPX/SPX, and ATM, hence
the need for substantial testing.

Thanks!

Robert N M Watson

Index: kern/uipc_socket.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_socket.c,v
retrieving revision 1.226
diff -u -r1.226 uipc_socket.c
--- kern/uipc_socket.c	24 Jan 2005 12:20:20 -0000	1.226
+++ kern/uipc_socket.c	16 Feb 2005 13:57:01 -0000
_at__at_ -1,6 +1,6 _at__at_
 /*-
  * Copyright (c) 2004 The FreeBSD Foundation
- * Copyright (c) 2004 Robert Watson
+ * Copyright (c) 2004-2005 Robert N. M. Watson
  * Copyright (c) 1982, 1986, 1988, 1990, 1993
  *	The Regents of the University of California.  All rights reserved.
  *
_at__at_ -275,6 +275,18 _at__at_
 	mtx_unlock(&so_global_mtx);
 }
 
+/*
+ * solisten() transitions a socket from a non-listening state to a listening
+ * state, or is used to update the listen queue depth on an existing listen
+ * socket.  In order to transition socket layer and protocol state atomically,
+ * the socket layer implementation is called from the protocol while socket
+ * locks (as well as any appropriate protocol locks) are held.  The protocol
+ * will use solisten_proto_check() to check that the socket layer approves of
+ * the transition, once the socket lock is acquired; it may then implement
+ * the socket state transition when it is ready by invoking solisten_proto().
+ * Protocol implementors are advised not to release the socket lock between
+ * the socket layer check and state transition.
+ */
 int
 solisten(so, backlog, td)
 	struct socket *so;
_at__at_ -283,30 +295,57 _at__at_
 {
 	int error;
 
-	/*
-	 * XXXRW: Ordering issue here -- perhaps we need to set
-	 * SO_ACCEPTCONN before the call to pru_listen()?
-	 * XXXRW: General atomic test-and-set concerns here also.
-	 */
-	if (so->so_state & (SS_ISCONNECTED | SS_ISCONNECTING |
-			    SS_ISDISCONNECTING))
-		return (EINVAL);
 	error = (*so->so_proto->pr_usrreqs->pru_listen)(so, td);
 	if (error)
 		return (error);
-	ACCEPT_LOCK();
-	if (TAILQ_EMPTY(&so->so_comp)) {
-		SOCK_LOCK(so);
-		so->so_options |= SO_ACCEPTCONN;
-		SOCK_UNLOCK(so);
-	}
+
+	/*
+	 * XXXRW: Really, we should pass the backlog down to the protocol so
+	 * that the qlimit can be set on the socket when it's converted to a
+	 * listen socket, rather than after connections may have started to
+	 * come in.
+	 */
 	if (backlog < 0 || backlog > somaxconn)
 		backlog = somaxconn;
 	so->so_qlimit = backlog;
-	ACCEPT_UNLOCK();
 	return (0);
 }
 
+int
+solisten_proto_check(so)
+	struct socket *so;
+{
+
+	SOCK_LOCK_ASSERT(so);
+
+	if (so->so_state & (SS_ISCONNECTED | SS_ISCONNECTING |
+	    SS_ISDISCONNECTING))
+		return (EINVAL);
+	return (0);
+}
+
+void
+solisten_proto(so)
+	struct socket *so;
+{
+
+	SOCK_LOCK_ASSERT(so);
+
+	/*
+	 * XXXRW: In the 4.4BSDlite code drop, the socket so_comp queue is
+	 * checked before setting SO_ACCEPTCONN: if connected sockets are
+	 * already in the queue, SO_ACCEPTCONN is not set.  I'm not sure why
+	 * this is the case.  Perhaps to avoid simply setting the flag again
+	 * in the event it's a call of listen() on an already listening
+	 * socket?  In this code, we set it unconditionally -- setting it a
+	 * second time for a follow-up call shouldn't hurt, and it's unclear
+	 * that the race suggested by such code is desirable -- among other
+	 * things, it allows the protocol and socket state to get out of
+	 * sync.
+	 */
+	so->so_options |= SO_ACCEPTCONN;
+}
+
 /*
  * Attempt to free a socket.  This should really be sotryfree().
  *
Index: kern/uipc_usrreq.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/uipc_usrreq.c,v
retrieving revision 1.146
diff -u -r1.146 uipc_usrreq.c
--- kern/uipc_usrreq.c	6 Jan 2005 23:35:40 -0000	1.146
+++ kern/uipc_usrreq.c	16 Feb 2005 13:45:30 -0000
_at__at_ -124,7 +124,7 _at__at_
 static void    unp_discard(struct file *);
 static void    unp_freerights(struct file **, int);
 static int     unp_internalize(struct mbuf **, struct thread *);
-static int     unp_listen(struct unpcb *, struct thread *);
+static int     unp_listen(struct socket *, struct unpcb *, struct thread *);
 
 static int
 uipc_abort(struct socket *so)
_at__at_ -284,7 +284,7 _at__at_
 		UNP_UNLOCK();
 		return (EINVAL);
 	}
-	error = unp_listen(unp, td);
+	error = unp_listen(so, unp, td);
 	UNP_UNLOCK();
 	return (error);
 }
_at__at_ -1730,17 +1730,25 _at__at_
 }
 
 static int
-unp_listen(unp, td)
+unp_listen(so, unp, td)
+	struct socket *so;
 	struct unpcb *unp;
 	struct thread *td;
 {
+	int error;
+
 	UNP_LOCK_ASSERT();
 
-	/*
-	 * XXXRW: Why populate the local peer cred with our own credential?
-	 */
+	SOCK_LOCK(so);
+	error = solisten_proto_check(so);
+	if (error != 0) {
+		SOCK_UNLOCK(so);
+		return (error);
+	}
 	cru2x(td->td_ucred, &unp->unp_peercred);
 	unp->unp_flags |= UNP_HAVEPCCACHED;
+	solisten_proto(so);
+	SOCK_UNLOCK(so);
 	return (0);
 }
 
Index: netatm/atm_cm.c
===================================================================
RCS file: /home/ncvs/src/sys/netatm/atm_cm.c,v
retrieving revision 1.32
diff -u -r1.32 atm_cm.c
--- netatm/atm_cm.c	7 Jan 2005 01:45:36 -0000	1.32
+++ netatm/atm_cm.c	16 Feb 2005 14:09:06 -0000
_at__at_ -524,6 +524,7 _at__at_
  * atm_cm_release().
  *
  * Arguments:
+ *	so	optional socket pointer -- if present, will set listen state
  *	epp	pointer to endpoint definition structure
  *	token	endpoint's listen instance token
  *	ap	pointer to listening connection attributes
_at__at_ -535,7 +536,8 _at__at_
  *
  */
 int
-atm_cm_listen(epp, token, ap, copp)
+atm_cm_listen(so, epp, token, ap, copp)
+	struct socket	*so;
 	Atm_endpoint	*epp;
 	void		*token;
 	Atm_attributes	*ap;
_at__at_ -718,7 +720,12 _at__at_
 	/*
 	 * Now try to register the listening connection
 	 */
+	if (so != NULL)
+		SOCK_LOCK(so);
 	s = splnet();
+	err = solisten_proto_check(so);
+	if (err)
+		goto donex;
 	if (atm_cm_match(cop->co_lattr, NULL) != NULL) {
 		/*
 		 * Can't have matching listeners
_at__at_ -728,9 +735,13 _at__at_
 	}
 	cop->co_state = COS_LISTEN;
 	LINK2TAIL(cop, Atm_connection, atm_listen_queue, co_next);
+	if (so != NULL)
+		solisten_proto(so);
 
 donex:
 	(void) splx(s);
+	if (so != NULL)
+		SOCK_UNLOCK(so);
 
 done:
 	if (err) {
Index: netatm/atm_socket.c
===================================================================
RCS file: /home/ncvs/src/sys/netatm/atm_socket.c,v
retrieving revision 1.22
diff -u -r1.22 atm_socket.c
--- netatm/atm_socket.c	7 Jan 2005 01:45:36 -0000	1.22
+++ netatm/atm_socket.c	16 Feb 2005 13:45:30 -0000
_at__at_ -350,7 +350,7 _at__at_
 	/*
 	 * Start listening for incoming calls
 	 */
-	return (atm_cm_listen(epp, atp, &atp->atp_attr, &atp->atp_conn));
+	return (atm_cm_listen(so, epp, atp, &atp->atp_attr, &atp->atp_conn));
 }
 
 
Index: netatm/atm_var.h
===================================================================
RCS file: /home/ncvs/src/sys/netatm/atm_var.h,v
retrieving revision 1.25
diff -u -r1.25 atm_var.h
--- netatm/atm_var.h	7 Jan 2005 01:45:36 -0000	1.25
+++ netatm/atm_var.h	16 Feb 2005 13:45:30 -0000
_at__at_ -81,8 +81,8 _at__at_
 	/* atm_cm.c */
 int		atm_cm_connect(Atm_endpoint *, void *, Atm_attributes *,
 			Atm_connection **);
-int		atm_cm_listen(Atm_endpoint *, void *, Atm_attributes *,
-			Atm_connection **);
+int		atm_cm_listen(struct socket *, Atm_endpoint *, void *,
+			Atm_attributes *, Atm_connection **);
 int		atm_cm_addllc(Atm_endpoint *, void *, struct attr_llc *,
 			Atm_connection *, Atm_connection **);
 int		atm_cm_addparty(Atm_connection *, int, struct t_atm_sap *);
Index: netatm/ipatm/ipatm_load.c
===================================================================
RCS file: /home/ncvs/src/sys/netatm/ipatm/ipatm_load.c,v
retrieving revision 1.20
diff -u -r1.20 ipatm_load.c
--- netatm/ipatm/ipatm_load.c	7 Jan 2005 01:45:37 -0000	1.20
+++ netatm/ipatm/ipatm_load.c	16 Feb 2005 13:45:30 -0000
_at__at_ -522,8 +522,8 _at__at_
 		/*
 		 * Now start listening
 		 */
-		if ((err = atm_cm_listen(&ipatm_endpt, (void *)(intptr_t)i,
-				&ipatm_listeners[i].attr,
+		if ((err = atm_cm_listen(NULL, &ipatm_endpt,
+				(void *)(intptr_t)i, &ipatm_listeners[i].attr,
 				&ipatm_listeners[i].conn)) != 0)
 			goto done;
 	}
Index: netgraph/bluetooth/socket/ng_btsocket_l2cap.c
===================================================================
RCS file: /home/ncvs/src/sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c,v
retrieving revision 1.15
diff -u -r1.15 ng_btsocket_l2cap.c
--- netgraph/bluetooth/socket/ng_btsocket_l2cap.c	7 Jan 2005 01:45:44 -0000	1.15
+++ netgraph/bluetooth/socket/ng_btsocket_l2cap.c	16 Feb 2005 14:10:46 -0000
_at__at_ -2409,13 +2409,28 _at__at_
 ng_btsocket_l2cap_listen(struct socket *so, struct thread *td)
 {
 	ng_btsocket_l2cap_pcb_p	pcb = so2l2cap_pcb(so);
+	int error;
 
-	if (pcb == NULL)
-		return (EINVAL);
-	if (ng_btsocket_l2cap_node == NULL) 
-		return (EINVAL);
-
-	return ((pcb->psm == 0)? EDESTADDRREQ : 0);
+	SOCK_LOCK(so);
+	error = solisten_proto_check(so);
+	if (error != 0)
+		goto out;
+	if (pcb == NULL) {
+		error = EINVAL;
+		goto out;
+	}
+	if (ng_btsocket_l2cap_node == NULL) {
+		error = EINVAL;
+		goto out;
+	}
+	if (pcb->psm == 0) {
+		error = EDESTADDRREQ;
+		goto out;
+	}
+	solisten_proto(so);
+out:
+	SOCK_UNLOCK(so);
+	return (error);
 } /* ng_btsocket_listen */
 
 /*
Index: netgraph/bluetooth/socket/ng_btsocket_rfcomm.c
===================================================================
RCS file: /home/ncvs/src/sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c,v
retrieving revision 1.14
diff -u -r1.14 ng_btsocket_rfcomm.c
--- netgraph/bluetooth/socket/ng_btsocket_rfcomm.c	7 Jan 2005 01:45:44 -0000	1.14
+++ netgraph/bluetooth/socket/ng_btsocket_rfcomm.c	16 Feb 2005 13:45:30 -0000
_at__at_ -800,6 +800,7 _at__at_
 	ng_btsocket_rfcomm_session_p	 s = NULL;
 	struct socket			*l2so = NULL;
 	int				 error;
+	int				 socreate_error;
 
 	if (pcb == NULL)
 		return (EINVAL);
_at__at_ -816,14 +817,18 _at__at_
 	 * session.
 	 */
 
-	error = socreate(PF_BLUETOOTH, &l2so, SOCK_SEQPACKET,
+	socreate_error = socreate(PF_BLUETOOTH, &l2so, SOCK_SEQPACKET,
 			BLUETOOTH_PROTO_L2CAP, td->td_ucred, td);
 
-	/* 
-	 * Look for the session in LISTENING state. There can only be one.
+	/*
+	 * Transition the socket and session into the LISTENING state.  Check
+	 * for collisions first, as there can only be one.
 	 */
-
 	mtx_lock(&ng_btsocket_rfcomm_sessions_mtx);
+	SOCK_LOCK(so);
+	error = solisten_proto_check(so);
+	if (error != 0)
+		goto out;
 
 	LIST_FOREACH(s, &ng_btsocket_rfcomm_sessions, next)
 		if (s->state == NG_BTSOCKET_RFCOMM_SESSION_LISTENING)
_at__at_ -835,10 +840,9 _at__at_
 		 * L2CAP socket. If l2so == NULL then error has the error code 
 		 * from socreate()
 		 */
-
 		if (l2so == NULL) {
-			mtx_unlock(&ng_btsocket_rfcomm_sessions_mtx);
-			return (error);
+			error = socreate_error;
+			goto out;
 		}
 
 		/* 
_at__at_ -848,21 +852,24 _at__at_
 		 * XXX FIXME Note that currently there is no way to adjust MTU
 		 * for the default session.
 		 */
-
 		error = ng_btsocket_rfcomm_session_create(&s, l2so,
 					NG_HCI_BDADDR_ANY, NULL, td);
-		if (error != 0) {
-			mtx_unlock(&ng_btsocket_rfcomm_sessions_mtx);
-			soclose(l2so);
-
-			return (error);
-		}
-	} else if (l2so != NULL)
-		soclose(l2so); /* we don't need new L2CAP socket */
-
+		if (error != 0)
+			goto out;
+		l2so = NULL;
+	}
+	solisten_proto(so);
 	mtx_unlock(&ng_btsocket_rfcomm_sessions_mtx);
-
-	return (0);
+out:
+	SOCK_UNLOCK(so);
+	mtx_unlock(&ng_btsocket_rfcomm_sessions_mtx);
+	/*
+	 * If we still have an l2so reference here, it's unneeded, so release
+	 * it.
+	 */
+	if (l2so != NULL)
+		soclose(l2so);
+	return (error);
 } /* ng_btsocket_listen */
 
 /*
Index: netinet/tcp_usrreq.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/tcp_usrreq.c,v
retrieving revision 1.114
diff -u -r1.114 tcp_usrreq.c
--- netinet/tcp_usrreq.c	14 Feb 2005 07:37:50 -0000	1.114
+++ netinet/tcp_usrreq.c	16 Feb 2005 13:45:30 -0000
_at__at_ -302,10 +302,15 _at__at_
 	const int inirw = INI_WRITE;
 
 	COMMON_START();
-	if (inp->inp_lport == 0)
+	SOCK_LOCK(so);
+	error = solisten_proto_check(so);
+	if (error == 0 && inp->inp_lport == 0)
 		error = in_pcbbind(inp, (struct sockaddr *)0, td->td_ucred);
-	if (error == 0)
+	if (error == 0) {
 		tp->t_state = TCPS_LISTEN;
+		solisten_proto(so);
+	}
+	SOCK_UNLOCK(so);
 	COMMON_END(PRU_LISTEN);
 }
 
_at__at_ -319,14 +324,19 _at__at_
 	const int inirw = INI_WRITE;
 
 	COMMON_START();
-	if (inp->inp_lport == 0) {
+	SOCK_LOCK(so);
+	error = solisten_proto_check(so);
+	if (error == 0 && inp->inp_lport == 0) {
 		inp->inp_vflag &= ~INP_IPV4;
 		if ((inp->inp_flags & IN6P_IPV6_V6ONLY) == 0)
 			inp->inp_vflag |= INP_IPV4;
 		error = in6_pcbbind(inp, (struct sockaddr *)0, td->td_ucred);
 	}
-	if (error == 0)
+	if (error == 0) {
 		tp->t_state = TCPS_LISTEN;
+		solisten_proto(so);
+	}
+	SOCK_UNLOCK(so);
 	COMMON_END(PRU_LISTEN);
 }
 #endif /* INET6 */
Index: netipx/spx_usrreq.c
===================================================================
RCS file: /home/ncvs/src/sys/netipx/spx_usrreq.c,v
retrieving revision 1.61
diff -u -r1.61 spx_usrreq.c
--- netipx/spx_usrreq.c	9 Jan 2005 05:31:16 -0000	1.61
+++ netipx/spx_usrreq.c	16 Feb 2005 13:45:30 -0000
_at__at_ -1532,10 +1532,15 _at__at_
 
 	IPX_LIST_LOCK();
 	IPX_LOCK(ipxp);
-	if (ipxp->ipxp_lport == 0)
+	SOCK_LOCK(so);
+	error = solisten_proto_check(so);
+	if (error == 0 && ipxp->ipxp_lport == 0)
 		error = ipx_pcbbind(ipxp, NULL, td);
-	if (error == 0)
+	if (error == 0) {
 		cb->s_state = TCPS_LISTEN;
+		solisten_proto(so);
+	}
+	SOCK_UNLOCK(so);
 	IPX_UNLOCK(ipxp);
 	IPX_LIST_UNLOCK();
 	return (error);
Index: sys/socketvar.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/socketvar.h,v
retrieving revision 1.137
diff -u -r1.137 socketvar.h
--- sys/socketvar.h	30 Jan 2005 13:11:44 -0000	1.137
+++ sys/socketvar.h	16 Feb 2005 13:57:42 -0000
_at__at_ -512,6 +512,8 _at__at_
 void	soisdisconnected(struct socket *so);
 void	soisdisconnecting(struct socket *so);
 int	solisten(struct socket *so, int backlog, struct thread *td);
+void	solisten_proto(struct socket *so);
+int	solisten_proto_check(struct socket *so);
 struct socket *
 	sonewconn(struct socket *head, int connstatus);
 int	sooptcopyin(struct sockopt *sopt, void *buf, size_t len, size_t minlen);
Received on Thu Feb 17 2005 - 09:14:16 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:28 UTC