(harti CC'd because the attach patch touches netatm, even though this is a bit of ATM I'm not sure he's using; likewise bms) On Wed, 16 Feb 2005, Peter Holm wrote: > > This would appear to be an inter-layer race between the socket code and > > the TCP code. In particular, it looks like a SYN has come in during the > > call to listen() on another CPU (or perhaps a preempted thread), after the > > TCP state has been set up for the listening tcpcb, but before the > > SO_ACCEPTCONN flag is set in the socket state. The TCP code panics > > because it expects that if a tcpcb is in TCPS_LISTEN, the matching socket > > should be in SO_ACCEPTCONN. I'm working on a patch and hope to put it > > together for you today. However, this patch substantially tears up the > > current listen code for several protocols, so it will need a fair amount > > of testing. <snip> > You're welcome. It is always great to get feedback on the testing > I'm doing. Here's the patch I mentioned before. It requires some amount of testing, as I also mentioned :-). It does a couple of things: - Pushes the socket state transition from the socket layer solisten() to to socket "library" routines called from the protocol. This permits the socket routines to be called while holding the protocol mutexes, preventing a race exposing the incomplete socket state transition to TCP after the TCP state transition has completed. - Holds the socket lock for the duration of the socket state test and set, and over the protocol layer state transition, which is now possible as the socket lock is acquired by the protocol layer, rather than vice versa. This prevents additional state related races in the socket layer. I believe this will prevent the following scenario that occurred with the previous code: (1) The socket layer tests and finds that the socket can transition to listening state. (2) The socket layer calls the protocol layer. (3) The protocol layer transitions the TCP pcb to listening stat.e (4) The netisr preempts the user thread that called listen() in order to deliver a SYN, which finds the protocol layer in the listen state, but the socket layer in the pre-listen state. Panic. (5) Theoretical return to the socket layer and completion of the socket state transition. Unfortunately, this changes the implementation of pru_listen for all protocols, so affects UNIX domain sockets, TCP/IP, IPX/SPX, and ATM, hence the need for substantial testing. Thanks! Robert N M Watson Index: kern/uipc_socket.c =================================================================== RCS file: /home/ncvs/src/sys/kern/uipc_socket.c,v retrieving revision 1.226 diff -u -r1.226 uipc_socket.c --- kern/uipc_socket.c 24 Jan 2005 12:20:20 -0000 1.226 +++ kern/uipc_socket.c 16 Feb 2005 13:57:01 -0000 _at__at_ -1,6 +1,6 _at__at_ /*- * Copyright (c) 2004 The FreeBSD Foundation - * Copyright (c) 2004 Robert Watson + * Copyright (c) 2004-2005 Robert N. M. Watson * Copyright (c) 1982, 1986, 1988, 1990, 1993 * The Regents of the University of California. All rights reserved. * _at__at_ -275,6 +275,18 _at__at_ mtx_unlock(&so_global_mtx); } +/* + * solisten() transitions a socket from a non-listening state to a listening + * state, or is used to update the listen queue depth on an existing listen + * socket. In order to transition socket layer and protocol state atomically, + * the socket layer implementation is called from the protocol while socket + * locks (as well as any appropriate protocol locks) are held. The protocol + * will use solisten_proto_check() to check that the socket layer approves of + * the transition, once the socket lock is acquired; it may then implement + * the socket state transition when it is ready by invoking solisten_proto(). + * Protocol implementors are advised not to release the socket lock between + * the socket layer check and state transition. + */ int solisten(so, backlog, td) struct socket *so; _at__at_ -283,30 +295,57 _at__at_ { int error; - /* - * XXXRW: Ordering issue here -- perhaps we need to set - * SO_ACCEPTCONN before the call to pru_listen()? - * XXXRW: General atomic test-and-set concerns here also. - */ - if (so->so_state & (SS_ISCONNECTED | SS_ISCONNECTING | - SS_ISDISCONNECTING)) - return (EINVAL); error = (*so->so_proto->pr_usrreqs->pru_listen)(so, td); if (error) return (error); - ACCEPT_LOCK(); - if (TAILQ_EMPTY(&so->so_comp)) { - SOCK_LOCK(so); - so->so_options |= SO_ACCEPTCONN; - SOCK_UNLOCK(so); - } + + /* + * XXXRW: Really, we should pass the backlog down to the protocol so + * that the qlimit can be set on the socket when it's converted to a + * listen socket, rather than after connections may have started to + * come in. + */ if (backlog < 0 || backlog > somaxconn) backlog = somaxconn; so->so_qlimit = backlog; - ACCEPT_UNLOCK(); return (0); } +int +solisten_proto_check(so) + struct socket *so; +{ + + SOCK_LOCK_ASSERT(so); + + if (so->so_state & (SS_ISCONNECTED | SS_ISCONNECTING | + SS_ISDISCONNECTING)) + return (EINVAL); + return (0); +} + +void +solisten_proto(so) + struct socket *so; +{ + + SOCK_LOCK_ASSERT(so); + + /* + * XXXRW: In the 4.4BSDlite code drop, the socket so_comp queue is + * checked before setting SO_ACCEPTCONN: if connected sockets are + * already in the queue, SO_ACCEPTCONN is not set. I'm not sure why + * this is the case. Perhaps to avoid simply setting the flag again + * in the event it's a call of listen() on an already listening + * socket? In this code, we set it unconditionally -- setting it a + * second time for a follow-up call shouldn't hurt, and it's unclear + * that the race suggested by such code is desirable -- among other + * things, it allows the protocol and socket state to get out of + * sync. + */ + so->so_options |= SO_ACCEPTCONN; +} + /* * Attempt to free a socket. This should really be sotryfree(). * Index: kern/uipc_usrreq.c =================================================================== RCS file: /home/ncvs/src/sys/kern/uipc_usrreq.c,v retrieving revision 1.146 diff -u -r1.146 uipc_usrreq.c --- kern/uipc_usrreq.c 6 Jan 2005 23:35:40 -0000 1.146 +++ kern/uipc_usrreq.c 16 Feb 2005 13:45:30 -0000 _at__at_ -124,7 +124,7 _at__at_ static void unp_discard(struct file *); static void unp_freerights(struct file **, int); static int unp_internalize(struct mbuf **, struct thread *); -static int unp_listen(struct unpcb *, struct thread *); +static int unp_listen(struct socket *, struct unpcb *, struct thread *); static int uipc_abort(struct socket *so) _at__at_ -284,7 +284,7 _at__at_ UNP_UNLOCK(); return (EINVAL); } - error = unp_listen(unp, td); + error = unp_listen(so, unp, td); UNP_UNLOCK(); return (error); } _at__at_ -1730,17 +1730,25 _at__at_ } static int -unp_listen(unp, td) +unp_listen(so, unp, td) + struct socket *so; struct unpcb *unp; struct thread *td; { + int error; + UNP_LOCK_ASSERT(); - /* - * XXXRW: Why populate the local peer cred with our own credential? - */ + SOCK_LOCK(so); + error = solisten_proto_check(so); + if (error != 0) { + SOCK_UNLOCK(so); + return (error); + } cru2x(td->td_ucred, &unp->unp_peercred); unp->unp_flags |= UNP_HAVEPCCACHED; + solisten_proto(so); + SOCK_UNLOCK(so); return (0); } Index: netatm/atm_cm.c =================================================================== RCS file: /home/ncvs/src/sys/netatm/atm_cm.c,v retrieving revision 1.32 diff -u -r1.32 atm_cm.c --- netatm/atm_cm.c 7 Jan 2005 01:45:36 -0000 1.32 +++ netatm/atm_cm.c 16 Feb 2005 14:09:06 -0000 _at__at_ -524,6 +524,7 _at__at_ * atm_cm_release(). * * Arguments: + * so optional socket pointer -- if present, will set listen state * epp pointer to endpoint definition structure * token endpoint's listen instance token * ap pointer to listening connection attributes _at__at_ -535,7 +536,8 _at__at_ * */ int -atm_cm_listen(epp, token, ap, copp) +atm_cm_listen(so, epp, token, ap, copp) + struct socket *so; Atm_endpoint *epp; void *token; Atm_attributes *ap; _at__at_ -718,7 +720,12 _at__at_ /* * Now try to register the listening connection */ + if (so != NULL) + SOCK_LOCK(so); s = splnet(); + err = solisten_proto_check(so); + if (err) + goto donex; if (atm_cm_match(cop->co_lattr, NULL) != NULL) { /* * Can't have matching listeners _at__at_ -728,9 +735,13 _at__at_ } cop->co_state = COS_LISTEN; LINK2TAIL(cop, Atm_connection, atm_listen_queue, co_next); + if (so != NULL) + solisten_proto(so); donex: (void) splx(s); + if (so != NULL) + SOCK_UNLOCK(so); done: if (err) { Index: netatm/atm_socket.c =================================================================== RCS file: /home/ncvs/src/sys/netatm/atm_socket.c,v retrieving revision 1.22 diff -u -r1.22 atm_socket.c --- netatm/atm_socket.c 7 Jan 2005 01:45:36 -0000 1.22 +++ netatm/atm_socket.c 16 Feb 2005 13:45:30 -0000 _at__at_ -350,7 +350,7 _at__at_ /* * Start listening for incoming calls */ - return (atm_cm_listen(epp, atp, &atp->atp_attr, &atp->atp_conn)); + return (atm_cm_listen(so, epp, atp, &atp->atp_attr, &atp->atp_conn)); } Index: netatm/atm_var.h =================================================================== RCS file: /home/ncvs/src/sys/netatm/atm_var.h,v retrieving revision 1.25 diff -u -r1.25 atm_var.h --- netatm/atm_var.h 7 Jan 2005 01:45:36 -0000 1.25 +++ netatm/atm_var.h 16 Feb 2005 13:45:30 -0000 _at__at_ -81,8 +81,8 _at__at_ /* atm_cm.c */ int atm_cm_connect(Atm_endpoint *, void *, Atm_attributes *, Atm_connection **); -int atm_cm_listen(Atm_endpoint *, void *, Atm_attributes *, - Atm_connection **); +int atm_cm_listen(struct socket *, Atm_endpoint *, void *, + Atm_attributes *, Atm_connection **); int atm_cm_addllc(Atm_endpoint *, void *, struct attr_llc *, Atm_connection *, Atm_connection **); int atm_cm_addparty(Atm_connection *, int, struct t_atm_sap *); Index: netatm/ipatm/ipatm_load.c =================================================================== RCS file: /home/ncvs/src/sys/netatm/ipatm/ipatm_load.c,v retrieving revision 1.20 diff -u -r1.20 ipatm_load.c --- netatm/ipatm/ipatm_load.c 7 Jan 2005 01:45:37 -0000 1.20 +++ netatm/ipatm/ipatm_load.c 16 Feb 2005 13:45:30 -0000 _at__at_ -522,8 +522,8 _at__at_ /* * Now start listening */ - if ((err = atm_cm_listen(&ipatm_endpt, (void *)(intptr_t)i, - &ipatm_listeners[i].attr, + if ((err = atm_cm_listen(NULL, &ipatm_endpt, + (void *)(intptr_t)i, &ipatm_listeners[i].attr, &ipatm_listeners[i].conn)) != 0) goto done; } Index: netgraph/bluetooth/socket/ng_btsocket_l2cap.c =================================================================== RCS file: /home/ncvs/src/sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c,v retrieving revision 1.15 diff -u -r1.15 ng_btsocket_l2cap.c --- netgraph/bluetooth/socket/ng_btsocket_l2cap.c 7 Jan 2005 01:45:44 -0000 1.15 +++ netgraph/bluetooth/socket/ng_btsocket_l2cap.c 16 Feb 2005 14:10:46 -0000 _at__at_ -2409,13 +2409,28 _at__at_ ng_btsocket_l2cap_listen(struct socket *so, struct thread *td) { ng_btsocket_l2cap_pcb_p pcb = so2l2cap_pcb(so); + int error; - if (pcb == NULL) - return (EINVAL); - if (ng_btsocket_l2cap_node == NULL) - return (EINVAL); - - return ((pcb->psm == 0)? EDESTADDRREQ : 0); + SOCK_LOCK(so); + error = solisten_proto_check(so); + if (error != 0) + goto out; + if (pcb == NULL) { + error = EINVAL; + goto out; + } + if (ng_btsocket_l2cap_node == NULL) { + error = EINVAL; + goto out; + } + if (pcb->psm == 0) { + error = EDESTADDRREQ; + goto out; + } + solisten_proto(so); +out: + SOCK_UNLOCK(so); + return (error); } /* ng_btsocket_listen */ /* Index: netgraph/bluetooth/socket/ng_btsocket_rfcomm.c =================================================================== RCS file: /home/ncvs/src/sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c,v retrieving revision 1.14 diff -u -r1.14 ng_btsocket_rfcomm.c --- netgraph/bluetooth/socket/ng_btsocket_rfcomm.c 7 Jan 2005 01:45:44 -0000 1.14 +++ netgraph/bluetooth/socket/ng_btsocket_rfcomm.c 16 Feb 2005 13:45:30 -0000 _at__at_ -800,6 +800,7 _at__at_ ng_btsocket_rfcomm_session_p s = NULL; struct socket *l2so = NULL; int error; + int socreate_error; if (pcb == NULL) return (EINVAL); _at__at_ -816,14 +817,18 _at__at_ * session. */ - error = socreate(PF_BLUETOOTH, &l2so, SOCK_SEQPACKET, + socreate_error = socreate(PF_BLUETOOTH, &l2so, SOCK_SEQPACKET, BLUETOOTH_PROTO_L2CAP, td->td_ucred, td); - /* - * Look for the session in LISTENING state. There can only be one. + /* + * Transition the socket and session into the LISTENING state. Check + * for collisions first, as there can only be one. */ - mtx_lock(&ng_btsocket_rfcomm_sessions_mtx); + SOCK_LOCK(so); + error = solisten_proto_check(so); + if (error != 0) + goto out; LIST_FOREACH(s, &ng_btsocket_rfcomm_sessions, next) if (s->state == NG_BTSOCKET_RFCOMM_SESSION_LISTENING) _at__at_ -835,10 +840,9 _at__at_ * L2CAP socket. If l2so == NULL then error has the error code * from socreate() */ - if (l2so == NULL) { - mtx_unlock(&ng_btsocket_rfcomm_sessions_mtx); - return (error); + error = socreate_error; + goto out; } /* _at__at_ -848,21 +852,24 _at__at_ * XXX FIXME Note that currently there is no way to adjust MTU * for the default session. */ - error = ng_btsocket_rfcomm_session_create(&s, l2so, NG_HCI_BDADDR_ANY, NULL, td); - if (error != 0) { - mtx_unlock(&ng_btsocket_rfcomm_sessions_mtx); - soclose(l2so); - - return (error); - } - } else if (l2so != NULL) - soclose(l2so); /* we don't need new L2CAP socket */ - + if (error != 0) + goto out; + l2so = NULL; + } + solisten_proto(so); mtx_unlock(&ng_btsocket_rfcomm_sessions_mtx); - - return (0); +out: + SOCK_UNLOCK(so); + mtx_unlock(&ng_btsocket_rfcomm_sessions_mtx); + /* + * If we still have an l2so reference here, it's unneeded, so release + * it. + */ + if (l2so != NULL) + soclose(l2so); + return (error); } /* ng_btsocket_listen */ /* Index: netinet/tcp_usrreq.c =================================================================== RCS file: /home/ncvs/src/sys/netinet/tcp_usrreq.c,v retrieving revision 1.114 diff -u -r1.114 tcp_usrreq.c --- netinet/tcp_usrreq.c 14 Feb 2005 07:37:50 -0000 1.114 +++ netinet/tcp_usrreq.c 16 Feb 2005 13:45:30 -0000 _at__at_ -302,10 +302,15 _at__at_ const int inirw = INI_WRITE; COMMON_START(); - if (inp->inp_lport == 0) + SOCK_LOCK(so); + error = solisten_proto_check(so); + if (error == 0 && inp->inp_lport == 0) error = in_pcbbind(inp, (struct sockaddr *)0, td->td_ucred); - if (error == 0) + if (error == 0) { tp->t_state = TCPS_LISTEN; + solisten_proto(so); + } + SOCK_UNLOCK(so); COMMON_END(PRU_LISTEN); } _at__at_ -319,14 +324,19 _at__at_ const int inirw = INI_WRITE; COMMON_START(); - if (inp->inp_lport == 0) { + SOCK_LOCK(so); + error = solisten_proto_check(so); + if (error == 0 && inp->inp_lport == 0) { inp->inp_vflag &= ~INP_IPV4; if ((inp->inp_flags & IN6P_IPV6_V6ONLY) == 0) inp->inp_vflag |= INP_IPV4; error = in6_pcbbind(inp, (struct sockaddr *)0, td->td_ucred); } - if (error == 0) + if (error == 0) { tp->t_state = TCPS_LISTEN; + solisten_proto(so); + } + SOCK_UNLOCK(so); COMMON_END(PRU_LISTEN); } #endif /* INET6 */ Index: netipx/spx_usrreq.c =================================================================== RCS file: /home/ncvs/src/sys/netipx/spx_usrreq.c,v retrieving revision 1.61 diff -u -r1.61 spx_usrreq.c --- netipx/spx_usrreq.c 9 Jan 2005 05:31:16 -0000 1.61 +++ netipx/spx_usrreq.c 16 Feb 2005 13:45:30 -0000 _at__at_ -1532,10 +1532,15 _at__at_ IPX_LIST_LOCK(); IPX_LOCK(ipxp); - if (ipxp->ipxp_lport == 0) + SOCK_LOCK(so); + error = solisten_proto_check(so); + if (error == 0 && ipxp->ipxp_lport == 0) error = ipx_pcbbind(ipxp, NULL, td); - if (error == 0) + if (error == 0) { cb->s_state = TCPS_LISTEN; + solisten_proto(so); + } + SOCK_UNLOCK(so); IPX_UNLOCK(ipxp); IPX_LIST_UNLOCK(); return (error); Index: sys/socketvar.h =================================================================== RCS file: /home/ncvs/src/sys/sys/socketvar.h,v retrieving revision 1.137 diff -u -r1.137 socketvar.h --- sys/socketvar.h 30 Jan 2005 13:11:44 -0000 1.137 +++ sys/socketvar.h 16 Feb 2005 13:57:42 -0000 _at__at_ -512,6 +512,8 _at__at_ void soisdisconnected(struct socket *so); void soisdisconnecting(struct socket *so); int solisten(struct socket *so, int backlog, struct thread *td); +void solisten_proto(struct socket *so); +int solisten_proto_check(struct socket *so); struct socket * sonewconn(struct socket *head, int connstatus); int sooptcopyin(struct sockopt *sopt, void *buf, size_t len, size_t minlen);Received on Thu Feb 17 2005 - 09:14:16 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:28 UTC