On Fri, 26 Oct 2007, John Baldwin wrote:

> "sbwait" is waiting for data to come in on a socket and "pfault" is waiting
> on disk I/O.  It is a bit odd that 1187 is holding a lock while sleeping
> though that is permitted with an sx lock.  Still, if it's supposed to be
> protect socket's receive buffer that is odd.  Maybe get a trace of the
> process blocked in "sbwait" (tr <pid>) and bug rwatson_at_ about it.

This is normal -- there are two kinds of locks on each socket buffer: a mutex
protecting the integrity of the data structure, and an sx lock serializing I/O
on the socket buffer.  The latter is intended to prevent I/O interlacing, and
replaced the older sblock/sbunlock, which was implemented using tsleep(),
flags, and the mutex as an interlock.  It is normal for the sx lock to be held
over sleeps -- both sbwait, indicating that the I/O has not yet completed and
is waiting on the network or the remote endpoint, and a page fault, indicating
that a data copy to or from user space is in progress and has blocked waiting
on paging.  Other threads blocked on the sx lock sleep interruptibly, thanks to
Attilio's addition of interruptible sx lock calls.

It's not impossible that there are deadlocks involved, but if so, they likely
existed before the change to formal sx locks, as the previous "by hand" lock
construction had essentially identical (but slower) properties.

There is an interesting question about whether the strong semantics in the
presence of interlaced I/O requests (i.e., simultaneous requests from multiple
threads on a single socket) are required at all.  If they are not, we might be
able to weaken the locking here with some reworking of the socket buffer data
structures and send/receive routines.  For the time being we should leave the
locks as-is for stream sockets; we have already optimized them out for UDP
sockets by virtue of a simplified sosend_dgram(), which was part of our
optimization work for BIND.
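
As a rough userland illustration of the two-lock arrangement described above,
here is a minimal Python sketch.  It is not the kernel code: SocketBuffer,
mtx, io_lock, append(), receive() and demo() are all hypothetical names, and
Python's threading.Lock and threading.Condition merely stand in for the
mutex, the sx lock and sbwait.  The point it tries to show is that the I/O
lock stays held while the receiver sleeps waiting for data, which keeps a
second receiver from taking bytes out of the middle of the first request.

```python
import threading
from collections import deque


class SocketBuffer:
    """Toy stand-in for a socket receive buffer (hypothetical, userland only)."""

    def __init__(self):
        self.mtx = threading.Lock()              # protects self.data; held briefly
        self.cv = threading.Condition(self.mtx)  # rough analogue of sbwait
        self.io_lock = threading.Lock()          # serializes whole receive calls
        self.data = deque()

    def append(self, chunk):
        # Producer side: only the data-structure lock is needed.
        with self.cv:
            self.data.extend(chunk)
            self.cv.notify_all()

    def receive(self, n):
        # Hold io_lock for the entire operation, sleeps included, so another
        # receiver cannot interlace its request with this one.
        out = []
        with self.io_lock:
            while len(out) < n:
                with self.cv:
                    while not self.data:
                        # Releases mtx while asleep; io_lock stays held.
                        self.cv.wait()
                    while self.data and len(out) < n:
                        out.append(self.data.popleft())
        return out


def demo():
    sb = SocketBuffer()
    results = []
    res_lock = threading.Lock()

    def reader():
        got = sb.receive(4)
        with res_lock:
            results.append(got)

    readers = [threading.Thread(target=reader) for _ in range(2)]
    for t in readers:
        t.start()
    # Deliver the data one item at a time so a receiver may sleep mid-request.
    for i in range(8):
        sb.append([i])
    for t in readers:
        t.join()
    return sorted(results)


if __name__ == "__main__":
    print(demo())
```

Whichever reader gets io_lock first receives the first four items in order
and the other receives the remaining four, so demo() returns
[[0, 1, 2, 3], [4, 5, 6, 7]].  Dropping io_lock from receive() would allow the
two requests to interleave, which is the interlacing the sx lock is there to
prevent.
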
FYI, BIND uses a single UDP socket for all transactions, and since each
transaction is atomic (being a datagram), the overhead of socket buffer locking
was significant, not to mention unnecessary.  This problem was originally
pointed out by Jinmei Tatuya.

So, in summary: sleeping while holding the so_rcv/so_snd sx locks is normal,
but deadlocks are not, so if the pointer comes back in the direction of the
socket code after some more investigation, let me know.

Robert N M Watson
Computer Laboratory
University of Cambridge

Received on Fri Oct 26 2007 - 19:42:14 UTC