Re: ssh & select() problem on 5.3

From: Robert Watson <rwatson_at_freebsd.org>
Date: Sun, 28 Nov 2004 14:24:02 +0000 (GMT)

On Sun, 28 Nov 2004, Claudiu Dragalia-Paraipan wrote:

> Since I have upgraded to FreeBSD 5.3 I have the following problem with
> SSH client: I log on several FreeBSD 5.2.1 machines, and when I start a
> command that gives a 'large' result (like dmesg, cat a file), ssh client
> locks.  I ran ssh in gdb, and found out that it locks in select() in
> libc.so.5.  I do it like this: run ssh in gdb, connect to the host, run
> a dmesg.  After this it locks, and I have to send a SIGKILL or SIGTERM
> before I can see this in gdb: 
> 
> Program received signal SIGTERM, Terminated.
> 0x282b5dd7 in select () from /lib/libc.so.5
> (gdb)
> 
> The result of a bt is (if relevant):
> #0  0x282b5dd7 in select () from /lib/libc.so.5
<snip> 
> This happens on both SMP and UP kernels. Attached is dmesg for UP kernel.
> Also, occasionally it hangs at shutdown or reboot, at random places (?),
> and it seems to be happening after I have a locked ssh client in the
> system. If you need more information about this, and you think these are
> related, let me know and I will run a kernel with debugging enabled, to
> get more information.

Sounds like a bug, but the interesting question is really whether it's a
kernel bug or an SSH bug.  I'm not up on SSH internals, but there are a
few other knobs you might try and things to look at that might help
address whether it's a kernel bug or not:

(1) Try debug.mpsafenet=0 in loader.conf on the 5.3 box -- if we're
    looking at a kernel race condition due to a locking bug, that might
    close the race.  However, it might also just change the timing...
    That this happens on SMP and UP suggests that it's not so much a 
    timing issue. 

(2) select() is almost always used to wait for space in a buffer to write,
    or wait for data in a buffer to read.  Using a combination of
    netstat(1) and sockstat(1), it would be useful to know whether there
    is in fact data in either the send or receive buffer.  Combined with
    inspecting the state of the select arguments and socket buffers in
    kernel, this might reveal whether perhaps there was a missed wakeup. 
    It's worth noting that we believe we corrected a bug with exactly these
    symptoms shortly before 5.3 release.
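
To make point (2) concrete, here is a minimal sketch of the select()
pattern described there, written in Python for brevity (its select.select
wraps the same system call).  This is not ssh's code; wait_ready and demo
are made-up names, and a local socketpair stands in for the network
connection.  It only shows what select() should report when a buffer has
room or holds data.

```python
import select
import socket


def wait_ready(sock, want_read=True, want_write=False, timeout=1.0):
    """Run one select() call on sock and return (readable, writable)."""
    rlist = [sock] if want_read else []
    wlist = [sock] if want_write else []
    # select() blocks until a descriptor is ready or the timeout expires.
    r, w, _ = select.select(rlist, wlist, [], timeout)
    return (sock in r, sock in w)


def demo():
    # Hypothetical stand-in for the ssh connection: a connected local pair.
    a, b = socket.socketpair()
    try:
        # Nothing has been sent yet: a's send buffer has space (writable),
        # but its receive buffer is empty (not readable).
        before = wait_ready(a, want_read=True, want_write=True, timeout=0.1)
        b.sendall(b"x" * 1024)
        # Data now sits in a's receive buffer, so select() should wake up
        # and report a as readable.
        after = wait_ready(a, want_read=True, want_write=False, timeout=1.0)
        return before, after
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(demo())
```

With a missed wakeup, the second call would keep blocking even though
data is queued; that is the case where netstat(1) shows a nonzero Recv-Q
(or Send-Q) for the connection while the process still sits in select().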

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Principal Research Scientist, McAfee Research
Received on Sun Nov 28 2004 - 13:26:02 UTC
