Re: ssh & select() problem on 5.3

From: Claudiu Dragalina-Paraipan <dr.clau_at_gmail.com>
Date: Sun, 28 Nov 2004 18:43:47 +0200

Hi,

Robert Watson wrote:
> Sounds like a bug, but the interesting question is really whether it's a
> kernel bug or an SSH bug.  I'm not up on SSH internals, but there are a
> few other knobs you might try and things to look at that might help
> address whether it's a kernel bug or not:
> 
> (1) Try debug.mpsafenet=0 in loader.conf on the 5.3 box -- if we're
>     looking at a kernel race condition due to a locking bug, that might
>     close the race.  However, it might also just be changing the timing...
>     That this happens on SMP and UP suggests that it's not so much a 
>     timing issue. 

I tried debug.mpsafenet=0. No change.

> 
> (2) select() is almost always used to wait for space in a buffer to write,
>     or wait for data in a buffer to read.  Using a combination of
>     netstat(1) and sockstat(1), it would be useful to know whether there
>     is in fact data in either the send or receive buffer.  Combined with
>     inspecting the state of the select arguments and socket buffers in
>     kernel, this might reveal whether perhaps there was a missed wakeup. 
>     It's worth noting that we believe we corrected a bug with exactly these
>     symptoms shortly before 5.3 release.
> 
> Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
> robert_at_fledge.watson.org      Principal Research Scientist, McAfee Research

I knew about the poll()/select() issue; that's why I thought this was the
case.
I tried the same connection from Windows with the PuTTY client: on one
machine everything is OK, but on another, running dmesg triggers the lock
again.
A friend tried both FreeBSD 5.3 and Windows, and it seems that it locks
more often on 5.3, but not only on 5.3.
Furthermore, I connected to another machine with ssh, and from there I
ssh'ed to the server that seems to trigger the lock. It still locks.
What is more, a tcpdump on the other end (I have access to the
router/firewall, which is right in front of the machine I am testing
with), taken after the lock-up, still shows packets being sent from the
server to me, but a tcpdump at my end shows nothing: the packets never
get here.
In light of these findings, I guess I can say that FreeBSD 5.3 behaves
exactly as it should: select() waits for packets that never get here.
Unless the packets get here but are never processed by the kernel (?).
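
To illustrate what I mean by select() behaving correctly, here is a
minimal sketch (not taken from ssh; the local socket pair and the
wait_readable() helper are only stand-ins I made up for the example).
select() reports a socket as readable only once data has actually reached
its receive buffer, so if the packets are dropped on the way it simply
keeps waiting:

```python
import select
import socket


def wait_readable(sock, timeout):
    """Hypothetical helper: True if sock becomes readable within timeout seconds."""
    readable, _, _ = select.select([sock], [], [], timeout)
    return sock in readable


# A connected local socket pair stands in for the two ends of the session.
server_end, client_end = socket.socketpair()

# Nothing has been sent yet, so select() times out. Without a timeout it
# would block indefinitely, which is what a "locked" session looks like
# when the packets never arrive.
stalled = wait_readable(client_end, 0.2)

# Once data really reaches the receive buffer, select() reports it.
server_end.sendall(b"dmesg output")
ready = wait_readable(client_end, 0.2)
data = client_end.recv(64)

print("readable before send:", stalled)
print("readable after send:", ready, data)

server_end.close()
client_end.close()
```

This says nothing about where the packets are lost; it only shows why a
client blocked in select() is consistent with packets being dropped
before they reach my machine.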

Since the problem occurs only when I connect to the firewall or to a
server behind it, I have started to suspect a hardware failure. Could a
network card cause such problems?
The firewall is running FreeBSD 5.2.1 with PF+ALTQ, and I can observe
the same behaviour there: dmesg locks the ssh connection. I have tested
this with PF disabled, and the problem still occurs, so I can rule out PF
as the cause.

I've crossposted to the hackers list as well, since this may be of
interest there too.

If anyone has any idea of what might be going on, it would be helpful.


With respect,

-- 
Claudiu Dragalina-Paraipan
dr.clau_at_gmail.com

Received on Sun Nov 28 2004 - 15:43:03 UTC
