Hi,

Robert Watson wrote:
> Sounds like a bug, but the interesting question is really whether it's a
> kernel bug or an SSH bug. I'm not up on SSH internals, but there are a
> few other knobs you might try and things to look at that might help
> address whether it's a kernel bug or not:
>
> (1) Try debug.mpsafenet=0 in loader.conf on the 5.3 box -- if we're
>     looking at a kernel race condition due to a locking bug, that might
>     close the race. However, it might also just change the timing...
>     That this happens on SMP and UP suggests that it's not so much a
>     timing issue.

I tried debug.mpsafenet=0. No change.

>
> (2) select() is almost always used to wait for space in a buffer to write,
>     or wait for data in a buffer to read. Using a combination of
>     netstat(1) and sockstat(1), it would be useful to know whether there
>     is in fact data in either the send or receive buffer. Combined with
>     inspecting the state of the select arguments and socket buffers in
>     kernel, this might reveal whether perhaps there was a missed wakeup.
>     It's worth noting that we believe we corrected a bug with exactly these
>     symptoms shortly before the 5.3 release.
>
> Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
> robert_at_fledge.watson.org   Principal Research Scientist, McAfee Research

I knew about the poll()/select() issue; that's why I suspected it here.

I have tried the same connection from a Windows machine with the PuTTY client: on one machine everything is OK, but on another, running dmesg triggers the lock again. A friend tried both FreeBSD 5.3 and Windows, and it seems to lock more often from 5.3, but not only from 5.3. Moreover, I connected to another machine with ssh, and from there I ssh'ed to the server that seems to trigger the lock. It still locks.
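To make point (2) concrete, here is a minimal sketch, written in Python because its select.select() is a thin wrapper over select(2). It assumes a local socketpair() stands in for the ssh connection, and the helper names wait_readable() and demo() are hypothetical, chosen only for illustration. It shows that select() reports a socket readable only once data is actually in its receive buffer; if the peer's packets never arrive, select() just keeps waiting (here it times out), which by itself does not point to a kernel bug.

```python
import select
import socket


def wait_readable(sock, timeout):
    """Block in select() until sock has data to read or timeout (seconds) expires.

    Returns True if the kernel reports data in the receive buffer,
    False if select() timed out with nothing to read.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    return sock in readable


def demo():
    # Hypothetical stand-in: a connected pair of local sockets plays the role
    # of the client/server link.
    a, b = socket.socketpair()
    try:
        # Nothing has been sent yet, so select() waits and then times out,
        # much as a client would if the server's packets were lost in transit.
        before = wait_readable(b, 0.1)
        # Once data lands in b's receive buffer, select() reports it readable.
        a.sendall(b"hello")
        after = wait_readable(b, 1.0)
        data = b.recv(16) if after else b""
        return before, after, data
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(demo())
```

On the real connection, the equivalent check is the one Robert suggests: look at the Recv-Q and Send-Q columns in netstat(1) for the stuck socket to see whether data is sitting in a buffer while select() still sleeps.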
Even more, a tcpdump on the other end (I have access to the router/firewall, which sits right in front of the machine I am testing with) still shows, after the lock-up, packets being sent from the server to me, but a tcpdump at my end shows nothing: the packets never get here.

In light of this, I guess I can say that FreeBSD 5.3 behaves exactly as it should: select() waits for packets that never arrive. Unless the packets do arrive but are never processed by the kernel (?).

Since the problem occurs only when I connect to the firewall or to a server behind it, I have started to suspect a hardware failure. Could a network card cause such problems? The firewall runs FreeBSD 5.2.1 with PF+ALTQ, and I can observe the same behaviour there: dmesg locks the ssh connection. I have tested this with PF disabled and the problem still occurs, so I can rule out PF.

I've cross-posted to the hackers list too, since this may be of interest there as well. If anyone has any idea of what might be going on, it would be helpful.

With respect,
--
Claudiu Dragalina-Paraipan
dr.clau_at_gmail.com
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:23 UTC