On Wed, Dec 30, 2020 at 08:04:03AM +0100, Hartmann, O. wrote:
> On recent 12-STABLE, 12.1-RELENG and 12.2-RELENG I face a very nasty problem which
> occurred a while ago after it seemed to have vanished for a while: running ssh in a xterm
> on FreeBSD boxes as mentioned at the beginning ends up very rapidly in a lost connection
> with
>
> # Fssh_packet_write_wait: Connection to XXX.XXX.XXX.XXX port 22: Broken pipe
>
> The backend is in most cases a CURRENT, 12.1-RELENG or 12.2-RELENG or 12-STABLE server. A
> couple of months ago we moved from 11.3-RELENG to 12.1-RELENG (server side, clients were
> always 13-CURRENT or 12-STABLE). With FreeBSD 11 as the backend, those broken pipes
> occurred, but not that frequent and rapid as it is the fact now.
>
> The "problem" can be mitigated somehow: running top or using the console prevents the
> broken pipe fault for a while, but it still occurs. Running "screen" (port
> sysutils/screen) does extend the usability of the console for a significant timespan, but
> the broken pipe also occurs randomly, but it takes a significant time to occur.

So, I do a LOT of ssh-in-xterm and I can't say that I've seen anything that
looks like it is FreeBSD's fault (vs ISP, work firewall, work VPN, etc).

For my cloud host (12.2-p2) I do tend to use the screen program. At work, in
pre-Covid times (so up to last March 18th or so, whatever that works out to
in versioning/revisions; probably 12.1 or 12.0), I'd have sessions open for
a week or more. At home I'm all 13 at the moment. Because I'm running a lot
of 13 at home (and before that, 12-stable) I tend to reboot the box for
update reasons, so sessions there never get the chance to live that long.

Is it safe to assume that "very rapidly" is measured in sub-days?

> My conclusion is: either there is a serious problem with FreeBSD since 12, or there is a
> config issue I'm not aware of, even with "vanilla" installations from official repository
> running unchanged.

At work, my problems are all about crappy firewalls.
Even firewalls that we've spent a LOT of money on (PaloAlto, the Juniper
before it). In all fairness to them, we're running a University's worth of
class-B through there and they have all the state-tracking/deep-inspection
goodness turned on trying to protect everyone from the big bad internet, so
it's complicated.

With PuTTY, I've had to turn on TCP keepalives and the sending of null
packets. The problem there just seems to be that the firewall hardware can
only track so many sessions and, when you stress it, it'll drop "idle"
sessions (vs active ones, vs refusing to open a new one). Systems hemorrhage
connections all the time when something eats the final connection-close
packet, but the end hosts can time the thing out. The PaloAlto in my case
can't tell a dead connection from a quiet one, so it just starts reaping,
and some of the time it gets valid idle connections. So all my tricks just
involve some amount of traffic to keep that session more alive in the
non-host-state-tracker's brain.

For SSH at work, I've set this up:

    Host *
        TCPKeepAlive yes
        ServerAliveInterval 60
        ServerAliveCountMax 3

So, send TCP keepalive packets, send an SSH-level keepalive whenever 60
seconds pass with nothing from the server, and tear down the session if 3
of those in a row go unanswered.

I'll note that at home I haven't had to do that. For that cloud 12.2 system,
I've had a connection "idle" for 21 hours (but running with a screen going,
which is getting some amount of bidirectional traffic going because it has a
date/time stamp that gets updated once a minute).

Is 21 hours "significant" by your measurements?

At home, I don't have a network firewall of any sort. Probably the usual
unknowns with the ISP and the crappyware NAT box they force me to use.

My cloud system is running on DigitalOcean, for what that is worth. I'm not
sure what they're doing for firewalls (I'm doing host firewalls out there,
so maybe nothing in my case).

Received on Wed Dec 30 2020 - 14:41:26 UTC