[CURRENT]: Broken ssh: Fssh_packet_write_wait: Connection to XXX.XXX.XXX.XXX port 22: Broken pipe

From: O. Hartmann <ohartman_at_zedat.fu-berlin.de>
Date: Thu, 12 May 2016 18:46:34 +0200
For some time now (about 1.5 months) I've been bothered by very unreliable ssh
connections between CURRENT boxes. Very often, the connection simply dies with

Fssh_packet_write_wait: Connection to XXX.XXX.XXX.XXX port 22: Broken pipe

This is worse than annoying: how am I supposed to maintain systems remotely over such
unreliable connections?

The problem seems to be related to CURRENT, but I have no reliable point of comparison,
since we run only one 10.3-STABLE box.

I will describe my observations; hopefully someone can make sense of them.

The "Broken pipe" errors that kill poudriere sessions and buildworld runs (worse still if an
installworld gets caught by the Broken pipe!) occur between CURRENT systems. The
"controlling" box is a CURRENT box running X11/xterm, from which I start the ssh session.

Connections from such X11/xterm systems to remote servers seem to be "stable" as long as
I do not open a second ssh connection. But this is not very reliable, just an
observation. Sometimes an open ssh connection lasts tens of minutes, whether there is some
"noise" (output) on the terminal or it sits idle (a static blinking cursor awaiting
further input); in other cases, a connection dies very quickly. The behaviour seems
random to me: it occurs both under load and on idle systems, sometimes very quickly,
sometimes only after a longer time. Today's observation about the single ssh
connection is weak, but I have a strange suspicion that concurrent sessions trigger the
drops faster. In any case, the ssh session seems to go "asleep" after a while, sometimes
after a long time, sometimes very quickly; I have no clue what triggers this erratic
behaviour. It takes a while before the ssh connection/xterm accepts input again (up to 30
seconds, even on fast, idle systems), or, as the final consequence, it dies with a "Broken pipe".

Today I noticed something else. With autofs mounts on several systems,
performance/bandwidth seemed very poor (both server and clients run CURRENT, the most
recent builds as of today).

I reported earlier on this list about shaky and slow performance in conjunction with the
ssh problem, but I wasn't able to figure out what causes it! And I wonder why nobody else
seems to be facing such dramatic ssh connection dropouts or performance
issues.

I think I will file a PR on this, too.

Kind regards,

O. Hartmann

Received on Thu May 12 2016 - 14:44:55 UTC
