Bizarre network+ssh+bash shell behaviour

From: Jeremy Chadwick <freebsd_at_jdc.parodius.com>
Date: Fri, 8 Oct 2004 20:19:04 -0700
Something that's been on my "report this with a big fat bold 'WTF'
around it" list for a few weeks now:

On one of my boxes -- and only that box -- hitting EOF (^D) in bash
to log out results in no printed "logout" message.  The socket does
get closed cleanly, and the pty/tty does get put back into the
"allocatable pty/tty pool" (sorry, not familiar with this portion).
However, normally when logging out of a shell, the time between logout
and the actual socket being dropped is minimal (i.e. effectively
immediate); in the case of the "weird box", there is a good 1-2
second delay before the socket is dropped.

I'm logged in via ssh across a LAN consisting of two unmanaged switches
(one via a WRT54GS, and the other via a Hawking Technologies 100Mbit
unit); no duplex problems or auto-neg problems, no errors, overruns,
underruns, runts, cabling problems, etc.  SSH client is PuTTY, and
the client system is Windows XP.

Running a bash sub-shell underneath bash, and hitting EOF in the
sub-shell, results in "logout" being printed as expected.

To throw even more craziness into the loop: it seems that I can get
"logout" to be printed in the situation where I change the bash binary
while I'm logged in via ssh.  Meaning: log in (using bash 3.0), use
sudo, build bash 2.05b from ports, pkg_delete bash 3.0, install 2.05b,
type 'exit' to get out of sudo, and hit EOF -- this, magically, causes
"logout" to be printed (but that time and that time ALONE -- if I log
back in again and hit EOF, no more "logout" -- rinse, lather, repeat).

Changing my shell to csh/tcsh works fine on the "weird box" -- EOF prints
"logout" as expected.  Removing my .bashrc and .bash_profile, and even
/etc/profile, results in no change.

I'm building bash 2.05b or bash 3.0 from ports.  The only thing
"different" about my make.conf is that I specify CPUTYPE=p4; the
same thing happens even when the CPUTYPE is removed/commented out.

This does not happen on any of my other boxes, but those boxes are
online in my co-lo cage.  The hardware is partly the same and partly
different: same CPU (P4 2.4GHz Northwood), same chipset (i875P), same
RAM brand and model (Corsair XMS DDR400), but different motherboards
(Shuttle SB75 in the "weird box", SuperMicro P4SCE in the co-lo) and
different on-board NICs (Broadcom / bge in the "weird box", Intel / em
in the co-lo).  Both systems use the 4BSD scheduler.
No login.conf changes have been made on either system.  No sysctl.conf
tweaks.  Kernel configurations are identical, aside from the network
driver differential; SMP is enabled on both systems, ditto with HTT --
disabling either of these makes no difference.

Using FreeBSD 4.x on the "weird box" results in proper behaviour of
bash, implying there might be something specific to the 5.3 series.

I'm curious if anyone else can confirm this behaviour, and if not,
where and how I would begin tracing this to find out what's going on.
It's fairly strange... the only thing I can think of is some sort of
odd network driver or network-related issue, or possibly some freak
problem with the pty/tty code.
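
One possible way to start separating the two suspects (network vs.
pty/tty) would be to take ssh and the LAN out of the picture and
repeat the EOF test on a local pty.  The following is only a rough
sketch of that idea, not a known diagnosis: the helper names
(eof_exit_probe, _drain) and the settle/timeout values are made up
for illustration.  It starts a shell on a fresh pty, sends ^D, and
reports whatever the shell printed plus how long the pty took to
close afterwards.

```python
import os
import pty
import select
import signal
import time


def _drain(fd, deadline):
    """Read from fd until the deadline passes or the pty closes.

    Returns (data, closed).
    """
    data = b""
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return data, False
        ready, _, _ = select.select([fd], [], [], remaining)
        if not ready:
            return data, False
        try:
            chunk = os.read(fd, 4096)
        except OSError:
            # Some systems report an I/O error once the slave side is gone.
            return data, True
        if not chunk:
            # Others report a plain end-of-file instead.
            return data, True
        data += chunk


def eof_exit_probe(argv, settle=1.0, timeout=10.0):
    """Run argv on a new pty, send ^D, and time how long the pty takes to close.

    Returns (output_bytes, seconds); seconds is None if the shell exited
    before ^D was sent or did not exit within the timeout.
    """
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: become the shell under test; bail out if exec fails.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)

    # Let the shell start up and print its prompt before sending EOF.
    output, closed = _drain(master_fd, time.monotonic() + settle)
    elapsed = None
    if not closed:
        start = time.monotonic()
        os.write(master_fd, b"\x04")  # ^D
        more, closed = _drain(master_fd, start + timeout)
        output += more
        if closed:
            elapsed = time.monotonic() - start
        else:
            # Shell ignored EOF or hung; do not leave it running.
            os.kill(pid, signal.SIGKILL)
    os.close(master_fd)
    os.waitpid(pid, 0)
    return output, elapsed


if __name__ == "__main__":
    # "--login" asks bash to behave as a login shell.
    out, secs = eof_exit_probe(["bash", "--login"])
    print(repr(out), secs)
```

If the local run prints "logout" promptly while the ssh session on the
same box does not, that would point away from bash and the pty layer
and more towards sshd or the network path; if the local run shows the
same missing message or delay, the network driver becomes a less
likely culprit.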

Thoughts/ideas are quite welcome.

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                        http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, USA |
| Making life hard for others since 1977.                             |
Received on Sat Oct 09 2004 - 01:19:05 UTC
