[bmake] bmake sigint handling causing tty corruption

From: Dmitry Marakasov <amdmi3_at_amdmi3.ru>
Date: Tue, 18 Jul 2017 23:57:00 +0300
Hi!

Me and Ilya Arkhipov were investigating the cause of this bug:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=215572

In short, when FreeBSD ports options dialog is interrupted by Ctrl+C,
there's chance of sporadic terminal corruption. They are not always
reproducible and seem to be dependent on a machine, shell, terminal,
tmux used, but are not tied to any specific configuration.

The investigation led us to the following conclusion:

- the corruption is caused by dialog4ports program (which handles ports
  options dialogs) not being able to restore terminal state on exit
- dialog4ports does indeed try to restore terminal state, but the
  corresponding ioctl (TIOCSETAW) fails with EIO
- examining kern/tty.c suggests that this happens likely because the make
  which is the session leader or something dies before dialog4ports
- which led us to bmake as a culprit

Here's the ktrace of the problem (the process hierarchy here is make ->
sh -> dialog4ports)

---
78337 dialog4ports CALL  sigaction(SIGTSTP,0x800a80228,0)
78337 dialog4ports RET   sigaction 0
78337 dialog4ports CALL  clock_gettime(0xd,0x7fffffffde08)
78337 dialog4ports RET   clock_gettime 0
78337 dialog4ports CALL  gettimeofday(0x7fffffffdc90,0)
78337 dialog4ports RET   gettimeofday 0
78337 dialog4ports CALL  poll(0x7fffffffdca0,0x2,0xffffffff)

(make and sh receive SIGINT first)

78265 make     RET   wait4 RESTART
78335 sh       RET   wait4 -1 errno 4 Interrupted system call
78265 make     PSIG  SIGINT caught handler=0x402530 mask=0x0 code=SI_KERNEL
78335 sh       PSIG  SIGINT caught handler=0x41b950 mask=0x0 code=SI_KERNEL
78265 make     CALL  lstat(0x800ab9900,0x7fffffffd1f0)
78265 make     NAMI  "do-config"
78335 sh       CALL  sigreturn(0x7fffffffd280)
78335 sh       RET   sigreturn JUSTRETURN
78265 make     RET   lstat -1 errno 2 No such file or directory
78335 sh       CALL  wait4(0xffffffff,0x7fffffffd6ec,0,0)
78265 make     CALL  sigaction(SIGINT,0x7fffffffd250,0x7fffffffd230)
78265 make     RET   sigaction 0
78265 make     CALL  kill(0x131b9,SIGINT)
78265 make     RET   kill 0
78265 make     CALL  sigreturn(0x7fffffffd2d0)
78265 make     RET   sigreturn JUSTRETURN

(make kills itself)

78265 make     PSIG  SIGINT SIG_DFL code=SI_USER

(dialog4ports finally starts to process the signal)

78337 dialog4ports RET   poll -1 errno 4 Interrupted system call
78337 dialog4ports PSIG  SIGINT caught handler=0x800855e00 mask=0x0 code=SI_KERNEL
78337 dialog4ports CALL  sigaction(SIGINT,0x7fffffffd7c0,0)
78337 dialog4ports RET   sigaction 0
78337 dialog4ports CALL  ioctl(0x1,TIOCGETA,0x7fffffffd770)
78337 dialog4ports RET   ioctl 0
78337 dialog4ports CALL  write(0x1,0x801676a00,0x17)
78337 dialog4ports GIO   fd 1 wrote 23 bytes
78337 dialog4ports RET   write 23/0x17

(this call should restore terminal state, but it fails)

78337 dialog4ports CALL  ioctl(0x1,TIOCSETAW,0x80161604c)
78337 dialog4ports RET   ioctl -1 errno 5 Input/output error
78337 dialog4ports CALL  exit(0x1)
---

Here's the ktrace of the case which didn't cause terminal corruption:

---
79506 dialog4ports CALL  poll(0x7fffffffdc00,0x2,0xffffffff)
79506 dialog4ports RET   poll -1 errno 4 Interrupted system call

(dialog4ports is lucky enough to start processing the signal before make)

79506 dialog4ports PSIG  SIGINT caught handler=0x800855e00 mask=0x0 code=SI_KERNEL
79506 dialog4ports CALL  sigaction(SIGINT,0x7fffffffd720,0)
79506 dialog4ports RET   sigaction 0
79506 dialog4ports CALL  ioctl(0x1,TIOCGETA,0x7fffffffd6d0)
79506 dialog4ports RET   ioctl 0
79506 dialog4ports CALL  write(0x1,0x801676a00,0x17)
79506 dialog4ports GIO   fd 1 wrote 23 bytes
79506 dialog4ports RET   write 23/0x17

(and cleanup succeeds)

79506 dialog4ports CALL  ioctl(0x1,TIOCSETAW,0x80161604c)
79506 dialog4ports RET   ioctl 0
79506 dialog4ports CALL  exit(0x1)
79433 make     RET   wait4 RESTART
79433 make     PSIG  SIGINT caught handler=0x402530 mask=0x0 code=SI_KERNEL
79433 make     CALL  lstat(0x800ab4980,0x7fffffffd140)
79433 make     NAMI  "do-config"
79433 make     RET   lstat -1 errno 2 No such file or directory
79433 make     CALL  sigaction(SIGINT,0x7fffffffd1a0,0x7fffffffd180)
79433 make     RET   sigaction 0
79433 make     CALL  kill(0x13649,SIGINT)
79433 make     RET   kill 0
79433 make     CALL  sigreturn(0x7fffffffd220)
79433 make     RET   sigreturn JUSTRETURN
79433 make     PSIG  SIGINT SIG_DFL code=SI_USER
79504 sh       RET   wait4 -1 errno 4 Interrupted system call
79504 sh       PSIG  SIGINT caught handler=0x41b950 mask=0x0 code=SI_KERNEL
79504 sh       CALL  sigreturn(0x7fffffffd1d0)
79504 sh       RET   sigreturn JUSTRETURN
---

For reference, here's the program which demonstrates the tty layer
behaviour which causes this:

---
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <signal.h>
#include <termios.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>

int main() {
	struct termios t;
	int ret;

        // save terminal state
	ret = ioctl(1, TIOCGETA, &t);
	fprintf(stderr, "ioctl(1, TIOCGETA) -> %d / %s\n", ret, strerror(errno));

	pid_t p = fork();
	if (p > 0) {
		// parent would die from SIGTERM early
		kill(getpid(), SIGTERM);
	} else if (p == 0) {
		// child tries to restore terminal state with some delay
		usleep(1000);

                // because parent is dead now, this will fail with EIO
		ret = ioctl(1, TIOCSETAW, &t);
		fprintf(stderr, "ioctl(1, TIOCSETAW) -> %d / %s\n", ret, strerror(errno));
	}

	return 0;
}
---

Now to fix this, I suggest that instead of killing itself, make should
signal all its childs carefully and wait() on them, only then die
itself.

Now after a quick glance at bmake sources it seems like the jobs control
code

https://svnweb.freebsd.org/base/head/contrib/bmake/job.c?revision=317239&view=markup#l2633

does the very same thing that I've just described, however bmake is run
in compat mode by default, and CompatInterrupt does exactly what ktrace
shows - it just kills itself.

https://svnweb.freebsd.org/base/head/contrib/bmake/compat.c?revision=310304&view=markup#l180

So, to fix this problem it seems that CompatInterrupt should be improved
as described above.

Also wanted to ask kib_at_, ian_at_ (as recent committers to tty.c) it this
behavior of tty layer is correct and if it could be improved.

-- 
Dmitry Marakasov   .   55B5 0596 FF1E 8D84 5F56  9510 D35A 80DD F9D2 F77D
amdmi3_at_amdmi3.ru  ..:  jabber: amdmi3_at_jabber.ru      http://amdmi3.ru
Received on Wed Jul 19 2017 - 11:00:21 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:12 UTC