Re: [bmake] bmake sigint handling causing tty corruption

From: Ilya A. Arkhipov <rum1cro_at_yandex.ru>
Date: Tue, 25 Jul 2017 20:27:12 +0300
19.07.2017, 16:00, "Dmitry Marakasov" <amdmi3_at_amdmi3.ru>:
> Hi!
>
> Me and Ilya Arkhipov were investigating the cause of this bug:
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=215572
>
> In short, when FreeBSD ports options dialog is interrupted by Ctrl+C,
> there's chance of sporadic terminal corruption. They are not always
> reproducible and seem to be dependent on a machine, shell, terminal,
> tmux used, but are not tied to any specific configuration.
>
> The investigation led us to the following conclusion:
>
> - the corruption is caused by dialog4ports program (which handles ports
>   options dialogs) not being able to restore terminal state on exit
> - dialog4ports does indeed try to restore terminal state, but the
>   corresponding ioctl (TIOCSETAW) fails with EIO
> - examining kern/tty.c suggests that this happens likely because the make
>   which is the session leader or something dies before dialog4ports
> - which led us to bmake as a culprit
>
> Here's the ktrace of the problem (the process hierarchy here is make ->
> sh -> dialog4ports)
>
> ---
> 78337 dialog4ports CALL sigaction(SIGTSTP,0x800a80228,0)
> 78337 dialog4ports RET sigaction 0
> 78337 dialog4ports CALL clock_gettime(0xd,0x7fffffffde08)
> 78337 dialog4ports RET clock_gettime 0
> 78337 dialog4ports CALL gettimeofday(0x7fffffffdc90,0)
> 78337 dialog4ports RET gettimeofday 0
> 78337 dialog4ports CALL poll(0x7fffffffdca0,0x2,0xffffffff)
>
> (make and sh receive SIGINT first)
>
> 78265 make RET wait4 RESTART
> 78335 sh RET wait4 -1 errno 4 Interrupted system call
> 78265 make PSIG SIGINT caught handler=0x402530 mask=0x0 code=SI_KERNEL
> 78335 sh PSIG SIGINT caught handler=0x41b950 mask=0x0 code=SI_KERNEL
> 78265 make CALL lstat(0x800ab9900,0x7fffffffd1f0)
> 78265 make NAMI "do-config"
> 78335 sh CALL sigreturn(0x7fffffffd280)
> 78335 sh RET sigreturn JUSTRETURN
> 78265 make RET lstat -1 errno 2 No such file or directory
> 78335 sh CALL wait4(0xffffffff,0x7fffffffd6ec,0,0)
> 78265 make CALL sigaction(SIGINT,0x7fffffffd250,0x7fffffffd230)
> 78265 make RET sigaction 0
> 78265 make CALL kill(0x131b9,SIGINT)
> 78265 make RET kill 0
> 78265 make CALL sigreturn(0x7fffffffd2d0)
> 78265 make RET sigreturn JUSTRETURN
>
> (make kills itself)
>
> 78265 make PSIG SIGINT SIG_DFL code=SI_USER
>
> (dialog4ports finally starts to process the signal)
>
> 78337 dialog4ports RET poll -1 errno 4 Interrupted system call
> 78337 dialog4ports PSIG SIGINT caught handler=0x800855e00 mask=0x0 code=SI_KERNEL
> 78337 dialog4ports CALL sigaction(SIGINT,0x7fffffffd7c0,0)
> 78337 dialog4ports RET sigaction 0
> 78337 dialog4ports CALL ioctl(0x1,TIOCGETA,0x7fffffffd770)
> 78337 dialog4ports RET ioctl 0
> 78337 dialog4ports CALL write(0x1,0x801676a00,0x17)
> 78337 dialog4ports GIO fd 1 wrote 23 bytes
> 78337 dialog4ports RET write 23/0x17
>
> (this call should restore terminal state, but it fails)
>
> 78337 dialog4ports CALL ioctl(0x1,TIOCSETAW,0x80161604c)
> 78337 dialog4ports RET ioctl -1 errno 5 Input/output error
> 78337 dialog4ports CALL exit(0x1)
> ---
>
> Here's the ktrace of the case which didn't cause terminal corruption:
>
> ---
> 79506 dialog4ports CALL poll(0x7fffffffdc00,0x2,0xffffffff)
> 79506 dialog4ports RET poll -1 errno 4 Interrupted system call
>
> (dialog4ports is lucky enough to start processing the signal before make)
>
> 79506 dialog4ports PSIG SIGINT caught handler=0x800855e00 mask=0x0 code=SI_KERNEL
> 79506 dialog4ports CALL sigaction(SIGINT,0x7fffffffd720,0)
> 79506 dialog4ports RET sigaction 0
> 79506 dialog4ports CALL ioctl(0x1,TIOCGETA,0x7fffffffd6d0)
> 79506 dialog4ports RET ioctl 0
> 79506 dialog4ports CALL write(0x1,0x801676a00,0x17)
> 79506 dialog4ports GIO fd 1 wrote 23 bytes
> 79506 dialog4ports RET write 23/0x17
>
> (and cleanup succeeds)
>
> 79506 dialog4ports CALL ioctl(0x1,TIOCSETAW,0x80161604c)
> 79506 dialog4ports RET ioctl 0
> 79506 dialog4ports CALL exit(0x1)
> 79433 make RET wait4 RESTART
> 79433 make PSIG SIGINT caught handler=0x402530 mask=0x0 code=SI_KERNEL
> 79433 make CALL lstat(0x800ab4980,0x7fffffffd140)
> 79433 make NAMI "do-config"
> 79433 make RET lstat -1 errno 2 No such file or directory
> 79433 make CALL sigaction(SIGINT,0x7fffffffd1a0,0x7fffffffd180)
> 79433 make RET sigaction 0
> 79433 make CALL kill(0x13649,SIGINT)
> 79433 make RET kill 0
> 79433 make CALL sigreturn(0x7fffffffd220)
> 79433 make RET sigreturn JUSTRETURN
> 79433 make PSIG SIGINT SIG_DFL code=SI_USER
> 79504 sh RET wait4 -1 errno 4 Interrupted system call
> 79504 sh PSIG SIGINT caught handler=0x41b950 mask=0x0 code=SI_KERNEL
> 79504 sh CALL sigreturn(0x7fffffffd1d0)
> 79504 sh RET sigreturn JUSTRETURN
> ---
>
> For reference, here's the program which demonstrates the tty layer
> behaviour which causes this:
>
> ---
> #include <stdio.h>
> #include <sys/ioctl.h>
> #include <sys/types.h>
> #include <signal.h>
> #include <termios.h>
> #include <unistd.h>
> #include <errno.h>
> #include <string.h>
>
> int main() {
>         struct termios t;
>         int ret;
>
>         // save terminal state
>         ret = ioctl(1, TIOCGETA, &t);
>         fprintf(stderr, "ioctl(1, TIOCGETA) -> %d / %s\n", ret, strerror(errno));
>
>         pid_t p = fork();
>         if (p > 0) {
>                 // parent would die from SIGTERM early
>                 kill(getpid(), SIGTERM);
>         } else if (p == 0) {
>                 // child tries to restore terminal state with some delay
>                 usleep(1000);
>
>                 // because parent is dead now, this will fail with EIO
>                 ret = ioctl(1, TIOCSETAW, &t);
>                 fprintf(stderr, "ioctl(1, TIOCSETAW) -> %d / %s\n", ret, strerror(errno));
>         }
>
>         return 0;
> }
> ---
>
> Now to fix this, I suggest that instead of killing itself, make should
> signal all its childs carefully and wait() on them, only then die
> itself.
>
> Now after a quick glance at bmake sources it seems like the jobs control
> code
>
> https://svnweb.freebsd.org/base/head/contrib/bmake/job.c?revision=317239&view=markup#l2633
>
> does the very same thing that I've just described, however bmake is run
> in compat mode by default, and CompatInterrupt does exactly what ktrace
> shows - it just kills itself.
>
> https://svnweb.freebsd.org/base/head/contrib/bmake/compat.c?revision=310304&view=markup#l180
>
> So, to fix this problem it seems that CompatInterrupt should be improved
> as described above.
>
> Also wanted to ask kib_at_, ian_at_ (as recent committers to tty.c) it this
> behavior of tty layer is correct and if it could be improved.
>
> --
> Dmitry Marakasov . 55B5 0596 FF1E 8D84 5F56 9510 D35A 80DD F9D2 F77D
> amdmi3_at_amdmi3.ru ..: jabber: amdmi3_at_jabber.ru http://amdmi3.ru

FYI: it issue was fixed https://svnweb.freebsd.org/base?view=revision&revision=321410

-- 
With Best Regards,
Ilya A. Arkhipov
Received on Tue Jul 25 2017 - 15:27:23 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:12 UTC