tcsh hang on exit

From: Rozhuk Ivan <rozhuk.im_at_gmail.com>
Date: Sun, 31 Jan 2021 12:53:22 +0300
Hi!

http://bsd.pw/scale_slides.pdf
Page 28 recommends adding this to ~/.tcshrc:
set savehist = (99999999 merge lock)

I have used this for a while, but some time ago I found that csh hangs after the terminal is closed.
"Terminal closed" means closing xfce4-terminal with the X button. Typing exit in the terminal does not trigger the issue.


Reproduce:
Add/set in ~/.cshrc: set savehist = (99999999 merge lock)
Open and close a terminal running the tcsh shell.
Run top -aSCI -m io to check that csh writes to disk (13+ only).
~/.history.lock exists and the csh process keeps running.


Truss shows an infinite loop:
...
78009: link("/home/rim/.rimwks.local.90dc6","/home/rim/.history.lock") ERR#17 'File exists'
78009: unlink("/home/rim/.rimwks.local.90dc6")	 = 0 (0x0)
78009: sigprocmask(SIG_SETMASK,{ },0x0)		 = 0 (0x0)
78009: nanosleep({ 0.100000000 })		 = 0 (0x0)
78009: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGTERM|SIGTSTP|SIGCHLD|SIGTTIN|SIGTTOU },{ }) = 0 (0x0)
78009: __sysctl("kern.hostname",2,0x7fffffffb8a0,0x7fffffffb348,0x0,0) = 0 (0x0)
78009: getpid()					 = 78009 (0x130b9)
78009: openat(AT_FDCWD,"/home/rim/.rimwks.local.b5e4e",O_WRONLY|O_FSYNC|O_CREAT|O_TRUNC|O_EXCL,00) = 0 (0x0)
78009: close(0)					 = 0 (0x0)
78009: link("/home/rim/.rimwks.local.b5e4e","/home/rim/.history.lock") ERR#17 'File exists'
78009: unlink("/home/rim/.rimwks.local.b5e4e")	 = 0 (0x0)
78009: sigprocmask(SIG_SETMASK,{ },0x0)		 = 0 (0x0)
78009: nanosleep({ 0.100000000 })		 = 0 (0x0)
78009: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGTERM|SIGTSTP|SIGCHLD|SIGTTIN|SIGTTOU },{ }) = 0 (0x0)
78009: __sysctl("kern.hostname",2,0x7fffffffb8a0,0x7fffffffb348,0x0,0) = 0 (0x0)
78009: getpid()					 = 78009 (0x130b9)
78009: openat(AT_FDCWD,"/home/rim/.rimwks.local.d0437",O_WRONLY|O_FSYNC|O_CREAT|O_TRUNC|O_EXCL,00) = 0 (0x0)
78009: close(0)					 = 0 (0x0)
78009: link("/home/rim/.rimwks.local.d0437","/home/rim/.history.lock") ERR#17 'File exists'
78009: unlink("/home/rim/.rimwks.local.d0437")	 = 0 (0x0)
78009: sigprocmask(SIG_SETMASK,{ },0x0)		 = 0 (0x0)
...


This stops only after I remove the empty .history.lock or kill the process.
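
For readers following the trace, here is a minimal sketch of the retry pattern it appears to show: create a scratch file, try to link(2) it to the lock name, remove the scratch file, sleep 100 ms, repeat. This is not the tcsh code. The names lock_with_retry, tmp_prefix and max_tries are hypothetical, the cap on attempts is added only so the sketch terminates, and the signal masking visible in the trace is left out.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/*
 * Retry a link(2)-based lock: one attempt, then a 100 ms pause, repeated.
 * max_tries is a hypothetical cap added for this sketch.
 * Returns the attempt number that succeeded, or -1 with errno set.
 */
static int
lock_with_retry(const char *lock_path, const char *tmp_prefix, int max_tries)
{
	const struct timespec pause = { 0, 100000000L };	/* 0.1 s */
	char tmp[1024];
	int fd, i, saved;

	for (i = 1; i <= max_tries; i++) {
		/* Use a fresh scratch name for every attempt. */
		snprintf(tmp, sizeof(tmp), "%s.%d.%d", tmp_prefix,
		    (int)getpid(), i);
		fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
		if (fd == -1)
			return (-1);
		close(fd);

		/* link(2) fails with EEXIST while the lock name exists. */
		if (link(tmp, lock_path) == 0) {
			unlink(tmp);
			return (i);
		}
		saved = errno;
		unlink(tmp);
		if (saved != EEXIST) {
			errno = saved;
			return (-1);
		}
		if (i < max_tries)
			nanosleep(&pause, NULL);
	}
	errno = EEXIST;
	return (-1);
}
```

With a leftover lock file in place, every attempt fails with EEXIST, which matches the repeating link/unlink/nanosleep pattern above. Without a cap on attempts, such a loop only ends when the lock file is removed or the process is killed.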

Reproduced on 12.2 and 13, amd64.
On 13 it causes disk writes of ~2 MB/sec per csh process; on 12.2 there are no disk writes.

I tried using "truss -fp PID" to understand what goes wrong, but every time I attach truss before
exit/terminal close, the issue does not happen.

contrib/tcsh/dotlock.c - the code looks simple, and I suspect that some race condition exists in the kernel
that makes this check fail:
	if (st.st_nlink != 2) {
when it runs the first time on process exit; on the following attempts the code fails on link because the file already exists.
I tried to reproduce this in a small test app by copy-pasting code from contrib/tcsh/dotlock.c, but
no errors happened and st_nlink == 2.
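
To make the st_nlink check concrete, here is a minimal sketch of the general link(2)-plus-stat(2) locking technique. It is an illustration under assumptions, not a copy of contrib/tcsh/dotlock.c. The function name try_dotlock and the caller-supplied tmp_path are hypothetical, and the 0600 mode differs from the 00 mode seen in the trace.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Make one attempt to take a link(2)-based lock at lock_path.
 * tmp_path is a hypothetical unique scratch file on the same filesystem.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
try_dotlock(const char *lock_path, const char *tmp_path)
{
	struct stat st;
	int fd, saved;

	/* Create the scratch file; O_EXCL fails if it already exists. */
	fd = open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd == -1)
		return (-1);
	close(fd);

	/* link(2) is atomic and fails with EEXIST if the lock name exists. */
	if (link(tmp_path, lock_path) == -1)
		goto bad;

	/* Stat our own scratch name, not the lock name. */
	if (stat(tmp_path, &st) == -1)
		goto bad;

	unlink(tmp_path);

	/* Scratch name plus lock name on one inode means two links. */
	if (st.st_nlink != 2) {
		errno = EEXIST;
		return (-1);
	}
	return (0);

bad:
	saved = errno;
	unlink(tmp_path);
	errno = saved;
	return (-1);
}
```

In this sketch the caller releases the lock by unlinking lock_path. A second call while the lock name still exists fails at link(2) with EEXIST, which is the error seen in the trace.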

My second theory is that tcsh does something unusual with signal handling and truss reverts it.


1. Has anyone else seen the same issue, or can anyone reproduce it?

2. I have ".eli on /home/rim (ufs, local, noatime, nosuid, soft-updates)" mounted. Is it OK that
FreeBSD 13 writes/flushes so often (on every file create/delete) and FreeBSD 12.2 does not?

3. If I do not use the lock option, csh does not hang on exit, but if the terminal is closed without exit,
a file .history.1583640 (the number differs) is created. Is there any workaround for this?
Received on Sun Jan 31 2021 - 08:53:28 UTC
