[This report just shows some material for the sendmail SIGSEGV's, based on truss output.] I've returned to using the modern jemalloc because it seems to show problems more, after having caught the earlier reported dhclient example under the older jemalloc context. (Again: jemalloc may be exposing a problem elsewhere.) I had truss monitor sendmail, including following child processes. A child process start and end normally looks like: 963: 3092.707439595 0.000033895 sigprocmask(SIG_SETMASK,{ SIGCHLD },0x0) = 0 (0x0) 2432: 3092.708024959 0.000000000 <new process> 963: 3092.708136462 0.000470115 fork() = 2432 (0x980) 2432: 3092.708441039 0.000033319 thr_self(0x50120000) = 0 (0x0) . . . 2432: 3092.717283761 0.000036008 sigaction(SIGQUIT,{ SIG_IGN SA_RESTART ss_t },{ SIG_IGN SA_RESTART ss_t }) = 0 (0x0) 2432: 3092.717544288 0.000034352 sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0) 2432: 3092.717799894 0.000035768 close(0) = 0 (0x0) 2432: 3092.718174733 0.000103726 openat(AT_FDCWD,"/dev/null",O_RDONLY,00) = 0 (0x0) 2432: 3092.718480437 0.000052091 openat(AT_FDCWD,"/dev/null",O_WRONLY,00) = 4 (0x4) 2432: 3092.718778028 0.000037856 dup2(4,1) = 1 (0x1) 2432: 3092.719003051 0.000034255 dup2(4,2) = 2 (0x2) 2432: 3092.719225122 0.000033655 close(4) = 0 (0x0) 2432: 3092.719437735 0.000047626 fstat(0,{ mode=crw-rw-rw- ,inode=22,size=0,blksize=4096 }) = 0 (0x0) 2432: 3092.719679274 0.000037400 fstat(1,{ mode=crw-rw-rw- ,inode=22,size=0,blksize=4096 }) = 0 (0x0) 2432: 3092.719908859 0.000035816 fstat(2,{ mode=crw-rw-rw- ,inode=22,size=0,blksize=4096 }) = 0 (0x0) 2432: 3092.720138299 0.000033727 clock_gettime(13,{ 1588570204.000000000 }) = 0 (0x0) 2432: 3092.720658945 0.000035360 getpid() = 2432 (0x980) 2432: 3092.720931594 0.000048730 __sysctl("kern.proc.args.2432",4,0x0,0x0,0x508ae000,49) = 0 (0x0) 2432: 3092.721572338 0.000062318 open(".",O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC,00) = 4 (0x4) 2432: 3092.721962132 0.000037808 fcntl(4,F_ISUNIONSTACK,0x0) = 0 (0x0) 2432: 3092.722323792 0.000098613 getdirentries(4,"\0\0\0\0\0\t\M-I\M^Q\0\0\0\0\0\0"...,4096,{ 0x0 }) = 144 (0x90) 2432: 3092.722944875 0.000036944 getdirentries(4,0x50859000,4096,{ 0x200 }) = 0 (0x0) 2432: 3092.723294461 0.000045250 close(4) = 0 (0x0) 2432: 3092.723576400 0.000041144 clock_gettime(13,{ 1588570204.000000000 }) = 0 (0x0) 2432: 3092.723828718 0.000037928 setitimer(0,{ 0.000000, 0.000000 },0x0) = 0 (0x0) 2432: 3092.724092245 0.000035815 sigprocmask(SIG_UNBLOCK,{ SIGALRM },{ }) = 0 (0x0) 2432: 3092.724591527 0.000038024 getpid() = 2432 (0x980) 2432: 3092.724952323 0.000052955 setuid(0x0) ERR#1 'Operation not permitted' 2432: 3092.727852159 0.000042969 clock_gettime(4,{ 21633.960374942 }) = 0 (0x0) 2432: 3092.728508193 0.000034327 clock_gettime(4,{ 21633.961033929 }) = 0 (0x0) 2432: 3092.729146608 0.000036872 clock_gettime(4,{ 21633.961670903 }) = 0 (0x0) 2432: 3092.729824967 0.000038360 clock_gettime(4,{ 21633.962349071 }) = 0 (0x0) 2432: 3092.732435446 0.001634793 exit(0x0) 2432: 3092.732555855 0.001755202 process exit, rval = 0 963: 3092.732638865 0.018571063 SIGNAL 20 (SIGCHLD) code=CLD_EXITED pid=2432 uid=25 status=0 963: 3092.732822864 0.018755062 sigsuspend({ }) ERR#4 'Interrupted system call' 963: 3092.733076525 0.000039968 sigprocmask(SIG_SETMASK,{ SIGCHLD },0x0) = 0 (0x0) 963: 3092.733447788 0.000076601 wait4(-1,{ EXITED,val=0 },WNOHANG,0x0) = 2432 (0x980) 963: 3092.733781410 0.000034783 wait4(-1,0xffffbe60,WNOHANG,0x0) ERR#10 'No child processes' 963: 3092.734065366 0.000037328 sigreturn(0xffffbe90) EJUSTRETURN 963: 3092.734263408 0.000033295 sigprocmask(SIG_BLOCK,0x0,{ }) = 0 (0x0) (No activity in 963 for about 1800 seconds.) But once it starts failing, the SIGSEGV happens just after the: setuid(0x0) ERR#1 'Operation not permitted' For example: . . . 4745: 37293.335510778 0.000035023 clock_gettime(13,{ 1588604405.000000000 }) = 0 (0x0) 4745: 37293.335754863 0.000037593 setitimer(0,{ 0.000000, 0.000000 },0x0) = 0 (0x0) 4745: 37293.336010253 0.000036200 sigprocmask(SIG_UNBLOCK,{ SIGALRM },{ }) = 0 (0x0) 4745: 37293.336488027 0.000033823 getpid() = 4745 (0x1289) 4745: 37293.336836797 0.000051995 setuid(0x0) ERR#1 'Operation not permitted' 4745: 37293.338022675 0.001237873 SIGNAL 11 (SIGSEGV) code=SEGV_MAPERR trapno=768 addr=0x506a11f0 4745: 37293.339546520 0.002761718 process killed, signal = 11 963: 37293.339627249 0.050797919 SIGNAL 20 (SIGCHLD) code=CLD_KILLED pid=4745 uid=25 status=11 963: 37293.339794781 0.050965451 sigsuspend({ }) ERR#4 'Interrupted system call' 963: 37293.340038313 0.000037544 sigprocmask(SIG_SETMASK,{ SIGCHLD },0x0) = 0 (0x0) 963: 37293.340329951 0.000070215 wait4(-1,{ SIGNALED,sig=SIGSEGV },WNOHANG,0x0) = 4745 (0x1289) 963: 37293.340634048 0.000034903 wait4(-1,0xffffbe60,WNOHANG,0x0) ERR#10 'No child processes' 963: 37293.340901417 0.000037615 sigreturn(0xffffbe90) EJUSTRETURN 963: 37293.341090122 0.000033319 sigprocmask(SIG_BLOCK,0x0,{ }) = 0 (0x0) (Another about 1800 seconds.) So it seems that sendmail's SIGSEGV always happens during the winding down of the child process: getting ready to exit. So far, it seems that once it starts happening, it happens for each child process created after that. === Mark Millard marklmi at yahoo.com ( dsl-only.net went away in early 2018-Mar)Received on Mon May 04 2020 - 16:16:24 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:23 UTC