Someone help me understand this...?

From: Joe Greco <jgreco_at_ns.sol.net>
Date: Wed, 27 Aug 2003 09:08:42 -0500 (CDT)

I've got a weirdness with kill(2).

This code is out of Diablo, the news package, and has been working fine for
some years.  It apparently works fine on other OSes.

In the Diablo model, the parent process may choose to tell its children to
update status via a signal.  The loop basically consists of going through
and issuing a SIGALRM.
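
As a rough sketch of that loop, assuming the parent keeps its children's
pids in an array (this is illustrative only, not the actual Diablo source;
the name signal_children is made up for the example):

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not Diablo's code: send `sig` to every pid in
 * `pids` and report each failure together with its errno text.
 * Returns the number of children that could not be signalled.
 */
int
signal_children(const pid_t *pids, size_t n, int sig)
{
    int failures = 0;

    for (size_t i = 0; i < n; i++) {
        if (kill(pids[i], sig) == -1) {
            /* errno == EPERM is the "Operation not permitted" case. */
            fprintf(stderr, "[%ld] kill(%ld, %d): %s\n",
                (long)getpid(), (long)pids[i], sig, strerror(errno));
            failures++;
        }
    }
    return failures;
}
```

A parent would call something like signal_children(children, nchildren,
SIGALRM) whenever it wants a status update; logging errno per child makes
a permission failure visible instead of silently dropping the update.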

This stopped working a while ago; I don't know precisely when.  I was in the
process of debugging it today and ran into this.

The specific OS below is 5.1-RELEASE but apparently this happens on 4.8 as
well.

%echo $$
29047
%ps -O ruid,uid | egrep '28949|29045|29047'
28949     8     8  p0  I      0:00.00 diablo: ihav=0    chk=0    rec=0 ent=0
29045     8     8  p0  I      0:00.00 sleep 999999
29047     8     8  p0  D      0:00.01 -su (csh)
%kill -ALRM 28949
28949: Operation not permitted
%kill -ALRM 29045
%ps -O ruid,uid | egrep '28949|29045'
28949     8     8  p0  I      0:00.00 diablo: ihav=0    chk=0    rec=0 ent=0
%

Wot?  Why can't I send it a signal?

I've read kill(2) rather carefully and cannot find the reason.  It says,

     For a process to have permission to send a signal to a process designated
     by pid, the real or effective user ID of the receiving process must match
     that of the sending process or the user must have appropriate privileges
     (such as given by a set-user-ID program or the user is the super-user).

Well, the sending and receiving processes both clearly have equal uid/euid.
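
Read literally, the quoted paragraph boils down to a small comparison.  The
sketch below encodes one reading of that text; it is not the kernel's actual
check, the function name is invented, and "appropriate privileges" is
approximated here as an effective uid of 0:

```c
#include <stdbool.h>
#include <sys/types.h>

/*
 * One interpretation of the quoted kill(2) paragraph (an assumption,
 * not the kernel's real logic): the signal is allowed if the sender is
 * the super-user, or if the sender's real or effective uid matches the
 * receiver's real or effective uid.
 */
bool
manpage_rule_allows(uid_t s_ruid, uid_t s_euid, uid_t r_ruid, uid_t r_euid)
{
    if (s_euid == 0)
        return true;    /* super-user approximation */
    return s_ruid == r_ruid || s_ruid == r_euid ||
           s_euid == r_ruid || s_euid == r_euid;
}
```

With the ids from the ps output above (8 for every field on both sides),
this reading returns true, which is why the EPERM is unexpected.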

We're not running in a jail, so I don't expect any issues there.

The parent process did actually start as root and then shed privilege with

        struct passwd *pw = getpwnam("news");
        struct group *gr = getgrnam("news");
        gid_t gid;

        if (pw == NULL) {
            perror("getpwnam('news')");
            exit(1);
        }
        if (gr == NULL) {
            perror("getgrnam('news')");
            exit(1);
        }
        gid = gr->gr_gid;
        /* Shed supplementary groups, then the gid, then the uid
           (return values are not checked here). */
        setgroups(1, &gid);
        setgid(gr->gr_gid);
        setuid(pw->pw_uid);

so that looks all well and fine...  so why can't it kill its own children,
and why can't I kill one of its children from a shell with equivalent 
uid/euid?
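
For anyone trying to reproduce this, a variant of the privilege-shedding
sequence that checks each call and then confirms the resulting ids might
look like the sketch below.  The helper names drop_to and ids_are are
hypothetical, and this is not a claim about what Diablo does or about the
cause of the failure:

```c
#include <grp.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: same order as the snippet above (supplementary
 * groups, then gid, then uid), but every return value is checked.
 * Returns 0 on success, -1 on the first failure.
 */
int
drop_to(uid_t uid, gid_t gid)
{
    if (setgroups(1, &gid) == -1) {
        perror("setgroups");
        return -1;
    }
    if (setgid(gid) == -1) {
        perror("setgid");
        return -1;
    }
    if (setuid(uid) == -1) {
        perror("setuid");
        return -1;
    }
    return 0;
}

/* True only if both the real and effective ids equal the given values. */
bool
ids_are(uid_t uid, gid_t gid)
{
    return getuid() == uid && geteuid() == uid &&
           getgid() == gid && getegid() == gid;
}
```

Calling ids_are(pw->pw_uid, gr->gr_gid) right after drop_to() would at
least confirm that the process ended up with the ids that ps reports.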

I know there's been some paranoia about signal delivery and all that, but
my searching hasn't turned up anything that would explain this.  Certainly
the manual page ought to be updated if this is a new expected behaviour or
something...  at least some clue as to why it might fail would be helpful.

... JG
-- 
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.