Re: Someone help me understand this...?

From: Robert Watson <rwatson_at_freebsd.org>
Date: Thu, 28 Aug 2003 11:34:09 -0400 (EDT)
On Thu, 28 Aug 2003, Joe Greco wrote:

> > On Wed, 27 Aug 2003, Joe Greco wrote:
> > > The specific OS below is 5.1-RELEASE but apparently this happens on 4.8
> > > as well. 
> > 
> > Could you confirm this happens with 4.8?  The access control checks there
> > are substantially different, and I wouldn't expect the behavior you're
> > seeing on 4.8...
> 
> Rather difficult.  I'll see if the client will let me trash a production
> system, but usually people don't like $40K servers handing out a few
> hundred megabits of traffic going out of service.  We were trying to fix
> it on the scratch box (which happens to have 5.1R on it) and then were
> going to see how it fared on the production systems. 

I think it's safe to assume that if you're seeing a similar failure,
there's a different source given my reading of the code, but I'm willing
to be proven wrong.  It's probably not worth the investment if you're
talking about large quantities of money, though.

> > Clearly, unbreaking applications like Diablo by default is desirable.  At
> > least OpenBSD has similar protections to these turned on by default, and
> > possibly other systems as well.  As 5.x sees more broad use, we may well
> > bump into other cases where applications have similar behavior: they rely
> > on no special protections once they've given up privilege.  I wonder if
> > Diablo can run unmodified on OpenBSD; it could be they don't include
> > SIGALRM on the list of "protect against" signals, or it could be that they
> > modify Diablo for their environment to use an alternative signaling
> > mechanism.  Another alternative to this patch would simply be to add
> > SIGALRM to the list of acceptable signals to deliver in the
> > privilege-change case.
> 
> I wonder if it would be reasonable to have some sort of interface that
> allowed a program to tell FreeBSD not to set this flag...  if not, at
> least if there was a sysctl, code could be added so that the daemon
> checked the flag when starting and errored out if it wasn't set. 

We actually have such an interface, but it's only enabled for the purposes
of regression testing.  If you compile "options REGRESSION" into the
kernel configuration, a new system call, __setsugid(), is exposed to
applications.  It's used by src/tools/regression/security/proc_to_proc to
make it easier to set up process pairs for regression testing of
inter-process access control.  When I added it, there was some interest in
just making it setsugid() and exposing it to all processes.  Maybe we
should just go this route for 5.2-RELEASE.  Invoking it with a (0)
argument would mean the application writer accepted the inherent risks.

However, this would open the application to the risks of debugging
attachment, which are probably greater than the signal risks in most
cases.  It's not clear what the best way to express "I want to accept
<these risks> but not <those risks>" would be...  So far, it sounds like
we have three work-arounds in the pot; perhaps we can think of something
better:

(1) Remove SIGALRM from the list of prohibited signals in the P_SUGID
    case.  Not clear what the risks are here based on common application
    use, but this is an easy change to make.

(2) Add setsugid() to allow applications to give up implicit protections
    associated with credential changes.  This comes with greater risks, I
    suspect, since it opens up applications to more explicit
    vulnerabilities:  signal attacks require more sophistication and luck,
    but debugging attacks are "easy".

(3) Allow administrators to selectively disable the more restrictive
    signal checks at a system scope using a sysctl.  This is easy, and
    comes with no risks as long as the setting is unchanged (the default
    in the patch I sent out earlier). 

I'm tempted to commit (1) immediately to allow a workaround if we get
nothing else figured out, and to think some more about (2) and (3).
Another possibility would be to encourage application writers to avoid
overloading signals that already have "meanings", and rely on the USR
signals.  I assume the reason Diablo uses ALRM is that the USR signals
already have assigned semantics?

> > BTW, it's worth noting that the mechanism Diablo is using to give up
> > privilege actually does retain some "privileges" -- it doesn't, for
> > example, synchronize its resource limits with those of the user it is
> > switching to, so it retains the starting resource limits (likely those of
> > the root account). 
> 
> That's actually preferred in most cases.  News servers almost always eat
> far more resources than whatever limits you might set by default, which
> just turns into telling people to remove the limits or use root's
> limits.  Generally if a news package bumps limits bad things happen. 

Right now, most applications in the base system make use of the
setusercontext() call to modify their protections as part of a switch of
users.  They often pass in the flag LOGIN_SETALL and then remove the bits
they don't need, such as LOGIN_SETRESOURCES.  This also has the side
effect of setting up things like the umask based on the user default in
login.conf, setting the default paths, etc.  This may be overkill for what
you're looking for, though, and there's a lot of value to "if it ain't
broke, don't fix it". 

> > A preferred structuring of privilege separation
> > attempts to avoid this scenario by containing privilege in a process that
> > is as independent as possible from the unprivileged processes, and uses
> > file descriptor passing to get a bound port to the unprivileged processes,
> > rather than credential manipulation which is fairly failure-prone.  
> 
> Yes, and such a thing is actually available, though it introduces some
> new issues, because the daemons can be configured to allow various bound
> ports (needing a variable number of fd's, etc) and this also breaks
> legacy sites where people have custom startup scripts.  Ugh.  We did
> that originally so people could get core dumps on FreeBSD.

Yeah.  The point on application behavior is probably to affect future
application development and changes -- we still need to address current
configurations.
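
For future development, the descriptor-passing structure mentioned above
can be sketched roughly as follows.  This is a minimal illustration, not
Diablo's code: send_fd(), recv_fd(), and fd_passing_roundtrip() are
hypothetical names, and the round trip passes a pipe end within a single
process so it can run without privilege.  In a real daemon the privileged
process would bind the port and send the bound socket to an unprivileged
process over a Unix-domain socket.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* All function names below are hypothetical, chosen for illustration. */

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor fd over Unix-domain socket sock.  Returns 0 or -1. */
int
send_fd(int sock, int fd)
{
    char byte = 0;  /* one byte of ordinary data carries the control message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cm;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from sock.  Returns the new descriptor or -1. */
int
recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cm;
    int fd;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cm = CMSG_FIRSTHDR(&msg);
    if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
        cm->cmsg_type != SCM_RIGHTS || cm->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    return fd;
}

/*
 * Pass a pipe's write end across a socketpair, then check that the
 * received descriptor refers to the same pipe.  Returns 0 on success.
 */
int
fd_passing_roundtrip(void)
{
    int sv[2], p[2], got;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0 || pipe(p) != 0)
        return -1;
    if (send_fd(sv[0], p[1]) != 0 || (got = recv_fd(sv[1])) < 0)
        return -1;
    if (write(got, "x", 1) != 1 || read(p[0], &c, 1) != 1)
        return -1;
    close(got); close(p[0]); close(p[1]); close(sv[0]); close(sv[1]);
    return c == 'x' ? 0 : -1;
}
```

The receiver gets a fresh descriptor number referring to the same open
file, so the unprivileged side never needs to change credentials to
obtain the bound port.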

> Yeah, yeah, it's Matt Dillon legacy code.  Matt tended to ignore error
> returns from things where an error was not expected and even if one was
> reported, nothing (beyond a message) could be done.  It actually took me
> a while to isolate the kill issue as a result, because...  the rval from
> kill was being ignored (now the error gets syslog'ed). 

In most cases, fail-stop is a reasonable behavior for unexpected security
behavior from the system, but ignore is likely to shoot you later. :-)  I
tend to wrap even kill() calls as uid 0 in an assertion check, just to be
on the safe side.  If nothing else, it helps detect the case where the
other process has died, and you're using a stale pid.  It's particularly
useful if the other process has died, the pid has been reused, and it's
now owned by another user, which is a real-world case where kill() as a
non-0 uid can fail even when you're sure it can't :-). 
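
As a rough sketch of what checking the return value can look like
(checked_kill() is a hypothetical helper name, not something from Diablo
or libc, and it logs to stderr where the daemon would presumably use
syslog):

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>   /* getpid(), for the usage shown at the bottom */

/*
 * checked_kill() is a hypothetical wrapper.  It returns 0 on success,
 * or the errno value from kill(2) after logging why delivery failed.
 */
int
checked_kill(pid_t pid, int sig)
{
    int err;

    if (kill(pid, sig) == 0)
        return 0;

    err = errno;
    switch (err) {
    case ESRCH:
        /* Target is gone: the stored pid is stale. */
        fprintf(stderr, "kill(%ld, %d): no such process (stale pid?)\n",
            (long)pid, sig);
        break;
    case EPERM:
        /* Pid may have been reused by another user, or policy denied it. */
        fprintf(stderr, "kill(%ld, %d): permission denied\n",
            (long)pid, sig);
        break;
    default:
        fprintf(stderr, "kill(%ld, %d): %s\n", (long)pid, sig,
            strerror(err));
        break;
    }
    return err;
}

/*
 * Usage: signal 0 performs the existence and permission checks without
 * delivering anything, e.g. checked_kill(getpid(), 0) returns 0.
 */
```

A fail-stop variant would assert on the return value instead of
returning it; which is appropriate depends on whether the daemon can do
anything useful after the failure.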

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Network Associates Laboratories
Received on Thu Aug 28 2003 - 06:34:37 UTC
