Re: RFT: Allow large values of NGROUPS_MAX

From: Bruce Evans <brde_at_optusnet.com.au>
Date: Sat, 6 Jun 2009 20:31:51 +1000 (EST)
On Fri, 5 Jun 2009, Brooks Davis wrote:

> I've been working on fixing the limit of 15 groups per user.  This has
> primarily consisted of merging a patch from Isilon Systems which breaks
> the group array out of struct ucred and stores in in malloc'd storage.
>
> I've attached a patch which includes that change and increases the value
> of NGROUPS_MAX to 32767.  It also changes references to cr_groups[0] to
> use the cr_gid macro ...

I don't like this macro.  At least the implementation should keep using
the unobfuscated reference.  kern_prot.c consistently didn't use the
obfuscation except once in a comment.

> Before any merge a couple decisions need to be made:
>
> - How large should NGROUPS_MAX be?  Linux uses 65536 and we could
>   extend things to support that, but it's probably unnecessary.

32767 seems excessive too.  Hopefully NGROUPS[_MAX] is used only in
broken unconverted applications so a large value rarely wastes
space.

> - Should we make any attempt to support old binaries when there
>   are more than 16 groups?

There is no way to support broken old binaries.  They will have a limit
of 16.

>   The POSIX getgroups/setgroups APIs did not
>   anticipate this change and thus either will fail outright.

No, as you explained in previous mail, POSIX anticipated this perfectly
unusably by making NGROUPS_MAX a Runtime Increasable Value despite it being
constant.  This puts the burden of not failing on applications.  Only
broken apps that don't do any of
   dynamic allocation using sysconf(_SC_NGROUPS_MAX), or
   dynamic allocation using getgroups(0, ...), or
   dynamic allocation after failing using getgroups(old_NGROUPS_MAX, ...)
will fail outright.

>   We can't
>   fix setgroups, but we might want to make an optional accommodation for
>   getgroups to allow for truncated returns to old code.

If done unconditionally, this would break the unbroken apps that do
dynamic allocation after failing using getgroups(old_NGROUPS_MAX, ...).
These apps depend on getgroups() failing with the Standard and
documented errno EINVAL when the `gidsetlen' parameter is too small.

Maybe some per-app or per-environment branding could be used to configure
not generating an error.  You would apply it only where security is
not very important.

getgrouplist(3) guarantees filling of the array when the array is too
small.  getgroups(2) doesn't guarantee anything, and in fact doesn't
fill the array.  It wouldn't hurt to fill it.

% ...
% diff -ru --exclude='.glimpse*' --exclude=.svn --exclude=compile --ignore-matching='$FreeBSD' /usr/src/sys/kern/kern_prot.c ngroups/sys/kern/kern_prot.c
% --- /usr/src/sys/kern/kern_prot.c	2009-06-05 15:33:50.000000000 -0500
% +++ ngroups/sys/kern/kern_prot.c	2009-06-05 16:02:28.000000000 -0500
% _at__at_ -243,16 +246,11 _at__at_
% 
%  	td->td_retval[0] = td->td_ucred->cr_rgid;
%  #if defined(COMPAT_43)
% -	td->td_retval[1] = td->td_ucred->cr_groups[0];
% +	td->td_retval[1] = td->td_ucred->cr_gid;
%  #endif
%  	return (0);
%  }
% 
% -/*
% - * Get effective group ID.  The "egid" is groups[0], and could be obtained
% - * via getgroups.  This syscall exists because it is somewhat painful to do
% - * correctly in a library function.
% - */
%  #ifndef _SYS_SYSPROTO_H_
%  struct getegid_args {
%          int     dummy;

This comment still seems to apply.

I wonder if you noticed and/or fixed the related off-by-1 errors in
getgroups() and {NGROUPS_MAX}: POSIX.1-2001-draft7 says:

% 16834           IDs of the calling process. It is implementation-defined whether getgroups( ) also returns the
% 16835           effective group ID in the grouplist array.

FreeBSD does return the egid in the array, always as the first element.
Neither of these details is documented in getgroups.2 or setgroups.2.
Even the implementation doesn't seem to understand this where it is
most important -- the implementation of setgroups() seems to be missing
some of the things done by setegid().  getgrouplist(3) has explicit
support for the egid being in both the initial and final lists (or
missing in the initial list?).

% ...
% 16841           If the effective group ID of the process is returned with the supplementary group IDs, the value
% 16842           returned shall always be greater than or equal to one and less than or equal to the value of
% 16843           {NGROUPS_MAX}+1.

This and probably other things indicate that _supplementary_ actually means
supplementary -- it doesn't count the egid.  But in FreeBSD, the egid is
counted.  This gives 2 off-by-1 errors that partially compensate for each
other:
- NGROUPS_MAX is 15, not 16, since it is impossible to have 16 _supplementary_
   groups in cr_groups[0..15] after using cr_groups[0] for the egid
- getgroups() cannot return {NGROUPS_MAX}+1 as is needed for it to return the
   egid plus {NGROUPS_MAX} supplementary groups
- getgroups() does return {NGROUPS_MAX}+1 once {NGROUPS_MAX} is corrected.

Some userland code depends on these bugs -- the static array size should
be NGROUPS_MAX + 1, but it is typically plain NGROUPS.  OTOH,
libc/gen/initgroups.c uses NGROUPS + 1 and getgrouplist() followed by
setgroups().  I think getgrouplist() can produce the extra 1 and then 
setgroups() and thus initgroups() will fail.

I thought again about making cr_gid separate from the array, so that it
can be named cr_gid without obfuscation.  This wouldn't be good since
it would complicate more than the copyin/out in get/setgroups() -- there
are a few temporary gid arrays, and some iterations over the arrays that
want to see the egid.

Bruce
Received on Sat Jun 06 2009 - 10:48:16 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:49 UTC