Re: 64-bit values in sysctl

From: Bruce Evans <bde_at_zeta.org.au>
Date: Thu, 26 Feb 2004 14:44:18 +1100 (EST)
On Wed, 25 Feb 2004, Bruce Cran wrote:

> After investigating whether FreeBSD will be able to show the correct speed of a
> 4.3GHz CPU (http://www.theinquirer.net/?article=14319), I found that the sysctl

Please use lines shorter than 85 columns.

> infrastructure doesn't currently have support for 64-bit integers, on a 32-bit
> platform at least.   The only type pertaining to 64-bits is CTLTYPE_QUAD, but since
> there's no corresponding sysctl_handle_quad, it's still handled as a 32-bit integer.

The kernel can handle quads (not quite right) using CTLTYPE_OPAQUE.  Support
for reading CTLTYPE_QUAD variables from the kernel in sysctl(8) was broken
long ago (1995?).  I use the following quick fix:

%%%
Index: sysctl.c
===================================================================
RCS file: /home/ncvs/src/sbin/sysctl/sysctl.c,v
retrieving revision 1.59
diff -u -1 -r1.59 sysctl.c
--- sysctl.c	7 Nov 2003 16:41:47 -0000	1.59
+++ sysctl.c	26 Feb 2004 03:03:21 -0000
_at__at_ -599,2 +581,19 _at__at_

+	case 'Q':
+		if (!nflag)
+			printf("%s%s", name, sep);
+		fmt++;
+		val = "";
+		while (len >= sizeof(quad_t)) {
+			fputs(val, stdout);
+			if (*fmt == 'U')
+				printf(hflag ? "%'qu" : "%qu", *(u_quad_t *)p);
+			else
+				printf(hflag ? "%'qd" : "%qd", *(quad_t *)p);
+			val = " ";
+			len -= sizeof(quad_t);
+			p += sizeof(quad_t);
+		}
+		return (0);
+
 	case 'T':
%%%

> I found PR kern/39681 which explains that it's apparently not as simple as adding a
> sysctl_handle_quad function.  With CPUs getting faster and a few people adding more

Arches that support both 32-bit and 64-bit applications with a 64-bit
kernel make this even more interesting.  See peter's recent hacks in
kern_exec.c.  Longs are 64-bit in the kernel but 32-bit in applications,
so CTLTYPE_LONG doesn't do what 32-bit applications expect.  Longs in
data structures just cannot work right, but it should be possible to
return small scalar values using only as many bytes as required without
either the application or the kernel caring about the exact number of
bytes.

> then 4GB RAM to their machines, there's a need for some sort of solution IMO.
> In addition, the only place a CTLTYPE_QUAD is used is in sys/i386/i386/tsc.c, and
> then it mixed CTLTYPE_QUAD and sizeof(u_int).   I've also noticed that the

This mixture is invalid.  It should cause the following error according to
sysctl.3:

     [EINVAL]		A non-null newp is given and its specified length in
			newlen is too large or too small.

but the error handling for this was broken at much the same time that
support for CTLTYPE_QUAD was broken.  Quick fix (with debugging cruft):

%%%
Index: kern_sysctl.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_sysctl.c,v
retrieving revision 1.149
diff -u -1 -r1.149 kern_sysctl.c
--- kern_sysctl.c	22 Feb 2004 12:31:43 -0000	1.149
+++ kern_sysctl.c	23 Feb 2004 03:41:17 -0000
_at__at_ -942,4 +934,11 _at__at_
 		return 0;
-	if (req->newlen - req->newidx < l)
+	if (req->newlen - req->newidx != l) {
+		if (req->newlen - req->newidx > l) {
+			printf(
+		"sysctl_new_kernel -- short write: %zu - %zu > %zu\n",
+			    req->newlen, req->newidx, l);
+			Debugger("sysctl_new_kernel: short write");
+		}
 		return (EINVAL);
+	}
 	bcopy((char *)req->newptr + req->newidx, p, l);
_at__at_ -1058,4 +1055,11 _at__at_
 		return 0;
-	if (req->newlen - req->newidx < l)
+	if (req->newlen - req->newidx != l) {
+		if (req->newlen - req->newidx > l) {
+			printf(
+		"sysctl_new_user -- short write: %zu - %zu > %zu\n",
+			    req->newlen, req->newidx, l);
+			Debugger("sysctl_new_user: short write");
+		}
 		return (EINVAL);
+	}
 	error = copyin((char *)req->newptr + req->newidx, p, l);
%%%

I've used this code since long ago (1995?) except the Debugger() call
was only added 6 months ago to determine if there are any other sysctls
as broken as machdep.tsc_freq.  There don't seem to be any others.
Short reads of sysctl scalar variables just aren't useful, since the
whole variable will need to be read if any of it is, and the read
shouldn't be in pieces since that just gives races.  Short writes are
just bugs, since the races are much larger and the kernel can't do
bounds checking on variables written in pieces.

With the above error handling, the machdep.tsc_freq sysctl just fails.

Quick non-fix:

%%%
Index: tsc.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/tsc.c,v
retrieving revision 1.204
diff -u -1 -r1.204 tsc.c
--- tsc.c	21 Oct 2003 18:28:34 -0000	1.204
+++ tsc.c	26 Feb 2004 03:23:52 -0000
_at__at_ -132,4 +148,4 _at__at_
 {
+	u_int freq;
 	int error;
-	uint64_t freq;

_at__at_ -138,2 +154,4 _at__at_
 	freq = tsc_freq;
+	if (freq != tsc_freq)
+		return (EOVERFLOW);
 	error = sysctl_handle_int(oidp, &freq, sizeof(freq), req);
_at__at_ -146,5 +164,41 _at__at_

-SYSCTL_PROC(_machdep, OID_AUTO, tsc_freq, CTLTYPE_QUAD | CTLFLAG_RW,
+SYSCTL_PROC(_machdep, OID_AUTO, tsc_freq, CTLTYPE_UINT | CTLFLAG_RW,
     0, sizeof(u_int), sysctl_machdep_tsc_freq, "IU", "");

+uint64_t tsc_freq1 = 0x0000000100000002;
+
+static int
+sysctl_machdep_tsc_freq1(SYSCTL_HANDLER_ARGS)
+{
+	uint64_t freq;
+	int error;
+
+	freq = tsc_freq1;
+	error = sysctl_handle_opaque(oidp, &freq, sizeof(freq), req);
+	if (error == 0 && req->newptr != NULL)
+		tsc_freq1 = freq;
+	return (error);
+}
+
+SYSCTL_PROC(_machdep, OID_AUTO, tsc_freq1, CTLTYPE_QUAD | CTLFLAG_RW,
+    0, sizeof(uint64_t), sysctl_machdep_tsc_freq1, "QU", "");
+
+uint64_t tsc_freq2 = 0x0000000100000002;
+
+static int
+sysctl_machdep_tsc_freq2(SYSCTL_HANDLER_ARGS)
+{
+	uint64_t freq;
+	int error;
+
+	freq = tsc_freq2;
+	error = sysctl_handle_opaque(oidp, &freq, sizeof(freq), req);
+	if (error == 0 && req->newptr != NULL)
+		tsc_freq2 = freq;
+	return (error);
+}
+
+SYSCTL_PROC(_machdep, OID_AUTO, tsc_freq2, CTLTYPE_QUAD | CTLFLAG_RW,
+    0, sizeof(uint64_t), sysctl_machdep_tsc_freq2, "IU", "");
+
 static unsigned
%%%

The changes to the machdep.tsc_freq sysctl just fix its bogus parameters
so that it returns 32 bits in a non-bogus way, and adds a check that the
variable is actually representable in 32 bies (so the sysctl should fail
with errno EOVERFLOW if the frequency is 4.3GHz).

sysctl_machdep_tsc_freq1() shows how to implement returning 64-bit values
using CTLTYPE_OPAQUE (machdep.tsc_freq1 sysctl).  It should work for
machdep.tsc_freq after removing all "1"'s in it.  sysctl_machdep_tsc_freq2()
shows another way of returning 64-bit values.  IIRC, its only advantage is
that it sort of works with an unpatched syasctl(8).

> machdep.[i8254_freq, tsc_freq, acpi_timer_freq] are all writable, whereas I think they
> should probably be read-only, as should kern.bootfile.   One last thing I've noticed

No, these should all be writable.  See other replies for what can be done
by changing the frequencies in userland.  Writing to kern.bootfile is
normal for all kernel installs -- kern.post.mk changes kern.bootfile so
that it keeps pointing to the old kernel when the old kernel is renamed.

> during my look into the kernel code is that a lot of the sysctls are defined as signed
> integers - wouldn't it be better if these were converted to be unsigned, so that it

Many are already unsigned.  Perhaps not enough, but unsigned variables should
be used as little as possible since using them breaks overflow handling and
gives a morass of (un)sign extension problems.

> would be impossible for the writable ones to be set to negative values and thus avoid
> possible foot-shooting?  I already have some diffs, so I could put together a PR if
> some of these changes (such as making some sysctls read-only) are valid.

Negative values are often no more preposterous than large ones.

Bruce
Received on Wed Feb 25 2004 - 18:44:27 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:44 UTC