Re: Anyone object to the following change in libc?

From: Terry Lambert <tlambert2_at_mindspring.com> Date: Thu, 30 Oct 2003 02:59:09 -0800

Harti Brandt wrote:
> TL>Paragraph 6 of:
> TL>
> TL>     http://www.opengroup.org/onlinepubs/007904975/functions/sscanf.html
> TL>
> TL>Implies that the lack of characters in the string following the
> TL>conversion, due to failure in assignment, should result in an
> TL>"Input failure".  Note also that stdio.h defines EOF as -1.
> 
> I fail to locate this paragraph. This interpretation would also imply
> that scanf() always needs to return -1 whenever it cannot match a format
> specifier.

	The fscanf() functions shall execute each directive of the
	format in turn. If a directive fails, as detailed below, the
	function shall return. Failures are described as input
	failures (due to the unavailability of input bytes) or
	matching failures (due to inappropriate input).

It comes down to how you interpret the NUL byte at the end of the
sscanf() input string.  Is it an EOF?  Or is it an unavailability of
input bytes?  The answer to the question picks which return value
is correct.

> TL>I think it can be interpreted either way, still.
> 
> You miss the section about RETURN VALUE: EOF is return on a read error.
> This is not an input error.

How do I distinguish a "return value is -1 as an error result" from
"return value is -1 as an EOF result"?

> You should also read the very 1st paragraph. This clearly states, that
> ISO is the primary source of information and the ISO text is a lot
> cleaner.

No, that's not what it actually states; here's the paragraph:

	The functionality described on this reference page is
	aligned with the ISO C standard. Any conflict between
	the requirements described here and the ISO C standard
	is unintentional. This volume of IEEE Std 1003.1-2001
	defers to the ISO C standard.

It says that any conflicts are unintentional, and their intent was
to use different language for no good reason, rather than just
copying it verbatim and removing any doubt.  It does *NOT* say
that no conflicts exist.

Also: In this context, which is IEEE 1003.1-2001, Issue 6, "the
ISO C standard" refers to "c89", which is the version of the C
standard that was in effect at the time that SVID IV was defined.

If you need clarification on this issue, you should download the
currently available version of the NIST/PCTS, which specifically
requires you to compile with a c89 compiler, not one more recent.
The same is true of The Open Group test suites which are available
on the Internet.

The version of the ISO C standard you are quoting from is *NOT*
the c89 version.

This makes interpretation ambiguous, since the text you are
specifically referencing to get the 0 result is text that was
added to the next version of the standard to clarify it.

> I think it makes no sense to classify
> 
> sscanf("123", "%*d%d", ...
> 
> as an error, but
> 
> sscanf("123", "%d%d", ...
> 
> not, does it? Also at least Solaris 9 return -1 but fails to set
> errno. Which is simply a bug.

It makes no sense to do conversions without assignment in the
first place (IMO).

Also, it makes no sense to call sscanf() on a string that holds too
few fields for the format, considering that you are the one providing
that string in the first place.  You are effectively using sscanf() to
validate an ambiguous set of data as part of its operation.

I'm not sure that this is reasonable to do.  Specifically, none of
the referenced standards expects this to happen with sscanf(), since
they do not define, specifically, how the end of the input string
should be interpreted: EOF vs. unavailability of input bytes.  One
could argue that an unavailability of matching input bytes results
only from the separator character(s) between conversion specifiers
not being matched properly.  At that point, "%d%d" (or "%*d%d") is
a nonsensical format string entirely, since any characters that
would be valid for input to the second specifier would also be valid
for input to the first: and the matching is, by definition, greedy.

Really, this is a problem which has occurred because you are doing
some conversion into an internal buffer instead of using fscanf()
or scanf() on the input stream, presumably to avoid a buffer
overflow and/or bitch about the standards being specified
inadequately in comp.lang.c, or on current_at_freebsd.org.

In other words, overly anal buffer overflow checking, rather than
specifying the buffer length in the format string.

In terms of standards conformance, I'd like to see the output of a
conformance test suite for ISO C (any version) complaining about the
-1 return.  I think IEEE 1003.1-2001 conformance is probably more
important, if we have to pick one or the other on the basis of what
sscanf() is going to return in this manufactured problem case.

I'd also like to point out that the compiler we are using permits
the standards conformance version to be chosen at compile time, but
routines like sscanf(), unless they are inlined in header files,
are not conditionally selectable based on the version at compile
time.

Further, it's quite possible that version conformance, even if it
were specifiable at compile time, is not specifiable at link time,
so moving the function into an inline would be the only viable
approach to dealing with this issue in multiple libraries, each of
which expects a different version, but which must be linked into a
single program at the end of things in order to get an application
using libraries with different expectations.

So it's pretty stupid for a language standard to specify anything
other than language syntax (e.g. things like library behaviour).

In any case, we are practically guaranteed that returning -1, as
all other UNIX-like OS's currently do, would result in less source
code breaking.

Finally, I will point to the current FreeBSD precedents in this
matter, which are the TCP/IP RFC conformance for 1644 and 1323,
which were defaulted to "off", after they broke a lot of existing
code (and Livingston Portmaster terminal servers), and select(2)
not modifying the contents of the timeval struct to provide an
accurate value for the remaining timeout prior to the select
coming true or a signal being received.

In other words, conformance level has historically been dictated
by what code is not broken, not what is technically permitted by
the standards, if you language-lawyer them to death.

To put it in IETF terms: "Be conservative in what you generate,
and generous in what you accept".

-- Terry