Re: [PATCH] open_memstream() and open_wmemstream()

From: John Baldwin <jhb_at_freebsd.org> Date: Fri, 15 Feb 2013 10:41:14 -0500 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:34 UTC

On Friday, February 15, 2013 10:10:12 am Jilles Tjoelker wrote:
> On Wed, Feb 13, 2013 at 11:44:19AM -0500, John Baldwin wrote:
> > On Thursday, February 07, 2013 4:12:22 pm Jilles Tjoelker wrote:
> > > On Tue, Feb 05, 2013 at 03:46:43PM -0500, John Baldwin wrote:
> > > > I've written an implementation of open_memstream() and
> > > > open_wmemstream() along with a set of regression tests.  I'm pretty
> > > > sure open_memstream() is correct, and I believe open_wmemstream() is
> > > > correct for expected usage.  The latter might even do the right thing
> > > > if you split a multi-byte character across multiple writes.  One
> > > > question I have is if my choice to discard any pending multi-byte
> > > > state in the stream anytime a seek changes the effective position in
> > > > the output stream.  I think this is correct as stdio will flush any
> > > > pending data before doing a seek, so if there is a partially parsed
> > > > character we aren't going to get the rest of it.
> 
> > > I don't think partially parsed characters can happen with a correct
> > > application. As per C99, an application must not call byte output
> > > functions on a wide-oriented stream, and vice versa.
> 
> > Oh, interesting.  Should I remove those tests for using byte output
> > functions from the open_wmemstream test?
> 
> Yes. Unless real applications are identified that use this undefined
> behaviour, such a test would test the implementation and not the
> specification.

Ok.

> I briefly tried to run the tests against the glibc implementation
> (Ubuntu 10.04) and got many failures. Glibc does not check the
> parameters for null pointers and simply stores them blindly which is
> POSIX-compliant. Things like bad length 6 for "foo" appear to be glibc
> bugs, and test-open_wmemstream even segfaults.

Eh, the open group page requires EINVAL:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/open_memstream.html

> > > Discarding the shift state on fseek()/fseeko() is permitted (but should
> > > be documented as this is implementation-defined behaviour).
> > > State-dependent encodings (where this is relevant) are rarely used
> > > nowadays.
> 
> > > The conversion to bytes and back probably makes open_wmemstream() quite
> > > slow but I don't think that is very important.
> 
> > Note that we do this already for swprintf().  I've added this note:
> 
> > IMPLEMENTATION NOTES
> >      Internally all I/O streams are effectively byte-oriented, so using wide-
> >      oriented operations to write to a stream opened via open_wmemstream()
> >      results in wide characters being expanded to a stream of multibyte char-
> >      acters in stdio's internal buffers.  These multibyte characters are then
> >      converted back to wide characters when written into the stream.  As a
> >      result, the wide-oriented streams maintain an internal multibyte charac-
> >      ter conversion state that is cleared on any seek opertion that changes
> >      the current position.  This should have no effect unless byte-oriented
> >      output operations are used on a wide-oriented stream.
> 
> This appears to "approve" the use of byte-oriented operations on a
> wide-oriented stream which is in fact undefined.

Hmm, I can remove the last bit ("unless..."), but am struggling to think of good
wording.  Maybe:

    This should only affect use of byte-oriented output operations on a wide-
    oriented stream which is an undefined operation and should be avoided.

> > > > http://www.FreeBSD.org/~jhb/patches/open_memstream.patch
> 
> > > The seek functions should check for overflow in the addition (for
> > > SEEK_CUR and SEEK_END) and the conversion to size_t.
> 
> > Note that SEEK_CUR is effectively only called with an offset of 0
> > via __ftello().  I'm not sure of the best way to check for overflow, do I have 
> > to do something like this:
> 
> > static int
> > will_overflow(size_t off, fpos_t pos)
> > {
> > 	if (pos < 0)
> > 		return (-pos > off);
> > 	return (SIZE_MAX - off > pos);
> > }
> 
> > For SEEK_CUR I would test 'will_overflow(ms->offset, pos)' and
> > for SEEK_END 'will_overflow(ms->len, pos)'
> 
> > Presumably SEEK_SET should test for the requested position being
> > beyond SIZE_MAX as well?
> 
> > If my above test is ok then I end up with this:
> 
> Hmm, perhaps it is better to restrict to SSIZE_MAX instead of SIZE_MAX
> as this greatly simplifies the code. With the above code, comparisons
> may be signed or unsigned and there may be no greater type that fits
> both a size_t and an fpos_t. Negating a negative fpos_t is also
> incorrect because it may overflow. If you restrict the memory buffer to
> ssize_t, an ssize_t can be safely converted to an fpos_t.

Humm.  Presumably I have to check for overflow in the expression to grow
the stream as well?  I have no idea how to write these checks. :(

Of course, practically speaking (and I can make assumptions about the
implementation I think since this is libc internals after all),
fpos_t == off_t and sizeof(size_t) <= sizeof(off_t) for all platforms
we support (and likely will always be true, files can always be larger
than memory).  Does that simplify things if I assume that?

-- 
John Baldwin