Re: [PATCH] open_memstream() and open_wmemstream()

From: John Baldwin <jhb_at_freebsd.org>
Date: Wed, 13 Feb 2013 11:44:19 -0500
On Thursday, February 07, 2013 4:12:22 pm Jilles Tjoelker wrote:
> On Tue, Feb 05, 2013 at 03:46:43PM -0500, John Baldwin wrote:
> > I've written an implementation of open_memstream() and
> > open_wmemstream() along with a set of regression tests.  I'm pretty
> > sure open_memstream() is correct, and I believe open_wmemstream() is
> > correct for expected usage.  The latter might even do the right thing
> > if you split a multi-byte character across multiple writes.  One
> > question I have is if my choice to discard any pending multi-byte
> > state in the stream anytime a seek changes the effective position in
> > the output stream.  I think this is correct as stdio will flush any
> > pending data before doing a seek, so if there is a partially parsed
> > character we aren't going to get the rest of it.
> 
> I don't think partially parsed characters can happen with a correct
> application. As per C99, an application must not call byte output
> functions on a wide-oriented stream, and vice versa.

Oh, interesting.  Should I remove those tests for using byte output
functions from the open_wmemstream test?

> Discarding the shift state on fseek()/fseeko() is permitted (but should
> be documented as this is implementation-defined behaviour).
> State-dependent encodings (where this is relevant) are rarely used
> nowadays.
> 
> The conversion to bytes and back probably makes open_wmemstream() quite
> slow but I don't think that is very important.

Note that we do this already for swprintf().  I've added this note:

IMPLEMENTATION NOTES
     Internally all I/O streams are effectively byte-oriented, so using wide-
     oriented operations to write to a stream opened via open_wmemstream()
     results in wide characters being expanded to a stream of multibyte char-
     acters in stdio's internal buffers.  These multibyte characters are then
     converted back to wide characters when written into the stream.  As a
     result, the wide-oriented streams maintain an internal multibyte charac-
     ter conversion state that is cleared on any seek opertion that changes
     the current position.  This should have no effect unless byte-oriented
     output operations are used on a wide-oriented stream.
 
> > http://www.FreeBSD.org/~jhb/patches/open_memstream.patch
> 
> The seek functions should check for overflow in the addition (for
> SEEK_CUR and SEEK_END) and the conversion to size_t.

Note that SEEK_CUR is effectively only called with an offset of 0
via __ftello().  I'm not sure of the best way to check for overflow, do I have 
to do something like this:

static int
will_overflow(size_t off, fpos_t pos)
{
	if (pos < 0)
		return (-pos > off);
	return (SIZE_MAX - off > pos);
}

For SEEK_CUR I would test 'will_overflow(ms->offset, pos)' and
for SEEK_END 'will_overflow(ms->len, pos)'

Presumably SEEK_SET should test for the requested position being
beyond SIZE_MAX as well?

If my above test is ok then I end up with this:

static int
will_overflow(size_t offset, fpos_t pos)
{
	if (pos < 0) {
		if (-pos > off) {
			errno = EINVAL;
			return (1);
		}
	} else {
		if (SIZE_MAX - off > pos) {
			errno = EOVERFLOW;
			return (1);
		}
	}
	return (0);
}

static fpos_t
memstream_seek(void *cookie, fpos_t pos, int whence)
{
	struct memstream *ms;
#ifdef DEBUG
	size_t old;
#endif

	ms = cookie;
#ifdef DEBUG
	old = ms->offset;
#endif
	switch (whence) {
	case SEEK_SET:
		if (pos > SIZE_MAX) {
			errno = EOVERFLOW;
			return (-1);
		}
		ms->offset = pos;
		break;
	case SEEK_CUR:
		if (will_overflow(ms->offset, pos))
			return (-1);
		ms->offset += pos;
		break;
	case SEEK_END:
		if (will_overflow(ms->len, pos))
			return (-1);
		ms->offset = ms->len + pos;
		break;
	}
	memstream_update(ms);
#ifdef DEBUG
	fprintf(stderr, "MS: seek(%p, %zd, %d) %zd -> %zd\n", ms, pos, whence,
	    old, ms->offset);
#endif
	return (ms->offset);
}

Thanks for reviewing this so far.

-- 
John Baldwin
Received on Wed Feb 13 2013 - 19:04:22 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:34 UTC