About extensible prinf(3), a slightly long X-mas card

From: Poul-Henning Kamp <phk_at_phk.freebsd.dk>
Date: Fri, 16 Dec 2005 22:01:03 +0100
Those of you following CVS commits will have have noticed that I just
added an extensible printf(3) implemention as an experiment.

This email is intended to provide some insight and background.

First, what is it ?
-------------------

In a program I'm writing right now, I have some measurement points
which are identified by a triplet of numbers, usually encoded in
a single integer.

That means I have to write things like
	printf("%03u.%03u.%03u",
	    SITE_OF(p), GROUP_OF(p), POINT_OF(p));
where each of the three macros basically does "(X >> a) & b"
for varying a and b.

I could have made a function:

	char *
	fmt_U(unsigned p)
	{
		static char buf[12];

		sprintf(buf, "%03u.%03u.%03u",
		    SITE_OF(p), GROUP_OF(p), POINT_OF(p));
		return (buf);
	}

and that would mostly work, only not where it is most needed
because of the static buffer:

	fprintf(stderr,
	    "WARNING: configuring %03u.%03u.%03u as analog-input"
	    " conflicts with pin usage on %03u.%03u.%03u thru "
	    "%03u.%03u.%03u\n",
	        SITE_OF(p), GROUP_OF(p), POINT_OF(p),
	        SITE_OF(a), GROUP_OF(a), POINT_OF(a),
	        SITE_OF(b), GROUP_OF(b), POINT_OF(b));

Obviously, it would be much smarter if I could tell printf that
%U in my program means format an integer this particular way and
write the much more readable:

	fprintf(stderr,
	    "WARNING: configuring %U as analog-input"
	    " conflicts with pin usage on %U thru %U\n", a, b, c);


What ABI to use ?
-----------------

GLIBC got here before us, so, lets examine their ABI:

The GLIBC ABI consists of a function
	int register_printf_function(
	    int spec,
	    printf_function *render,
	    printf_arginfo_function *arginfo);

The arginfo function is called first, to determine the kinds
of arguments, and the render function is called to do the
actual formatting.

This two-step approach is necessary because at some point some
madhatter in some standardsbody or other added a positional argument
facility to printf() without thinking about the consequences:

Did you know that
	printf("%2$s %1$s\n", "world\n", "Hello");
is legal ?

There are semi-quasi-good reasons for allowing something like this,
but the cost in performance is high: One is forced to scan the
format string at least twice to first determine the types for the
arguments and then again in order to actually render them to the
output.

In practice one is forced to scan the format string _three_ times,
once to find the number of arguments so that memory can be allocated
for the second pass which finds the type so that this info can be
made available for the third pass.  Sigh...

Well, anyway, back to my example:

The code to implement my points would in GLIBC's world look
like this:

First a function which says that we take one argument of
type int:

	int
	printf_arginfo_point(const struct printf_info *pi,
	    size_t n, int *argt)
	{

		assert(n >= 1);
		argt[0] = PA_INT;
		return (1);
	}

and then a function to render that:

	int
	printf_render_point(FILE *fp,
	    const struct printf_info *pi, const void *const *arg)
	{
		unsigned u;

		u = *((unsigned **)arg[0]);
		return (fprintf(fp, "%03u.%03u.%03u",
		    SITE_OF(u), GROUP_OF(u), POINT_OF(u)));
	}

And finally a call to register these two;

	i = register_printf_function('U',
	    printf_render_point, printf_arginfo_point);

I can live with that, so I have adopted that ABI.


Implementation & performance
----------------------------

The next thing was to teach our printf(3) implementation about this,
and boy, what a monster.

Take a peek for yourself in:

	src/lib/libc/stdio/vfprintf.c

This is why we in FreeBSD have such a damn good performance: every
trick in the book have been used where it matters.

Attempting to shoe-horn extensibility into this piece of code would
be suicide, both from a performance point of view and for my sanity.

Instead what I did was write a new implementation of the core task: parsing
the format string, and then extracted the integer, floating-point,
string and char stuff from the vfprintf.c code and put it into separate
files and aligned those with the GLIBC ABI.

Until somebody specifically ask for it (with a global variable or an
environment variable) or registers an extension, we keep running on
our good old fast spaghetti vprintf, but once extensibility is called
for, we switch to my code.

Unfortunately, the extensible printf had a totally unsatisfactory
perforamnce in comparison with the vprintf.c one and something
had to happen.

The problem is that printing for instance a floating point number
consists of marshalling up to 6 pieces of the string:

         * [+|-| ] [0x|0X] MMM . NNN [e|E|p|P] [+|-] ZZ
         *    A       B     ---C---      D       E   F
	(taken from an exceptionally good comment in vfprintf.c)

and in addition there may be leading and trailing padding to the
requested field width.

That means that worst (sensible) case, 8 separate I/O requests will
be made for one floating point number.  Ouch.

The vprintf.c implementation has its own private struct uio
implementation for this exact reason.  I reimplmented that so
that it was compatible with the extensibility of the new printf,
but that unfortunately breaks the GLIBC ABI because now the render
function must take a "struct __printf_io *" instead of a "FILE *"

So right now we have two ABIs, the GLIBC and the FreeBSD.  I'm not
happy with that, and I may drop the FreeBSD one again and decide
to live with the performance hit.

Another trick which vfprintf.c does is that if the file is unbuffered
(setbuf(fp, NULL)) it creates a local buffer so that each of the
I/Os won't result in a write(2) system call, and only at the end
of the printf(3) operation, does the buffer get written to the
file.  Sneaky.  I've added that as well and that helped performance
a good deal too.


Default extensions
------------------

I also decided that there were some pieces of code that I have
written far too many times to be funny, and therefore I added
three sets of arginfo/render functions for things I have missed
more times than I could count:

%H -- Hexdump

    Takes a pointer and a length:

	printf("Received %H\n", packet, packet_len);
		00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
		10 11 12 13

    If you add a width, that will control how many columns are
    printed:

	printf("Received %4H\n", packet, packet_len);
		00 01 02 03
		04 05 06 07
		08 09 0a 0b
		0c 0d 0e 0f
		10 11 12 13

    Adding the printf flag for alternate will add ascii dump and
    a "+" will give addresses:

	printf("Received %+#8H\n", packet, packet_len);
		0000 00 01 02 03 04 05 06 07 |........|
		0008 08 09 0a 0b 0c 0d 0e 0f |........|
		0010 10 11 12 13             |....    |

%V -- Stringvis

    Expands non-printable characters to C-style back-slash escapes,
    octal back-slash escapes or HTTP stype %xx escapes:

	printf("%V", recv_string)
		Hello\tWor\377ld
	printf("%0V", recv_string)
		Hello\011Wor\377ld
	printf("%+V", recv_string)
		Hello%09Wor%FFld

%T -- time_t/timeval/timespec

    Making sense of time intervals takes a lot of printf work, and
    I do this a lot:

	printf("%T\n", &delta_t);	/* time_t */
		20349
	printf("%#T\n", &delta_t);	/* time_t */
		5h39m9
	printf("%#lT\n", &tv);		/* struct timeval */
		5h39m9.245000
	printf("%#.3llT\n", &ts);	/* struct timespec */
		5h39m9.245

You can enable these with

	register_printf_render_std("HVT");


Style(9) and GCC issues
-----------------------

Obviously, such extensions are not condoned by style(9) but probably more
damning, GCC's printf format checker knows nothing about them, so it will
take a fit if you use them.

For these reasons, and for general reasons of sanity, printf format
extensions should _not_ be added to FreeBSD programs at this time, if ever.


Conclusion & future
--------------------

Well, there you have it:  Some of you will, like me, have implemented your
own printf many times to do stuff like this, now FreeBSD offers a sane way
to do that.

If this proves useful, it'll stay in the tree and be part of 7.0 (no MFCs!)
it people break out in bikesheds about it, it will not.

Keep me posted...


-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk_at_FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
Received on Fri Dec 16 2005 - 20:01:16 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:49 UTC