Re: MySQL Performance 6.0rc1

From: Poul-Henning Kamp <phk_at_phk.freebsd.dk> Date: Thu, 27 Oct 2005 16:27:46 +0200

In message <20051027140031.L32255_at_fledge.watson.org>, Robert Watson writes:

There are several things we can do to speed up our timekeeping
code without affecting its integrity.  For instance:

    *	Userland-only timestamp facility, provided the hardware is
	available from userland (TSC is, i8254 isn't, ACPI normally
	isn't and HPET will be, so it's roughly a 50% hit there).

    *	Additional CLOCK_FOO values for various degraded but fast
	timestamps.

Unfortunately, they either force intense versioning of libc or
application source-code changes, so neither is very desirable.

In addition to this there are a couple of kernel only optimizations
I have always tried to avoid:

    *	Inline assembler for timecounter math.  The 'C' language
	is notoriously bad at expressing the simple concept of a
	carry and some of the multiplications could be truncated
	intelligently, but I far prefer simple and portable C to
	complex assembly.

    *   Cache+Reuse of timestamps in the kernel.  It's very hard
	to cheaply determine when the cached timestamp is "too old"
	and it may require locking to work in the first place because
	per-CPU caches would probably not give enough hits to be
	worth it.
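
To make the carry complaint concrete, here is a minimal sketch of adding
to a 64.64 fixed-point timestamp in portable C.  The struct and function
names are hypothetical and chosen only for the example.  C has no way to
name the carry flag, so the code has to detect wraparound with a compare,
where assembly could use an add followed by an add-with-carry.

```c
#include <stdint.h>

/* Hypothetical timestamp: whole seconds plus a 64-bit binary fraction. */
struct fixtime {
	int64_t  sec;
	uint64_t frac;
};

/*
 * Add a fractional increment.  Unsigned overflow wraps in C, so a
 * result smaller than the old value means a carry out of the fraction.
 */
void
fixtime_add_frac(struct fixtime *t, uint64_t delta)
{
	uint64_t old = t->frac;

	t->frac += delta;
	if (t->frac < old)
		t->sec++;
}
```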

Before we go any further, let me remind you that our current
timecounter code does not use inter-CPU locks, provided the hardware
does not need locks.

Many if not most of the more radical ideas, TSC-based two-clock
interpolation for instance, would require inter-CPU locks to protect
against time-travel and excessive jitter.
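
A minimal sketch of why such schemes need state shared between CPUs: to
guarantee that nobody ever sees time go backwards, every reader has to
compare its raw value against the latest value handed out anywhere.  The
version below uses a C11 atomic as a lighter-weight stand-in for a lock;
it is not how the current timecounter code works, and monotonic_clamp()
and last_seen are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Latest timestamp handed out by any CPU (shared, hence contended). */
static _Atomic uint64_t last_seen;

/*
 * Return `raw` if it is newer than anything seen so far, otherwise
 * return the newest value already handed out.
 */
uint64_t
monotonic_clamp(uint64_t raw)
{
	uint64_t prev = atomic_load(&last_seen);

	while (raw > prev) {
		/* On failure, prev is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&last_seen, &prev, raw))
			return (raw);
	}
	return (prev);
}
```

Even without a lock, every call touches the same cache line from every
CPU, which is the kind of cost the current code avoids.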

It is also important to remember that no matter what we do, a
significant part of the overhead will still be the 'read-the-hardware'
step.

For instance I just benchmarked the state-of-the-art HPET facility
on the two of my machines that have it, and found that it took 500
and 1400 nsec respectively to read them.  (HPET timecounter code will
arrive in -current RSN).

>Sadly, POSIX doesn't say anything about how applications can express 
>preferences about the cost and granularity of time measurement.

Yes, in addition to their other deficiencies [1] the APIs are
somewhat limited in what they can express.

I've often thought about inventing a new API to solve these problems.
It doesn't take much to do it right, but I have never carried through
on it because adding yet another "FreeBSD-proprietary" API is not the
solution we're looking for.

Poul-Henning

[1]  a) Totally bogus leap second handling.
     b) PseudoQuasiDecimal formats (also on binary computers)
     c) Lack of traceability, and quality information.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk_at_FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.