On 2009-Mar-30 18:45:30 -0700, Maxim Sobolev <sobomax_at_freebsd.org> wrote:
>You don't really need to do it on every execve() unconditionally. It
>could be done on demand in libc, so that only when thread pass certain
>threshold, the "common page optimization code" kicks in and does its
>open/mmap/etc magic. Otherwise, "normal" syscall is performed.

This "optimisation" is premature. The first step is to implement an approach that always maps (or otherwise exposes) the data, and then gather some information about its overheads in the real world. Only if those overheads are deemed excessive should we start looking at how to improve things. And IMO, the first improvement would be to lazily map the page - so it's not mapped by default, but is mapped the first time any of the information in it is used.

>that for example gettimeofday() only gets optimized if threads calls it
>more frequently that 1 call/sec.

Whilst this thread started out talking about timecounters, once you have a shared page there is a variety of other information that could be exported - the PID being the most obvious. If the page is exported as code rather than data (as has been suggested), then you also have the possibility of exporting CPU-dependent optimised versions of some library functions (a la Solaris). The more stuff you export, the less you gain from supporting an export threshold.

On 2009-Mar-30 18:31:06 -0700, Maxim Sobolev <sobomax_at_FreeBSD.org> wrote:
>It's not that easy, unless you can pin thread to a specific core before
>reading that page. I.e. imagine the case when your thread reads per-cpu
>page, get preempted and scheduled to a different core, then executes
>RDTSC there, still thinking it got TSC reading from the first core. Even
>if it does re-read from that page again after reading TSC to determine
>if he has read the correct TSC, still it's possible (though not very
>likely) that it has been preempted again and scheduled to the first core
>after reading the TSC.

Good point.
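[To illustrate the read side being discussed: a minimal userland sketch of the generation-counter (seqlock-style) read that a shared time page would need. The struct layout and field names here are invented for illustration; they are not the actual FreeBSD timecounter page layout.]

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical layout of a kernel-exported time page.  The kernel is
 * assumed to increment th_gen before and after each update, so a reader
 * that sees the same non-zero generation before and after its reads
 * knows it saw a consistent snapshot.
 */
struct shared_timepage {
	_Atomic uint32_t th_gen;	/* update generation counter */
	uint64_t th_offset;		/* assumed: boot-time offset */
	uint64_t th_scale;		/* assumed: TSC->ns scale factor */
};

/* Seqlock-style consistent read: retry until the generation is stable. */
static void
timepage_read(struct shared_timepage *tp, uint64_t *off, uint64_t *scale)
{
	uint32_t gen;

	do {
		gen = atomic_load_explicit(&tp->th_gen,
		    memory_order_acquire);
		*off = tp->th_offset;
		*scale = tp->th_scale;
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&tp->th_gen, memory_order_relaxed));
}
```

Note that this only guards against the kernel updating the page mid-read; as Sobolev points out above, it does not by itself solve the migration problem, because the thread can still be moved to another core between reading the page and executing RDTSC.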
If you export code rather than data, then the scheduler can simply special-case threads whose return address lies inside the magic page. This is a fairly cheap test and only needs to occur once you have already decided to re-schedule that thread - so you are already in the "expensive" part of the scheduler, and a few more instructions won't be noticeable there. The most obvious approach would be to temporarily pin the thread whilst it is executing inside that page.

-- 
Peter Jeremy
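[Editorial sketch of the check described above: the scheduler-side "is the PC inside the magic page" test reduces to a single unsigned range comparison. The base address and size here are made up for illustration; a real kernel would use wherever it actually mapped the page.]

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placement of the shared "magic" page (illustrative only). */
#define MAGIC_PAGE_BASE	0x7ffffffff000ULL
#define MAGIC_PAGE_SIZE	0x1000ULL

/*
 * Cheap test applied only once the scheduler has already decided to
 * preempt a thread: if the thread's user PC is inside the magic page,
 * it should be temporarily pinned to its current CPU instead.
 * The subtraction relies on unsigned wraparound, so a single compare
 * covers both "below base" and "at or above base + size".
 */
static bool
pc_in_magic_page(uint64_t pc)
{
	return (pc - MAGIC_PAGE_BASE < MAGIC_PAGE_SIZE);
}
```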
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:45 UTC