Re: Timers and timing, was: MySQL Performance 6.0rc1

From: Maxim Sobolev <sobomax_at_portaone.com> Date: Sat, 29 Oct 2005 03:04:33 -0700

Peter Jeremy wrote:
> On Sat, 2005-Oct-29 00:29:10 -0700, Maxim Sobolev wrote:
> 
>>Poul-Henning Kamp wrote:
>>
>>>In message <4362BA38.1090603_at_portaone.com>, Maxim Sobolev writes:
>>>
>>>>You can solve most of those issues by exporting from the kernel to 
>>>>userland not only page(s) with actual data, but also page(s) with code 
>>>>to handle that data. Then you can turn the syscall implementations in 
>>>>libc into plain function calls to addresses in those code page(s). This 
>>>>approach can potentially have other interesting applications; for 
>>>>example, it will be possible to use processor-specific syscall 
>>>>instructions without recompiling userland, move some of the ABI code 
>>>>into userland (e.g. the freebsd32 layer on amd64), etc.
> 
> 
> The data I understand - we document a struct that defines the page
> contents, I'm less sure about the code.  The concept is appealing
> but some more detail would be nice.
> 
> 
>>>I'm not sure I see much difference between a shared library and this
>>>solution, but I'm equally sure we'd love to see a prototype before
>>>we judge it :-)
>>
>>Difference is that you won't have additional problems with userland and 
>>kernel versions mismatch and don't need any additional complexity 
>>associated with versioning/fallback logic.
> 
> 
> I'm not sure I understand how you'll achieve this.  How would a userland
> application locate the appropriate entry points?  If the exported code
> looks like an automagically-mapped shared library, we'd need to embed the
> ELF symbol table in the kernel as well.  How does an application compiled
> for (eg) FreeBSD-6 handle the code page exported by a FreeBSD-7 kernel?

Well, since this exported code is just another indirection layer for 
syscalls, we don't really need any real ELF symbols or anything like 
that - libc can take the syscall number and either call some fixed 
address + f(syscall_number), or fetch an address from fixed address + 
f(syscall_number) and call that. Everything else is beyond the 
application's control/interest. Yes, from the application's point of view 
it's kinda magical, but not much more magical than what happens now when 
an application calls kernel code via the INT syscall gate. In some sense 
this is about splitting every syscall into two portions - one which runs 
in userland (for most syscalls it will be just the plain old INT gate) 
and another which runs in kernel space. The main win is that some 
syscalls will be able to complete in userland without entering the kernel 
at all.
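
To make the second variant (fetch an address from a fixed table and call 
it) a bit more concrete, here is a rough userland-only sketch. Everything 
in it is made up for illustration: the slot numbers, the names 
code_page, libc_syscall and trap_into_kernel, and the time value are 
assumptions, not anything that exists in FreeBSD. A real implementation 
would have the kernel map the table and stubs at a known address and the 
fallback stub would execute the actual trap instruction.

```c
/* Hypothetical slot numbers, for illustration only. */
enum { SLOT_GETTIME = 0, SLOT_GETPID = 1, SLOT_COUNT = 2 };

typedef long (*stub_fn)(long arg);

/* Stand-in for data the kernel would keep current in a shared page. */
static long shared_time_sec = 1130580273; /* made-up value */

/* Counts how often the simulated kernel entry is taken. */
static int kernel_entries = 0;

/* Stand-in for the real trap (INT gate or a faster instruction). */
static long trap_into_kernel(long nr, long arg)
{
    (void)arg;
    kernel_entries++;
    return 1000 + nr; /* dummy result */
}

/* Stub that completes entirely in userland by reading shared data. */
static long stub_gettime(long arg)
{
    (void)arg;
    return shared_time_sec;
}

/* Stub that simply falls through to the kernel, as most would. */
static long stub_getpid(long arg)
{
    return trap_into_kernel(SLOT_GETPID, arg);
}

/* Stand-in for the kernel-exported table at a fixed address. */
static const stub_fn code_page[SLOT_COUNT] = {
    [SLOT_GETTIME] = stub_gettime,
    [SLOT_GETPID]  = stub_getpid,
};

/* What libc would do: index the table by syscall number and call it. */
long libc_syscall(long nr, long arg)
{
    if (nr < 0 || nr >= SLOT_COUNT)
        return -1; /* unknown slot in this sketch */
    return code_page[nr](arg);
}
```

Here libc_syscall(SLOT_GETTIME, 0) returns without ever touching 
trap_into_kernel, while libc_syscall(SLOT_GETPID, 0) goes through it. 
libc itself only knows the table location and the slot number; which 
stubs stay in userland is entirely the kernel's choice.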

Since the location is fixed in the process address space (or at least 
there is some easy way to learn it) and the format is also fixed, there 
should be no additional problems with running an app designed for one 
release on another release.

The main point of all this is not to solve existing ABI compatibility 
problems, but to allow the kernel to export some data directly to userland 
without introducing new compatibility problems, and potentially to make 
it possible to use more efficient syscall mechanisms on architectures 
that support them without recompiling libc and every statically linked 
application in the system.

-Maxim