Hi everyone,

I'm a potential Google Summer of Code applicant, proposing to work on improving timecounter performance in the FreeBSD kernel (suggestion from Timecounter Performance Improvements <http://www.freebsd.org/projects/ideas/index.html#p-timecounter-perf>). My qualifications are mentioned at the end of this email, for those interested. After some initial discussion in #freebsd-soc, I'm posting this to the mailing lists (and CC'ing it to specific people) for further discussion before I finalize and submit my application.

The primary idea is to improve the performance and resolution of gettimeofday() and friends by creating an efficient userspace implementation of these functions, along with some supporting modifications to the kernel.

As I understand it, the gettimeofday() function currently calls into the kernel to retrieve the timing information that it passes on to user apps. I propose to improve it as follows: export the relevant timing information to a shared page in memory, which will be mapped into every user app's address space. The implementation of gettimeofday() will then be changed to read the timestamp counter (TSC) from the processor, and to use that reading together with the timing info exported by the kernel to calculate and return the time in the proper format.

The TSC can be read very efficiently from userspace (it is currently the fastest and highest-resolution timer available, beating HPET, PIT, RTC etc.). This will give applications a very fast and, more importantly, a higher-resolution timer. It will also pave the way for optionally making the FreeBSD kernel tickless, which would help with efficiency and power consumption (the processor will be able to sleep for longer durations without having to service timer interrupts several hundred times a second). Other operating systems (like OS X) already do this to a varying extent.
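To make the conversion described above concrete, here is a minimal sketch of the arithmetic only. The struct layout, field names and function names are my own assumptions for illustration and not an existing FreeBSD interface. The TSC value is passed in as a parameter (on x86 it would come from the rdtsc instruction), and the 128-bit multiply relies on the unsigned __int128 extension of GCC/Clang on 64-bit targets.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout of the kernel-exported, read-only time page.
 * None of these names exist in FreeBSD; they only illustrate the idea.
 */
struct shared_timeinfo {
	uint64_t tsc_base;   /* TSC value sampled at the last clock tick */
	uint64_t ns_base;    /* time in nanoseconds at that same tick */
	uint64_t tsc_scale;  /* nanoseconds per TSC tick, 32.32 fixed point */
};

/* Precompute the scale once (kernel side) so readers avoid a division. */
uint64_t
tsc_scale_from_hz(uint64_t tsc_hz)
{
	assert(tsc_hz != 0);
	return ((UINT64_C(1000000000) << 32) / tsc_hz);
}

/* Convert a TSC delta to nanoseconds: (delta * scale) >> 32. */
uint64_t
tsc_delta_to_ns(uint64_t delta, uint64_t scale)
{
	/* unsigned __int128 is a GCC/Clang extension (assumption). */
	return ((uint64_t)(((unsigned __int128)delta * scale) >> 32));
}

/* Userspace side: last exported time plus the scaled TSC offset. */
uint64_t
shared_time_ns(const struct shared_timeinfo *ti, uint64_t tsc_now)
{
	return (ti->ns_base +
	    tsc_delta_to_ns(tsc_now - ti->tsc_base, ti->tsc_scale));
}
```

With a 2 GHz TSC, for example, a delta of 2000 ticks comes out as 1000 ns. A real implementation would also have to guard against reading the page while the kernel is updating it, which this sketch ignores.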

There are several issues with this approach, however, and I plan to tackle each of them so that there is no loss of functionality or accuracy, and certainly no loss of performance. The project will be completed in stages, each tackling one of these issues:

- Implement the exporting of shared system-wide pages to be mapped into each process. (There has been some work done in this area: Avoiding syscall overhead <http://www.freebsd.org/projects/ideas/index.html#p-setproctitle>.) This page will contain the timing info.

- Have the kernel read and export the information related to the TSC during boot-up. This is heavily processor dependent, and each processor (those from Intel/AMD) has its own peculiarities. The kernel should provide at least the TSC frequency, by which the TSC value read from userspace can be scaled to get nanosecond time. The wall time offset at boot time should also be exported so that the TSC can be converted to wall time.

- The TSC frequency might change on certain processors with a non-constant TSC rate (because of SpeedStep, dynamic frequency scaling etc.). The only way to combat this is for the kernel to be notified every time the processor frequency changes. Every CPU frequency driver will need to be updated to notify the kernel before and after a CPU frequency change. The TSC frequency will then need to be adjusted in the exported info. This does not apply to modern processors (Intel Core or higher and recent AMD processors, both of which have a constant TSC rate).

- On multiprocessor systems, threads might bounce between different processors. There are two problems here: the TSCs of different cores could be offset relative to each other, and the TSC of each core could have a drifting frequency. The first issue is found on most multicore CPUs, and will be solved by measuring the offset at boot time and exporting this info, so that the TSC value read by the user app can be corrected based on the core it's running on. The second issue only applies to the AMD Athlon X2 during the C1 state.
  This is solved by following AMD's recommendation: disable C1 clock ramping during boot-up and suspend/resume by updating the relevant info in the northbridge configuration.

- In case we have some time left before the completion of GSoC, one more thing can be added. Scaling the processor frequency up and down takes a finite amount of time (tens to hundreds of microseconds). During this time, the TSC frequency is undefined. Since we will be notified both before and after such a change (by the cpufreq drivers), an alternate source (like HPET or RTC) can be used to measure this duration and correct the TSC offset after the switch.

Provided all this is handled carefully, we will be able to use the TSC read-out in one of two ways: (1) as an offset from the last-updated timestamp (updated HZ times every second, on each timer interrupt), or (2) exclusively for timing, with the timer interrupt disabled. For now, the first approach will be used. It avoids having to call into the kernel to get the timing info, and it provides finer-resolution timing. The second approach is an extension to allow for a tickless kernel (not part of my proposal, but doable in the future).

To summarize: the kernel exports a shared page that is mapped read-only into each process. This page is updated on each clock tick to contain the time. It also contains the TSC frequency and other information, which is updated whenever that info changes. The userspace implementation of gettimeofday() reads the timestamp counter from the processor, and the scale, offset etc. from the shared page, to convert the counter value to nanoseconds. This offset is then added to the last-updated nanosecond time (also present in the shared page) and returned to the application. The various peculiarities of each processor's TSC implementation will be accounted for.

We will also need to make comprehensive benchmarks and tests to assert the validity and performance benefits.
I am not well versed in rigorous benchmarking, so this part of the project will need additional thought.

My qualifications / personal details:

I'm a 22-year-old Indian male. I'm an undergrad in Electrical Engineering & Computer Science at Jacobs University Bremen, Germany. I have years of experience in C/C++ and varied job experience ranging from web development to human-computer interaction devices. I've taken courses in computer architecture and operating systems. More details will be listed on my application; for now I'll mention the experience most relevant to the task at hand.

Since August 2008, I have started and completed a port of the Darwin XNU kernel (used by OS X) to generic x86 PCs. (Webpage: http://code.google.com/p/xnu-dev) Among other things, I added lots of RTC/TSC improvements to Apple's implementation that deal with exactly the same problems I have described above. All issues were solved, and the kernel is being used in production on thousands of computers worldwide (including the computer I'm typing this on!). Most of the code was written by me, with support from a few other people, so I have a fair idea of the challenges and their solutions. The TSC multicore synchronization was written independently by two other people, so this is the part with which I'm least familiar.

The code is already implemented for XNU and it works well, so most of the work would be porting it to BSD. Since I'm the author of most of it, and have good contact with the other 3-4 people who contributed other parts, there should be no licensing issues.

I've also written a SpeedStep driver for OS X (http://code.google.com/p/xnu-speedstep), which sends clock recalibration signals to the kernel (I also made the relevant modifications in the kernel for this to work).

What I still need to learn/plan:

My experience with FreeBSD is somewhat limited. I have a DragonFly BSD based home server (because FreeBSD didn't have drivers for its cheap ethernet card).
My kernel programming experience is also limited to the XNU kernel (since about July last year), and I've helped fix a minor bug (a typo in an ethernet driver's PCI ID) in the DragonFly BSD kernel. But I'm a fast learner, and given the very well commented and clear code in the FreeBSD kernel, I should be up to speed pretty soon. Right now I've installed FreeBSD in a virtual machine and am playing around with it. I will shortly try building the kernel, maybe make small modifications, and figure out exactly which parts of the kernel will need to change. I've also been reading the FreeBSD handbook, the "arch" book and the dev handbook.

Another big problem for me will be making the modifications to export the shared page and map it into each process. My experience is mostly in handling the TSC/RTC code, not in memory management, so this is something I need to learn.

Lastly, I'm not very well versed in making rigorous benchmarks. I did simple benchmarking during the XNU kernel development, but it was limited to measuring clock ticks. A more comprehensive test plan would include MySQL benchmarks and similar.

Thanks everyone for reading through this humongous email! :-) Let the discussion commence.

Best,
Prashant Vaibhav

PS: I am out of town with limited connectivity, so responses could be somewhat slow. My aim, however, is to finalize and submit the application by the end of the month.

Received on Thu Mar 26 2009 - 12:12:06 UTC