On Wed, Aug 15, 2007 at 11:04:29AM +0200, Erik Cederstrand wrote:
> Hi!
>
> This autumn, we have decided to grab the Performance Tracker entry [1]
> from the project ideas page and give it a spin as a subject for our
> thesis at the IT University of Copenhagen. The tracker intends to fill
> a hole in the range of tinderboxes and automatic stress/regression
> tests that FreeBSD already has.
>
> The initial idea is to have a small collection of servers constantly
> performing benchmarks and publishing the results to a server with a
> web interface.
>
> Before we start coding, we'd like to ask a couple of questions:
>
> 1) Which benchmarks would you like to see being run?
> 2) Which tests do you perform regularly that the tracker could
>    automate?
> 3) Which features in the web interface would you find most helpful?
>
> Also, we'd greatly appreciate pointers to previous work in the area.
>
> We welcome all comments and suggestions, but please bear in mind that
> we only have around 3 months full-time to develop the tracker.

Hi,

Thanks for your interest in the project. I have some recommendations
for how to approach it:

* Don't focus on the individual benchmarks; focus instead on the
  framework for accumulating and analysing the data. There are lots of
  benchmarks we may want to plug into this over time, so developing a
  flexible and extensible system for doing so is more important than
  any given benchmark.

* I imagine a system where data from benchmark systems (which will be
  geographically remote) is fed into a database that tracks multiple
  data sets over time. A front end would provide an interface into
  this database and allow for various analyses and visualizations of
  the data. (A rough sketch of such a database is in the postscript
  below.)

* The system should allow for annotation of data, for example to
  provide explanations for sudden jumps in performance once they are
  understood.

* Data sets may be multi-dimensional, e.g. tracking a performance
  metric like network throughput as various parameters (packet size,
  number of concurrent streams, etc.) are changed. In most cases we
  are also interested in changes over time.

* There may be parametric and non-parametric variables. An example of
  a parametric variable would be "size of a network packet", i.e. a
  numerical parameter which takes values over some range. A
  non-parametric variable might be "kernel built with option X, or
  option Y, or option Z". It makes sense to visualize parametric data
  as a continuous function, e.g. by plotting it on a graph or fitting
  a curve to it; it makes much less sense to treat non-parametric data
  that way. (The postscript has a toy example of such a fit.)

* Data sets are typically noisy. They need to be analysed with
  statistical techniques to extract a signal (if any), which will
  usually be tiny over short time scales but may accumulate over
  longer ones. A background in statistics will be most useful here.

* An ideal front end would be able to apply appropriate statistical
  and data-visualization techniques to cross-sections of the data, to
  answer questions like "have there been any statistically significant
  changes to this data set (or a subset of it) over time, and if so,
  when did they occur?". (The postscript has a first stab at this.)

* There is likely to be significant prior art in all of this, but I
  don't know what any of it is. The HDF data format
  (http://hdf.ncsa.uiuc.edu/) and related tools might be interesting
  to investigate, but I don't really know anything about it, so it
  might be too heavy-weight. Perhaps some of our scientific computing
  users can make some suggestions.
* Start small. You should keep an eye on the bigger picture, such as
  what I suggest above, but don't try to bite it all off at once. For
  example, you could start by limiting yourself to recording and
  analysing data sets that contain only a single data point changing
  over time (while hopefully not limiting future expansion), because
  even that will be a useful beginning.

Kris
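P.S. To make a few of the points above concrete, here are some rough
sketches in Python. Everything in them (names, schema, numbers) is
hypothetical and meant only to illustrate the ideas, not to prescribe
a design. First, the sort of database I have in mind: results keyed by
data set, timestamp and arbitrary parameters, plus a table for
annotations:

import sqlite3

conn = sqlite3.connect("perftracker.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS dataset (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL UNIQUE          -- e.g. 'tcp-throughput'
);
CREATE TABLE IF NOT EXISTS result (
    id          INTEGER PRIMARY KEY,
    dataset_id  INTEGER NOT NULL REFERENCES dataset(id),
    run_at      TEXT NOT NULL,          -- when the benchmark ran
    value       REAL NOT NULL           -- the measured metric
);
-- Arbitrary key/value pairs make a result multi-dimensional; the
-- value may be numeric ('packet_size' = '1024') or categorical
-- ('kernel_option' = 'X').
CREATE TABLE IF NOT EXISTS result_param (
    result_id  INTEGER NOT NULL REFERENCES result(id),
    key        TEXT NOT NULL,
    value      TEXT NOT NULL
);
-- Annotations record human explanations for features of the data,
-- e.g. a sudden jump in performance.
CREATE TABLE IF NOT EXISTS annotation (
    dataset_id  INTEGER NOT NULL REFERENCES dataset(id),
    run_at      TEXT NOT NULL,
    note        TEXT NOT NULL
);
""")
conn.commit()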
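Parametric variables can then be plotted or fitted as continuous
functions. A toy fit of throughput against packet size, with invented
numbers and assuming NumPy is available on the analysis machine:

import numpy as np

# Hypothetical parametric sweep: throughput measured at several
# packet sizes.
packet_size = np.array([64, 128, 256, 512, 1024])           # bytes
throughput = np.array([91.0, 178.0, 340.0, 610.0, 980.0])   # Mbit/s

# Least-squares fit of a quadratic. For a non-parametric variable
# (kernel option X, Y or Z) one value per category makes more sense
# than any such fit.
coeffs = np.polyfit(packet_size, throughput, deg=2)
print("fitted coefficients:", coeffs)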
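Finally, a first stab at the "when did this data set change?"
question, which is also about as small a starting point as I can
imagine: a single noisy metric over time, scanned for its most likely
change point with nothing fancier than Welch's t-statistic:

from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two samples with unequal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    return (mean(a) - mean(b)) / (va + vb) ** 0.5

def most_likely_change(series, min_len=5):
    """Split the series at every candidate point and return the split
    with the largest |t|, i.e. the most likely change point."""
    best = max(range(min_len, len(series) - min_len + 1),
               key=lambda i: abs(welch_t(series[:i], series[i:])))
    return best, welch_t(series[:best], series[best:])

# Invented series with a small regression halfway through.
data = [100.2, 99.8, 100.5, 99.9, 100.1, 100.3, 99.7, 100.0,
        97.9, 98.3, 98.1, 97.8, 98.4, 98.0, 97.6, 98.2]
idx, t = most_likely_change(data)
print("most likely change after sample %d (t = %.1f)" % (idx, t))

On real data you would want a proper significance threshold and
something more robust than this naive scan, but even this flags the
kind of step change we care about.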