On Sat, Nov 9, 2013 at 6:37 AM, <symbolics_at_gmx.com> wrote:
> On Fri, Nov 08, 2013 at 12:28:02PM -0800, Julian Elischer wrote:
>> On 11/8/13, 1:54 AM, Olivier Cochard-Labbé wrote:
>> > On Fri, Nov 8, 2013 at 3:02 AM, Julian Elischer <julian_at_elischer.org
>> > <mailto:julian_at_elischer.org>> wrote:
>> >
>> >     Some time ago someone showed some FreeBSD performance graphs,
>> >     graphed against time. He had them up on a website that was
>> >     updated each day or so.
>> >
>> >     I think they were network perf tests, but I'm not sure. He
>> >     indicated that he was going to continue the daily testing, but
>> >     I've not seen any mention of them since.
>> >
>> >     If you know who that was, or how to find him, let me (or gnn) know...
>> >
>> > Hi Julian,
>> >
>> > Perhaps you are referring to my network performance graphs in this
>> > thread:
>> > http://lists.freebsd.org/pipermail/freebsd-current/2013-April/041323.html
>> >
>> Yes, you are the person we are looking for.
>> In yesterday's 'vendor summit' we were discussing performance
>> monitoring, and your methodology was cited as one worth looking at.
>>
>> The idea of graphing the output of various performance tests against
>> svn commit number is a very good one.
>> I think it might even be worth doing these tests daily and putting
>> the output onto a website, showing the last month, the last year, and
>> the whole range.
>> It would even be interesting to put out 'xplot' files so that people
>> can zoom in and out using xplot to see exactly which revision was
>> responsible for regressions or problems.
>>
>> George, this is what we mentioned at the meeting yesterday.
>>
>> Julian
>>
>
> As it happens, I've been thinking over a design for something along
> these lines recently. It's just some ideas at the moment, but it might
> be of interest to others. Forgive me; it's a long e-mail and it gets a
> bit pie-in-the-sky too.
>
> I was prompted to think about the problem in the first place because I
> read commit mail and I see performance-related changes going into the
> tree from time to time. These changes often do not come with any
> specific data, and when they do it is normally quite narrow in focus.
> For instance, an organisation contributes performance improvements
> specific to their workloads and without interest in anyone else's
> (fair enough).
>
> Initially, what I wanted was a way of viewing how performance changed
> for a number of workloads on a commit-by-commit basis. This sounds very
> much like what you are after.
>
> Anyway, after thinking about this for some time, it occurred to me that
> much of the infrastructure required to do performance testing could be
> generalised to all sorts of software experiments, e.g. software builds,
> regression tests, and so on. So my first conclusion was: build an
> experimentation framework within which performance is one aspect.
>
> Having decided this, I thought about the scope of experiments I wanted
> to make. For instance, it would be good to test at least every
> supported platform. On top of that, I would like to be able to vary the
> relevant configuration options too. Taking the product of commit,
> platform, and n configuration options (not to mention compilers,
> etc.), you start to get some pretty big numbers. The numbers grow far
> too fast, and no person or even organisation could feasibly cover the
> hardware resources required to test every permutation. This led me to
> my next conclusion: build a distributed system that allows anyone to
> contribute their hardware to the cause.
> Collectively, the project, vendors, and users could tackle a big chunk
> of this.
>
> My rough sketch of how this would work is as follows. A bootable USB
> image would be made for all platforms. This would boot up, connect to
> the network, and check out a repository. The first phase of the process
> would be to profile what the host can offer. For example, we might have
> experiments that require four identical hard drives, or a particular
> CPU type, and so on. Shell scripts or short programmes would be
> written, e.g. "has-atom-cpu", with these returning either 1 or 0.
>
> The results of this profiling would be submitted to a service. The
> service matches the host with available experiments based on its
> particular capabilities and the current experimental priorities laid
> down by the developers. A priority system would allow the whole thing
> to be controlled precisely. If, for instance, major work is done to the
> VM subsystem, relevant experiments could be prioritised over others for
> a period.
>
> Once a decision on the experiment to conduct has been made, the
> relevant image must be deployed to the system. Free space on the USB
> device would be used as a staging area, with a scripted installation
> occurring after reboot. The images would need to be built somewhere,
> since it doesn't make sense to rebuild the system endlessly, especially
> if we're including low-powered embedded devices (which we should be).
> One possible solution would be to use more powerful contributed hosts
> to cross-build images and make them available for download.
>
> Finally, the experiments would be conducted. Data produced would be
> submitted back to the project using another service, where it could be
> processed and analysed. To keep things flexible, this would just
> consist of a bunch of flat files, rather than trying to find some
> standardised, one-size-fits-all format. Statistics and graphics could
> be produced from the data with R/Julia/etc. In particular, I imagined
> DTrace scripts being attached to experiments so that specific data can
> be collected. If something warranting further investigation is found,
> the experiment could be amended with additional scripts, allowing
> developers to drill down into issues.
>
> After some time the process repeats, with a new image deployed and new
> experiments conducted. I envisage some means of identifying individual
> hosts so that a developer could repeat the same experiment on the same
> host if desired.
>
> Among the many potential problems with this plan, a big one is how we
> would protect contributors' privacy and security whilst still having a
> realistic test environment. I guess the only way to do this would be
> to (1) tell users that they should treat the system as if it's been
> hacked and put it on its own network, and (2) prevent the experiment
> images from accessing anything besides FreeBSD.org.
>
> In relation to network performance, this might not be much good, since
> multiple hosts might be necessary. It might be possible to build that
> into the design too, but it's already more than complicated enough.
>
> Anyhow, I think such a facility could be an asset if it could be built.
> I may try to put this together, but I've committed myself to enough
> things recently already, so I won't take this any further at the
> moment. I'd be interested to hear what people think, naturally.
>

This sounds exactly like the Phoronix Test Suite and its web-based
reporting platform, openbenchmarking.org.
It already has a large number of benchmarks to choose from, and it runs
on FreeBSD. The downsides are that it can't do anything involving
multiple hosts, and it doesn't have a good interface for querying
results against machine parameters, e.g. how does the score of benchmark
X vary with the amount of RAM? But it's open source, and I'm sure that
patches are welcome ;)

-Alan
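
To make the graphing idea discussed above concrete, here is a minimal
sketch of plotting a benchmark against svn revision. It assumes a
hypothetical results.csv with "revision" and "throughput_mbps" columns,
one row per daily run, and uses matplotlib; the file name, column names,
and metric are illustrative only, not part of any existing tool.

#!/usr/bin/env python3
"""Plot benchmark results against svn revision (illustrative sketch)."""
import csv

import matplotlib.pyplot as plt

revisions = []
throughput = []
with open("results.csv") as fh:
    for row in csv.DictReader(fh):
        revisions.append(int(row["revision"]))
        throughput.append(float(row["throughput_mbps"]))

# One point per tested revision, so a regression can be pinned to a
# narrow commit range by eye.
plt.plot(revisions, throughput, marker="o", linestyle="-")
plt.xlabel("svn revision")
plt.ylabel("throughput (Mbit/s)")
plt.title("Benchmark X vs. svn revision")
plt.savefig("benchmark-vs-revision.png", dpi=150)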
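
The 1-or-0 probe contract proposed in the quoted design could look
something like the following. This sketch is in Python rather than the
shell scripts the proposal mentions, and it assumes that FreeBSD's
sysctl(8) hw.model string is an adequate way to detect an Atom CPU;
both the probe name and the detection method are assumptions made for
illustration.

#!/usr/bin/env python3
"""has-atom-cpu: sketch of a host-capability probe."""
import subprocess
import sys

def has_atom_cpu() -> bool:
    # hw.model holds the CPU model string on FreeBSD, e.g.
    # "Intel(R) Atom(TM) CPU D525 @ 1.80GHz".
    model = subprocess.run(
        ["sysctl", "-n", "hw.model"],
        capture_output=True, text=True, check=True,
    ).stdout
    return "Atom" in model

if __name__ == "__main__":
    # Print 1 or 0 as the proposal describes, and mirror it in the exit
    # status so the probe composes with shell pipelines as well.
    result = has_atom_cpu()
    print(1 if result else 0)
    sys.exit(0 if result else 1)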
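
Similarly, the matching step, pairing a host's submitted probe results
with the highest-priority experiment whose requirements it satisfies,
might be sketched as below. The Experiment record, the capability
names, and the priority scheme are all hypothetical.

#!/usr/bin/env python3
"""Sketch of matching a host profile against available experiments."""
from dataclasses import dataclass, field

@dataclass
class Experiment:
    name: str
    priority: int                                # higher = more urgent
    requires: set = field(default_factory=set)   # probes that must report 1

def pick_experiment(profile: dict, experiments: list):
    """Return the highest-priority experiment this host can run, or None."""
    eligible = [
        e for e in experiments
        if all(profile.get(cap, 0) == 1 for cap in e.requires)
    ]
    return max(eligible, key=lambda e: e.priority, default=None)

if __name__ == "__main__":
    # Profile as submitted by the probes, e.g. {"has-atom-cpu": 1, ...}
    profile = {"has-atom-cpu": 1, "has-4-identical-disks": 0}
    experiments = [
        Experiment("vm-subsystem-stress", priority=10,
                   requires={"has-4-identical-disks"}),
        Experiment("atom-buildworld-time", priority=5,
                   requires={"has-atom-cpu"}),
    ]
    chosen = pick_experiment(profile, experiments)
    print(chosen.name if chosen else "no matching experiment")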