On Tue, Dec 20, 2011 at 11:54:23PM +0100, O. Hartmann wrote:
> On 12/20/11 22:45, Samuel J. Greear wrote:
> > http://www.osnews.com/story/25334/DragonFly_BSD_MP_Performance_Significantly_Improved
> >
> > PostgreSQL tests, see the linked PDF for #'s on FreeBSD, DragonFly,
> > Linux and Solaris. Steps to reproduce these benchmarks provided.
> >
> > Sam
> >
> > On Tue, Dec 20, 2011 at 1:20 PM, Igor Mozolevsky <igor_at_hybrid-lab.co.uk> wrote:
> >
> >> Interestingly, while people seem to be (arguably rightly) focused on
> >> criticising Phoronix's benchmarking, nobody has offered an alternative
> >> benchmark; and while (again, arguably rightly) it is important to
> >> benchmark real-world performance, equally, nobody has offered any
> >> numbers in relation to, for example, HTTP or SMTP, or any other "real
> >> world" application torture tests done on the aforementioned two
> >> platforms... IMO, this just goes to show that "doing is hard" and
> >> "criticising is much easier" (yes, I am aware of the irony involved in
> >> making this statement, but someone has to!)
> >>
> >> Cheers,
> >> Igor M :-)
> >> _______________________________________________
> >> freebsd-current_at_freebsd.org mailing list
> >> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> >> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
>
> Thanks for those numbers.
> Impressive how Matthew Dillon's project jumps forward now. And it is
> still impressive to see that the picture is still in the right place
> when it comes to a comparison to Linux.
> Also, OpenIndiana shows an impressive performance.

Preface to my long post below: the things being discussed here are
benchmarks, as in "how much work can you get out of Thing X". This is
VERY DIFFERENT from testing interactivity in a scheduler, which is more
of a test that says "when Thing X is executed while heavier Thing Y is
also being executed, how much interactivity is lost in Thing X".
The reason people notice this when using Xorg is because it's visual, in
an environment where responsiveness is absolutely mandatory above all
else. Nobody is going to put up with a system where, during a
buildworld, they go to move a window, click a mouse button, or type a
key, and find that the window doesn't move, the mouse click is lost, or
the keystroke has gone into the bit bucket -- or that those things are
SEVERELY delayed, to the point where interactivity is crap. I just want
to make that clear to folks.

This immense thread has been about the latter: bad
interactivity/responsiveness on a system undergoing load that SHOULD be
distributed "more evenly" across the system *while* keeping
interactivity/responsiveness high. Historically nice/renice has been
used for this task, but that was when kernels and I/O subsystems were a
little less complex. Remember: we've now got schedulers for each type of
thing, and who gets what priority? You get my point, I'm sure. So
remember: this thread was to discuss that aspect, with regard to the ULE
vs. 4BSD schedulers.

Now, back to the benchmarks. This also interested me:

* Linux system crashed
  http://leaf.dragonflybsd.org/mailarchive/kernel/2011-11/msg00008.html
* OpenIndiana system crashed the same way as the Linux system
  http://leaf.dragonflybsd.org/mailarchive/kernel/2011-11/msg00017.html

I cannot help but wonder if the Linux and OpenIndiana installations were
more stressful on the hardware -- getting more out of the system, maybe
resulting in increased power draw/load, which in turn resulted in the
systems locking up (shoddy PSU, unstable mainboard, MCH problems, etc.).
My point is that Francois states these things in a way that implies
"DragonflyBSD was more stable", when in fact I wonder the opposite: that
Linux and OpenIndiana were exercising the hardware harder than
DragonflyBSD, and thus tickled what may be a hardware-level problem.
> But this is only one suite of testing. Scientific Linux is supposed to
> give the best performance for scientific purposes, i.e. for long-haul
> calculations, much numerical stuff. It outperforms FreeBSD in a typical
> server application, where "FreeBSD should have the power to serve".
>
> Is the postgresql benchmark the only way to benchmark?

I sure hope not. But you know what's equally as interesting? This:

http://people.freebsd.org/~kris/scaling/

Specifically, circa 2008:

http://people.freebsd.org/~kris/scaling/4cpu-pgsql.png
http://people.freebsd.org/~kris/scaling/pgsql-16cpu-2.png
http://people.freebsd.org/~kris/scaling/pgsql-16cpu.png

Now, I don't know if what was used in those ("pgsql sysbench") was the
same thing as the "pg_bench" in the DragonflyBSD tests, but if so, the
numbers differ to a point that is preposterous. There's also this:

http://people.freebsd.org/~kris/scaling/pgsql-ncpu.png

Now, compare those numbers to the TPS numbers shown here:

http://dl.wolfpond.org/Pg-benchmarks.pdf

So um... yeah. Now, if someone here is going to say "well, what Kris
tested was FreeBSD 7.0, while what Francois tested was FreeBSD 9.0, and
there have been improvements", then I ask that someone show me where the
improvements are that would produce a 4-8x performance increase in some
cases.

This rambling of mine is the same rambling I posted earlier in this
thread: there needs to be a consistent, standardised way of testing this
stuff. Every system tuned the exact same way, software configured the
same way, absolutely no quirks applied, etc. Otherwise we end up with
"mixed results" as shown above. Much to the disapproval of others, the
Phoronix test suite is supposed to be that "standard" -- a suite you're
supposed to be able to install, which ensures that, aside from the
compiler used and any system specifics, the same code is being run
regardless of what hardware and OS it's on. Have I ever used it? No.
And it's important that I admit that up front, because being honest is
necessary.

> Well, this inspires me to gather together all the benchmarks someone
> could find. There were lots of complaints about FreeBSD's poor
> performance with BIND -- once a domain of FreeBSD. Network performance
> also seems to be an issue when it comes to scalability.
> It would be nice to see what portion of the raw CPU/GPU power the OS
> (FreeBSD, Linux ...) delivers to scientific applications.

Kris Kennaway's BIND benchmark, released a long time ago, touched on
this. Remember: these plots show nothing other than the number of
queries per second correlated with the number of DNS server threads
(since BIND has a 1:1 thread-to-CPU creation ratio):

http://people.freebsd.org/~kris/scaling/bind-pt.png
http://people.freebsd.org/~kris/scaling/bind-pt-2.png
http://people.freebsd.org/~kris/scaling/bind-pt-gige.png

> I only know some kinds of benchmarks, the BYTE UNIX benchmark, the
> LINPACK test... Does someone know a site to look for a couple of
> benchmarks to test
>
> a) memory system
> b) scalability (apart from pgbench)
> c) network performance/throughput/network scalability
> d) portion of CPU performance the system delivers for numerical
>    applications to the user, apart from the system's own consumption
> e) disk I/O performance and scalability
>
> It would also be nice to discuss some settings and performance tunings
> for FreeBSD for several scenarios. I guess starting to develop
> benchmarking test scenarios for several purposes would lead to real
> numbers faster, and to less polemic, than weird discussions...

All I wish is that we had some kind of "test suite" of our own, maybe as
a port, maybe in the base system, which could really help with all of
this. Something consistent.
Now I'm switching back to discussing interactivity/responsiveness tests:

Attilio Rao did comment in this thread, giving me some test
methodologies for testing interactivity during two types of simultaneous
load -- but one involves dnetc, which I imagine means I'd need to get
familiar with that whole thing.

http://lists.freebsd.org/pipermail/freebsd-stable/2011-December/064936.html

I haven't responded to his post yet (this thread is so long and tedious
that I'm having serious problems following it and remembering all the
details -- am I the only one who feels daunted by this? God, I hope
not). His insights are, as always, beneficial, but also overwhelming.

Furthermore, I do not have 16-core or 24-core systems to test on -- I
have single-CPU, quad-core and dual-core systems to test on. I am a firm
believer that these make up the majority of the FreeBSD userbase
(desktop and server environments). Extreme hardware (e.g. quad CPU with
12 cores per CPU) can be tested too, but let's at least pick a
demographic to start with.

Again: the FreeBSD user and administrative community want to help. All
of us do. We just need to know exactly what we should be doing to test,
and what exactly we're testing for. I'll be blunt while choosing to play
the Idiot Admin for a moment: I'd be much happier if someone had a
tarball of shell scripts and things which could be used to test all of
this.

Lots of things need to be kept in mind, such as whether someone is
running the "client" test on the same box as the "server" test, and
things like "the test data is written to a local filesystem, with
echo/printf statements constantly flushed" (great, now we're causing I/O
load on top of our tests!) -- which to me means we should probably be
using something like mdconfig(8) to create a temporary memory-backed
filesystem to store logs/data results.

The KTR stuff Attilio and many others have requested will, I think, be
the most beneficial way to get the developers the data they need.
I had no idea about it until I found out that KTR is something
completely different from ktrace.

I still haven't found the time to do all of this, BTW, and for that I
apologise. The reason has to do with time at work plus the personal
drive to do it. When I get a daunting task, I tend to get... well, not
depressed, but "scared" of the massive undertaking, since it involves
lots of recurring tests, reboots, etc. -- hours of work -- and if I get
it wrong, it's wasted effort (thus wasted developer time). I want to get
it right. :-)

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.              PGP 4BD6C0CB |

Received on Tue Dec 20 2011 - 22:29:29 UTC