Re: More kernel performance tests on FreeBSD 10.0-CURRENT

From: O. Hartmann <ohartman_at_zedat.fu-berlin.de>
Date: Sat, 22 Sep 2012 16:34:12 +0200
On 09/22/12 15:52, Dimitry Andric wrote:
> On 2012-09-22 14:52, O. Hartmann wrote:
> ...
>> When we used FreeBSD for scientific work, which was around 1998 - 2002,
>> there were some attempts made to use Intel's icc compiler suite on
>> FreeBSD in the 32-bit Linuxulator. At that time I used that compiler
>> only for compiling my modelling software, but there were reports of
>> people who made it possible to use the icc compiler also for compiling
>> the FreeBSD system - with success, as far as I know. What has happened
>> since then, in more recent days, that the sources got "polluted" by
>> those hacks?
> 
> The Intel compiler support has been largely removed, because it was not
> maintained.  There are still remnants in cdefs.h though, and in theory
> it could be revived, if there was enough interest.
> 
> However, Intel simply does not support anything else besides Windows and
> Linux for its compiler suite, and even on the Linux side you are best
> off if you use Red Hat or a Red Hat-based distribution such as CentOS or
> Scientific Linux.
> 
> Some time ago I attempted to get a fairly recent Intel compiler version
> working on FreeBSD, but it was very tricky, and I remember I did not get
> everything working correctly.
> 
> So unless either Intel starts supporting FreeBSD (or other BSDs), which
> is very unlikely, or somebody manages to get the Linux version working
> perfectly as a port, I don't see much sense in restoring the Intel
> compiler support.

True. From my point of view it is useless and senseless to have only
ancient 32-bit support via the Linuxulator (which is 32-bit only). The
ICC was only usable on 32-bit machines and on 32-bit FreeBSD (i386),
which isn't any kind of option nowadays. The same discussion has come
up with regard to CUDA and the Linuxulator.

> 
> 
>> No offense to you, but somehow this sounds as if the effort has been
>> directed the wrong way, since people revert with energy what has been
>> hacked with energy ;-)
> 
> I think you see this incorrectly; when I removed the Intel compiler
> support from the tree, it was unmaintained for several years already.
> Apparently there was very little interest for it.

To avoid further misunderstandings: I have no objections to cleaning
unmaintained legacy out of the sources. Since FreeBSD doesn't have
64-bit Linux emulation support, the effort would be wasted energy (my
opinion, even if it would sometimes be nice to see how it performs ...).


> 
> 
> ...
>>> I have already done a few preliminary tests for -march=native, but at
>>> least for clang, there seems to be no measurable difference in
>>> performance.  The tests for gcc are still running.
>>
>> I was wondering if the organisation and amount of cache present in a
>> modern CPU is not taken into account when optimising code. Our Core2Duo
>> CPUs still in use have different architectural features than the more
>> recent Core-i7 systems; the latter have level 3 caches. How does a
>> compiler take advantage of those features without being given an
>> explicit hint?
> 
> I don't think the amount of CPU cache, or the number of levels, is taken
> into account, really.  When you select a certain CPU type with -march,
> the compiler will just enable several features that are supported on
> that CPU, e.g. MMX, SSE, AVX and so on.  It can also enable extra CPU
> registers, and/or switch to slightly different instruction scheduling.

Well, I'm not that deep into compiler development. I thought that
optimizations also take the CPU's cache hierarchy and cache sizes into
account.
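
If I understand it correctly, one can at least ask the compiler what
"-march=native" actually turns on. A quick check (just a sketch, I have
not verified the output here) would be something like

  cc -march=native -dM -E - < /dev/null | grep -E '__(MMX|SSE|AVX)'

which lists the instruction-set macros the compiler predefines for the
local CPU. As far as I know, gcc's -march=native additionally passes
the detected cache sizes down as "--param l1-cache-size=..." and
"--param l2-cache-size=..." (visible with -v), but those parameters
seem to influence mainly prefetching heuristics rather than code
generation in general.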
> 
> But since we are compiling the kernel with -mno-mmx, -mno-sse and even
> floating point disabled, apparently there is no real gain from
> specifying higher CPU types.

I never got deeper into this logic, since I'm no operating system
developer. But please correct me and, if possible, enlighten me if
there is something wrong in my understanding. Assume the option
"-march=native" is switched on and the only "optimisation" performed is
the selection, at compile time, of code portions which are enclosed,
say, in
#ifdef __AVX__
__some__nasty__vector_ops_256bitwide();
#endif

which is triggered by the "#define __AVX__" on Core-i7 CPUs with AVX
support. Why is this then explicitly disabled via "-mno-avx" and
friends? I would assume the developers have a reason not to use those
speedy facilities, so I wouldn't expect any portion of #ifdef __AVX__
et cetera in the kernel code. The only explanation, from this naive
point of view, is that the compiler DOES DO some optimisations
depending on the presence of such facilities, and the "-mno-XXX"
options prevent those. Consequently, I would expect a kind of
performance gain when those features are made accessible.
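
To make that point checkable for myself, here is a minimal userland
sketch (the file name, the flags and the printed texts are just my own
assumptions; a real kernel build of course uses many more options):

/*
 * avxprobe.c - minimal sketch, not part of any kernel build.
 * Compile it twice and compare the output:
 *
 *     cc -O2 -march=native avxprobe.c -o avxprobe
 *     cc -O2 -march=native -mno-avx -mno-sse avxprobe.c -o avxprobe
 *
 * The second command mimics (part of) the kernel's -mno-* flags: the
 * compiler then no longer predefines __AVX__ / __SSE2__, so any
 * "#ifdef __AVX__" code path silently disappears at compile time.
 */
#include <stdio.h>

int
main(void)
{
#ifdef __AVX__
	printf("__AVX__ defined: the compiler may emit 256-bit AVX code\n");
#else
	printf("__AVX__ not defined: AVX code paths are compiled out\n");
#endif
#ifdef __SSE2__
	printf("__SSE2__ defined\n");
#else
	printf("__SSE2__ not defined\n");
#endif
	return (0);
}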

On the other hand, why are those features disabled? Intel's silicon is
then reduced to something that gains speed only from the clock rate and
from the internal bandwidth due to cache sizes, and, naively spoken,
everything is reduced to something "compatible" with the ancients of
the past. I cannot fathom what the benefit of a Core i7 CPU then is
compared to a Core2Duo when all the neat features are not used.

Some time ago, I read something about a Linux development for malloc()
which also utilises SSE facilities. I have no deeper clue what that
development has achieved so far, but when I first read about it, they
claimed a performance gain of about 30% over the traditional SSE-less
version.

But this is something I do not know much about.
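
I have no idea whether that malloc() work looked anything like this,
but just to illustrate what "utilising SSE facilities" in a memory
routine can mean, here is a toy sketch of mine (completely unrelated to
the actual project; the length is assumed to be a multiple of 16):

/*
 * sse_copy.c - toy illustration only.  Copies a buffer 16 bytes at a
 * time using SSE2 loads/stores via compiler intrinsics.
 *
 *     cc -O2 -msse2 sse_copy.c -o sse_copy
 */
#include <emmintrin.h>
#include <stddef.h>
#include <stdio.h>

static void
sse_copy(void *dst, const void *src, size_t len)
{
	const __m128i *s = src;
	__m128i *d = dst;
	size_t i;

	for (i = 0; i < len / 16; i++) {
		__m128i v = _mm_loadu_si128(&s[i]);	/* unaligned 128-bit load */
		_mm_storeu_si128(&d[i], v);		/* unaligned 128-bit store */
	}
}

int
main(void)
{
	char src[64] = "sixteen byte chunks moved with SSE2 intrinsics";
	char dst[64];

	sse_copy(dst, src, sizeof(src));
	printf("%s\n", dst);
	return (0);
}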
> 
> -Dimitry



Oliver


Received on Sat Sep 22 2012 - 12:34:19 UTC
