Re: CFR: AES-GCM and OpenCrypto work review

From: Vsevolod Stakhov <vsevolod_at_highsecure.ru>
Date: Sat, 08 Nov 2014 21:20:04 +0000

On 08/11/14 20:45, John-Mark Gurney wrote:
> Vsevolod Stakhov wrote this message on Sat, Nov 08, 2014 at 18:55 +0000:
>> On 08/11/14 04:23, John-Mark Gurney wrote:
>>> Hello,
>>>
>>> Over the last few months, I've been working on a project to add support
>>> for AES-GCM and AES-CTR modes to our OpenCrypto framework.  The work is
>>> sponsored by The FreeBSD Foundation and Netgate.
>>>
>>> I plan on committing these patches early next week.  If you need more
>>> time for review, please email me privately and I will delay the commit.
>>>
>>> The code has already been reviewed by Watson Ladd (the software crypto
>>> implementations) and Trevor Perrin (the aesni module part) and I have
>>> integrated these changes into the patch.
>>>
>>> There are two patches: one contains the changes for OpenCrypto and the
>>> test framework; the other contains the data files used by the test
>>> framework.  The data is from NIST's CAVP program, and is about 20MB worth
>>> of test vectors.  (I just realized, should we look at compressing these
>>> on disk?)
>>>
>>> Main patch (192KB):
>>> https://www.funkthat.com/~jmg/patches/aes.ipsec.5.patch
>>>
>>> Data files (~20MB):
>>> https://www.funkthat.com/~jmg/patches/aes.ipsec.5.testing.patch
>>>
>>> A list of notable changes in the patch:
>>> - Replacing crypto(4) w/ NetBSD's version + updates
>>> - Lots of man page updates, including CIOCFINDDEV and crypto(7) which
>>>    adds specifics about restrictions on the modes.
>>> - Allow sane usage of both _HARDWARE and _SOFTWARE flags.
>>> - Add a timing-safe bcmp for MAC comparison.
>>> - Add a software implementation of GCM that uses a four bit lookup
>>>    table with parallelization.  This algorithm is possibly vulnerable to
>>>    timing attacks, but the best known mitigation methods are used.  A
>>>    timing-safe version is many times slower.
>>> - Added a CRYPTDEB macro that defaults to off.
>>> - Bring in some of OpenBSD's improvements to the OpenCrypto framework.
>>> - If an mbuf passed to the aesni module is only one segment, don't do
>>>    a copy.  This needs to be improved to support segmented buffers.
>>> - Remove the CRYPTO_F_REL flag.  It was meaningless.  It was used but
>>>    did not change any behavior.
>>> - Add function crypto_mbuftoiov to convert an mbuf to an iov.  This
>>>    also converts the software crypto to only use iovs even for a simple
>>>    linear buffer, and so simplifies the processing.
>>> - Add a dtrace probe for errors from the ioctl.
>>> - Add the CIOCCRYPTAEAD ioctl that allows userland processing (testing)
>>>    of AES-GCM and future AEAD modes.
>>>
>>> Future improvements:
>>> - Support IVs longer than 12 bytes for GCM.
>>> - Make AES-NI support segmented buffers (iov or mbuf) so multisegmented
>>>    inputs don't have to be copied.
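
On the timing-safe bcmp item in the list above: the idea is to compare MAC
tags without an early exit, so that the run time does not reveal the
position of the first mismatching byte.  A minimal sketch in C, assuming a
hypothetical helper name ct_memneq (this is not the function from the
patch):

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.  Returns 0 when the two
 * buffers are equal and 1 otherwise.  Every byte is examined regardless
 * of where the first difference occurs.
 */
int
ct_memneq(const void *a, const void *b, size_t len)
{
	const volatile unsigned char *pa = a;
	const volatile unsigned char *pb = b;
	unsigned char diff = 0;
	size_t i;

	for (i = 0; i < len; i++)
		diff |= pa[i] ^ pb[i];	/* accumulate differences, no early exit */

	return (diff != 0);
}
```

A caller would reject the message whenever ct_memneq() returns nonzero for
the computed and the received tag.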
>>
>> I have a question regarding the algorithm for Galois field calculations
>> used in the proposed implementation: why not use recent research on
>> GCM calculations, e.g. as described in [1], for further speed optimizations?
>>
>> [1] - https://eprint.iacr.org/2013/157.pdf
>
> The paper you linked to does not describe a new way of calculating
> GHASH, but rather the evaluation of a bug in their implementation using
> the PCLMULQDQ instruction...
>
> If you mean, why don't I use OpenSSL's code?  The reason is that their
> code is a Perl script that generates assembly...  We don't have
> Perl in base.. and I didn't want more assembly in our tree (see below)..
>
> Instead, I decided to use code from Intel's whitepaper:
> Intel® Carry-Less Multiplication Instruction and its Usage for
> Computing the GCM Mode
>
> I didn't use their assembly version because I wanted to have
> maintainable code, and also the same code can be used on both i386
> and amd64 arches..  This turns out to also be a good thing, as when
> I add segmented buffer support, it'll be much easier to add to the C
> version, and I only have to do the work once for two arches...
>
> Also, the software GF library that I wrote is using state of the art
> algorithms...  An OpenBSD developer has tested my code and has seen
> a significant performance improvement over their old code, and is
> evaluating whether they want to/can include it in their tree...
>
> Hope this answers your question.  If not, please be more specific so
> I can answer it.
>
> Thanks.
>
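
For reference, the GF(2^128) multiplication at the core of GHASH can be
written bit by bit, following the multiplication algorithm in NIST
SP 800-38D.  The sketch below is neither the four bit table code from the
patch nor a PCLMULQDQ version; it is the slow reference method, and the
name gf128_mul is hypothetical.  It uses masks instead of branches on the
data bits, though that alone is no guarantee of constant-time behavior
with a given compiler or CPU.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical reference routine: out = x * y in GF(2^128) using the GCM
 * bit ordering (bit 0 is the most significant bit of byte 0).
 * out may alias x or y.
 */
void
gf128_mul(const uint8_t x[16], const uint8_t y[16], uint8_t out[16])
{
	uint8_t z[16] = { 0 };
	uint8_t v[16];
	uint8_t mask, lsb;
	int i, j;

	memcpy(v, y, 16);
	for (i = 0; i < 128; i++) {
		/* mask is 0xff when bit i of x is set, 0x00 otherwise */
		mask = (uint8_t)-((x[i / 8] >> (7 - (i % 8))) & 1);
		for (j = 0; j < 16; j++)
			z[j] ^= v[j] & mask;

		/* v = v * x: shift right one bit, then reduce */
		lsb = v[15] & 1;
		for (j = 15; j > 0; j--)
			v[j] = (uint8_t)((v[j] >> 1) | (v[j - 1] << 7));
		v[0] >>= 1;
		/* R = 11100001 || 0^120 is folded in when a bit fell off */
		v[0] ^= (uint8_t)(0xe1 & (uint8_t)-lsb);
	}
	memcpy(out, z, 16);
}
```

With H as the hash subkey, one GHASH step xors the next 16-byte block into
the accumulator X and then calls gf128_mul(X, H, X).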

I'm sorry, I thought that paper was a transcript of the following
presentation by the same authors:

http://2013.diac.cr.yp.to/slides/gueron.pdf

The transcript does not seem to be available yet.

Regarding assembler/C maintainability, I would argue that the current
intrinsics-based implementation is more readable than the pure assembler
solution (and it is still machine dependent). Of course, I'm not an
expert in such optimizations, so that is just my own feeling.

By the way, do you have some concrete numbers for the performance of
your aes-gcm? (I was recently able to run aes-128-gcm at about 32
gigabits/sec, which is surely not the limit of modern hardware.)
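
To keep such figures comparable, throughput can be derived from the number
of bytes processed and the elapsed time.  A small sketch, where
throughput_gbps is a hypothetical helper and a gigabit is taken as 10^9
bits:

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a byte count and an elapsed time in
 * seconds into gigabits per second (1 gigabit = 10^9 bits).
 * Returns 0.0 for a non-positive elapsed time.
 */
double
throughput_gbps(uint64_t bytes, double seconds)
{
	if (seconds <= 0.0)
		return (0.0);
	return ((double)bytes * 8.0 / seconds / 1e9);
}
```

By this definition, 4 * 10^9 bytes processed in one second corresponds to
32 gigabits/sec.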
Received on Sat Nov 08 2014 - 20:20:11 UTC
