Re: CFR: AES-GCM and OpenCrypto work review

From: John-Mark Gurney <jmg_at_funkthat.com>
Date: Wed, 12 Nov 2014 11:41:10 -0800

Vsevolod Stakhov wrote this message on Sat, Nov 08, 2014 at 21:20 +0000:
> On 08/11/14 20:45, John-Mark Gurney wrote:
> >Vsevolod Stakhov wrote this message on Sat, Nov 08, 2014 at 18:55 +0000:
> >>On 08/11/14 04:23, John-Mark Gurney wrote:
> >>>Hello,
> >>>
> >>>Over the last few months, I've been working on a project to add support
> >>>for AES-GCM and AES-CTR modes to our OpenCrypto framework.  The work is
> >>>sponsored by The FreeBSD Foundation and Netgate.
> >>>
> >>>I plan on committing these patches early next week.  If you need more
> >>>time for review, please email me privately and I will delay.
> >>>
> >>>The code has already been reviewed by Watson Ladd (the software crypto
> >>>implementations) and Trevor Perrin (the aesni module part) and I have
> >>>integrated these changes into the patch.
> >>>
> >>>There are two patches, one is the changes for OpenCrypto and the test
> >>>framework.  The other is the data files used by the test framework.
> >>>The data is from NIST's CAVP program, and is about 20MB worth of test
> >>>vectors.  (I just realized, should we look at compressing these on
> >>>disk?)
> >>>
> >>>Main patch (192KB):
> >>>https://www.funkthat.com/~jmg/patches/aes.ipsec.5.patch
> >>>
> >>>Data files (~20MB):
> >>>https://www.funkthat.com/~jmg/patches/aes.ipsec.5.testing.patch
> >>>
> >>>A list of notable changes in the patch:
> >>>- Replacing crypto(4) w/ NetBSD's version + updates
> >>>- Lots of man page updates, including CIOCFINDDEV and crypto(7) which
> >>>   adds specifics about restrictions on the modes.
> >>>- Allow sane usage of both _HARDWARE and _SOFTWARE flags.
> >>>- Add a timing safe bcmp for MAC comparison.
> >>>- Add a software implementation of GCM that uses a four bit lookup
> >>>   table with parallelization.  This algorithm is possibly vulnerable to
> >>>   timing attacks, but best known mitigation methods are used.  Using
> >>>   a timing safe version is many times slower.
> >>>- Added a CRYPTDEB macro that defaults to off.
> >>>- Bring in some of OpenBSD's improvements to the OpenCrypto framework.
> >>>- If an mbuf passed to the aesni module is only one segment, don't do
> >>>   a copy.  This needs to be improved to support segmented buffers.
> >>>- Remove the CRYPTO_F_REL flag.  It was meaningless.  It was used but
> >>>   did not change any behavior.
> >>>- Add function crypto_mbuftoiov to convert an mbuf to an iov.  This
> >>>   also converts the software crypto to only use iov's even for a simple
> >>>   linear buffer, and so simplifies the processing.
> >>>- Add a dtrace probe for errors from the ioctl.
> >>>- Add the CIOCCRYPTAEAD ioctl that allows userland processing (testing)
> >>>   of AES-GCM and future AEAD modes.
> >>>
> >>>Future improvements:
> >>>- Support IV's longer than 12 bytes for GCM.
> >>>- Make AES-NI support segmented buffers (iov or mbuf) so multisegmented
> >>>   inputs don't have to be copied.
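The timing-safe comparison listed among the notable changes above can be illustrated with a minimal sketch. This is not the code from the patch: the function name ct_bcmp_sketch is hypothetical, and the body only shows the general technique of accumulating differences instead of returning at the first mismatch, so that the running time does not depend on where two MACs differ.

```c
#include <stddef.h>

/*
 * Hypothetical helper (illustrative name, not taken from the patch).
 * Returns 0 when the two buffers match and 1 otherwise, and reads
 * every byte regardless of where the first difference occurs.
 */
static int
ct_bcmp_sketch(const void *a, const void *b, size_t len)
{
	const volatile unsigned char *pa = a;
	const volatile unsigned char *pb = b;
	unsigned char diff = 0;

	for (size_t i = 0; i < len; i++)
		diff |= pa[i] ^ pb[i];	/* accumulate, never exit early */
	return (diff != 0);
}
```

A caller would compare the computed tag against the received one and reject the packet on a nonzero result. The volatile qualifiers are one common way to discourage the compiler from reintroducing an early exit; whether that is sufficient depends on the compiler and platform.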
> >>
> >>I have a question regarding the algorithm for GF field calculations
> >>used in the proposed implementation: why not use recent research
> >>on GCM calculations, e.g. as described in [1], for further speed
> >>optimizations?
> >>
> >>[1] - https://eprint.iacr.org/2013/157.pdf
> >
> >The paper you linked to does not describe a new way of calculating
> >GHASH, but rather an evaluation of a bug in their implementation using
> >the PCLMULQDQ instruction...
> >
> >If you mean, why don't I use OpenSSL's code?  The reason is that their
> >code is a perl script that generates assembly...  We don't have
> >perl in base.. and I didn't want more assembly in our tree (see below)..
> >
> >Instead, I decided to use code from Intel's whitepaper:
> >Intel® Carry-Less Multiplication Instruction and its Usage for
> >Computing the GCM Mode
> >
> >I didn't use their assembly version because I wanted to have
> >maintainable code, and also the same code can be used on both i386
> >and amd64 arches..  This turns out to also be a good thing, as when
> >I add segmented buffer support, it'll be much easier to add to the C
> >version, and I only have to do the work once for two arches...
> >
> >Also, the software GF library that I wrote is using state of the art
> >algorithms...  An OpenBSD developer has tested my code and has seen
> >a significant performance improvement over their old code, and is
> >evaluating whether they want to/can include it in their tree...
> >
> >Hope this answers your question.  If not, please be more specific so
> >I can answer it.
> 
> I'm sorry, I thought that was the paper that is a transcript of the
> following presentation:
> 
> http://2013.diac.cr.yp.to/slides/gueron.pdf
> 
> made by the same authors. The transcript does not seem to be available so far.
> 
> And regarding assembler/C maintainability, I would argue that the current
> intrinsics-based implementation is more readable than the pure assembler
> solution (and it is still machine dependent). Of course, I'm not an
> expert in such optimizations, so that is just my own feeling.
> 
> By the way, do you have some concrete numbers about the performance of
> your aes-gcm? (I was recently able to do aes-128-gcm at about 32
> gigabits/sec, which is surely not the limit of modern hardware.)

So, in bare metal userland testing, iirc, I was able to get around
1GByte/sec on a single core...  That doesn't take into account kernel
and framework overhead...
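For readers following the software GCM discussion in this thread, the four-bit lookup table idea can be sketched as below. This is an illustration under stated assumptions, not the code from the patch: all names (gf128, gf_table, gf_mul_table, and so on) are hypothetical, the parallelization mentioned in the change list is omitted, and the table lookups are indexed by data, which is the kind of access pattern the timing-attack caveat refers to. The field layout and the reduction constant follow NIST SP 800-38D; a bit-at-a-time multiply is included only as a reference to check the table version against.

```c
#include <stdint.h>

/* A GF(2^128) element in GCM bit order: hi holds bytes 0..7 of the
 * 16-byte block (big-endian), and the top bit of hi is the x^0 term. */
struct gf128 { uint64_t hi, lo; };

/* GCM reduction constant R = 11100001 || 0^120 (NIST SP 800-38D). */
#define GF_R_HI 0xE100000000000000ULL

/* Multiply by x: shift right one bit, folding the dropped bit back in. */
static struct gf128
gf_mulx(struct gf128 v)
{
	uint64_t carry = v.lo & 1;

	v.lo = (v.lo >> 1) | (v.hi << 63);
	v.hi = (v.hi >> 1) ^ ((0 - carry) & GF_R_HI);
	return (v);
}

/* Bit-at-a-time reference multiply, used here only to check the table. */
static struct gf128
gf_mul_bitwise(struct gf128 x, struct gf128 y)
{
	struct gf128 z = { 0, 0 };

	for (int i = 0; i < 128; i++) {
		uint64_t word = (i < 64) ? x.hi : x.lo;
		uint64_t mask = 0 - ((word >> (63 - (i & 63))) & 1);

		z.hi ^= y.hi & mask;
		z.lo ^= y.lo & mask;
		y = gf_mulx(y);
	}
	return (z);
}

/* Hypothetical per-key table: m[n] = n * H for every 4-bit value n. */
struct gf_table {
	struct gf128 m[16];
	uint64_t red[16];	/* hi-word fold for the 4 bits shifted out */
};

static void
gf_table_init(struct gf_table *t, struct gf128 h)
{
	struct gf128 p[4];

	p[0] = h;			/* p[k] = H * x^k for k = 0..3 */
	for (int k = 1; k < 4; k++)
		p[k] = gf_mulx(p[k - 1]);
	for (unsigned n = 0; n < 16; n++) {
		struct gf128 v = { 0, n };

		t->m[n].hi = t->m[n].lo = 0;
		for (int k = 0; k < 4; k++) {
			if (n & (8u >> k)) {	/* nibble MSB is lowest power */
				t->m[n].hi ^= p[k].hi;
				t->m[n].lo ^= p[k].lo;
			}
			v = gf_mulx(v);
		}
		t->red[n] = v.hi;	/* v.lo is zero after four shifts */
	}
}

/* Table multiply x * H: Horner's rule over the 32 nibbles of x. */
static struct gf128
gf_mul_table(const struct gf_table *t, struct gf128 x)
{
	struct gf128 z = { 0, 0 };

	for (int k = 31; k >= 0; k--) {
		uint64_t word = (k < 16) ? x.hi : x.lo;
		unsigned n = (word >> (60 - 4 * (k & 15))) & 0xf;
		unsigned r = z.lo & 0xf;

		/* z = z * x^4, reduced with the fold table */
		z.lo = (z.lo >> 4) | (z.hi << 60);
		z.hi = (z.hi >> 4) ^ t->red[r];
		z.hi ^= t->m[n].hi;
		z.lo ^= t->m[n].lo;
	}
	return (z);
}
```

The table is built once per hash key H, after which each 16-byte block costs 32 lookups and shifts instead of 128 conditional XORs. That is the speed/side-channel trade-off the change list alludes to.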

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."