Re: RFC: ktls and krpc using M_EXTPG mbufs

From: Andrew Gallatin <gallatin_at_cs.duke.edu>
Date: Mon, 27 Jul 2020 13:21:27 -0400
On 2020-07-19 19:34, Rick Macklem wrote:
> I spent a little time chasing a problem in the nfs-over-tls code, where it
> would sometimes end up with corrupted data in the file(s) of a mirrored
> pNFS configuration.
> 
> I think the problem was that the code filled the data to be written into
> anonymous page M_EXTPG mbufs, then did a m_copym() { copy by
> reference } and used the copies for the mirrored writes.
> --> In ktls_encrypt(), the encryption was done to the same pages and,
>         sometimes, the encrypted data got encrypted again during the
>         sosend() of the other copy.
> 
> Although I haven't reproduced it, a regular kernel write RPC could suffer the
> same consequences if the RPC is retried (it keeps an m_copym() copy
> of the request in the krpc for an RPC retry).
> 
> At this time, the code in projects/nfs-over-tls works correctly, since it
> always fills the data to be written into mbuf clusters, m_copym()s those
> and then copies those { real copying using memcpy() } via
> mb_mapped_to_unmapped() just before calling sosend().
> --> This works, but it would be nice to avoid the mb_mapped_to_unmapped()
>        copying for all the data being written via an NFS over TLS connection.
> 
> For the TCP_TLS_MODE_SW case:
> --> The NFS code can fill the written data into anonymous pages on M_EXTPG
>         mbufs.
> Then, the ktls_encrypt() could be modified to
> allocate a new set of anonymous pages for the destination side of
> the encryption (it already does this for the sendfile case) and put those
> in a new mbuf list.
> --> This would result in new anonymous pages and mbufs being allocated,
>         but would not do memcpy()s.
> After encryption, it would just do a m_freem() on the unencrypted list.
> --> For the krpc client case, this call would only decrement the reference
>        count on the unencrypted list and it could be used for a retry by the krpc
>        and then be free'd { m_freem() call } after a reply is received.
> 
> If doing this for all the sosend()s of anonymous page M_EXTPG mbufs seems
> like unnecessary overhead, the above could be enabled via a setsockopt()
> on the socket.
> 
> What do others think of this?

Several comments:

mb_mapped_to_unmapped() is surprisingly inexpensive.  It was less than 
5% before I converted iflib to M_NOMAP aware.


It seems like NFS should be constructing mbufs like sendfile does, and
pointing mbufs at its pages.  This would cause the crypto code to
allocate a new set of pages upon encryption.



> For the hardware offload case:
> - Can I assume that the anonymous pages in M_EXTPG mbufs will remain
>     unchanged?
> --> If so, and it won't change to TCP_TLS_MODE_SW, the NFS code could
>         fill the data to be written into M_EXTPG mbufs safely.
> 
> - And, if so, can I safely use the ktls_session mode field to decide if offload
>    is happening?
>    I see the TCP_TXTLS_MODE socket opt which seems to
>    switch the mode to TCP_TLS_MODE_SW.
>    When does this happen? Or, can this happen to a session once in use?

Yes.  The intent is to allow something (TCP stack, smart user daemon) to
look at a connection & move it from hardware to software, if it has a
lot of TCP re-transmits.

Drew
Received on Mon Jul 27 2020 - 15:21:31 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:24 UTC