Re: Adding support for WC (write-combining) memory to bus_dma

From: John Baldwin <jhb_at_freebsd.org>
Date: Thu, 12 Jul 2012 11:36:08 -0400

On Thursday, July 12, 2012 11:02:07 am Ian Lepore wrote:
> On Thu, 2012-07-12 at 10:40 -0400, John Baldwin wrote:
> > I have a need to allocate static DMA memory via bus_dmamem_alloc() that is 
> > also WC (for a PCI-e device so it can use "nosnoop" transactions).  This is 
> > similar to what the nvidia driver needs, but in my case it is much cleaner to 
> > allocate the memory via bus dma since the existing code I am extending all 
> > uses busdma.
> > 
> > I have a patch to implement this on 8.x for amd64 that I can port to HEAD if 
> > folks don't object.  What I would really like to do is add a new parameter to 
> > bus_dmamem_alloc() to specify the memory attribute to use, but I am hesitant 
> > to break that API.  Instead, I added a new flag similar to the existing 
> > BUS_DMA_NOCACHE used to allocate UC memory.
> > 
> > While doing this, I ran into an old bug, which is that if you were to call 
> > bus_dmamem_alloc() with BUS_DMA_NOCACHE but a tag that otherwise fell through 
> > to using malloc() instead of contigmalloc(), bus_dmamem_alloc() would actually
> > change the state of the entire page.  This seems wrong.  Instead, I think that 
> > any request for a non-default memory attribute should always use 
> > contigmalloc().  
> 
> The problem I have with this (already, even before your proposed
> changes) is that contigmalloc() is only able to allocate pages.  In the
> ARM world we have a need to allocate BUS_DMA_COHERENT memory (same
> effect as BUS_DMA_NOCACHE; we should consolidate these names) that is
> aligned to a 32-byte boundary (cacheline-aligned) but usually the buffer
> is far smaller than a page, often smaller than 1k, and sometimes we need
> lots of them (allocating 128 pages for ethernet buffers, with only half
> of each page used, is unreasonably expensive on a platform with only
> 64 MB to begin with).
> 
> I keep thinking what's needed is a busdma allocation helper routine,
> something MI that can be used by the various MD busdma implementations,
> that can manage a pool of pages that are flagged as uncachable and can
> subdivide those pages to provide small blocks of memory that fit various
> alignment and boundary restrictions.
> 
> To be clear, I'm not objecting to your proposed changes, I'm more just
> musing that similar problems exist in non-x86 architectures and maybe an
> MI solution is possible (or at least the groundwork could be laid)?

The traditional argument I've heard against this is that the relevant driver
should allocate a big block and manage suballocations on its own rather than
pushing that work into bus_dma.  How are you allocating Ethernet buffers,
btw?  Rather than using mbuf clusters to receive packets, are you allocating
mbufs in your RX interrupt handler and copying data out of static buffers
into the mbufs to send up the stack?
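
To make the "driver manages its own suballocations" approach concrete, here
is a minimal userland sketch of carving cacheline-aligned chunks out of one
large block.  Everything named sketch_* is hypothetical and is not part of
bus_dma or of any driver; in a real driver the block would come from a single
bus_dmamem_alloc() call, and there is no free path here at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver-private pool carved out of one large allocation. */
struct sketch_pool {
	uintptr_t base;	/* start of the big block */
	size_t	  size;	/* total bytes in the block */
	size_t	  next;	/* offset of the first unused byte */
};

static void
sketch_pool_init(struct sketch_pool *p, void *block, size_t size)
{
	p->base = (uintptr_t)block;
	p->size = size;
	p->next = 0;
}

/*
 * Hand out 'len' bytes aligned to 'align' (a power of two), or NULL once
 * the block is exhausted.  Chunks are never returned to the pool.
 */
static void *
sketch_pool_alloc(struct sketch_pool *p, size_t len, size_t align)
{
	uintptr_t start;
	size_t off;

	assert(align != 0 && (align & (align - 1)) == 0);

	/* Round the next free address up to the requested alignment. */
	start = (p->base + p->next + align - 1) & ~((uintptr_t)align - 1);
	off = start - p->base;
	if (off > p->size || len > p->size - off)
		return (NULL);
	p->next = off + len;
	return ((void *)start);
}
```

With a 32-byte alignment, several sub-1k buffers share each page instead of
each one consuming a page of its own.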

Also, I do not think BUS_DMA_COHERENT and BUS_DMA_NOCACHE are quite the same.
I see UC as a way to implement COHERENT semantics, but it also seems to me that
a COHERENT mapping can't use bounce pages either.  OTOH, NOCACHE (and my new
flag) specifically request a certain mapping behavior, not necessarily to
avoid bus_dmamap_sync() operations, but because of a hardware requirement
(e.g. the WC mapping is to enable use of "nosnoop" PCI-e transactions).

That is, I interpret COHERENT as meaning "this map doesn't require
bus_dmamap_sync(), do whatever it takes to make that true", whereas NOCACHE
and WC have other meanings (though in practice NOCACHE and WC both imply
COHERENT).  For example, on x86 with caches that snoop DMA transactions,
COHERENT doesn't require NOCACHE at all, it simply requires avoiding the use
of bounce pages.
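
As a rough illustration of that interpretation (not code from the patch), the
userland sketch below turns the flags into an allocation plan.  The SK_*
names, their values, and the sk_plan structure are all invented for the
example; SK_DMA_WC merely stands in for the proposed new flag, and preferring
WC when both WC and NOCACHE are set is an arbitrary choice of the sketch.  The
page_allocator field reflects the earlier suggestion that any non-default
memory attribute should go through contigmalloc().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not the real bus_dma values. */
#define	SK_DMA_COHERENT	0x1
#define	SK_DMA_NOCACHE	0x2
#define	SK_DMA_WC	0x4	/* stands in for the proposed new flag */
#define	SK_DMA_ALL	(SK_DMA_COHERENT | SK_DMA_NOCACHE | SK_DMA_WC)

enum sk_memattr {
	SK_ATTR_DEFAULT,
	SK_ATTR_UNCACHEABLE,
	SK_ATTR_WRITE_COMBINING
};

struct sk_plan {
	enum sk_memattr attr;	/* mapping attribute to request */
	bool page_allocator;	/* use a page-granular allocator */
	bool allow_bounce;	/* may the map use bounce pages? */
};

static struct sk_plan
sk_plan_alloc(int flags, bool dma_is_snooped)
{
	struct sk_plan plan = { SK_ATTR_DEFAULT, false, true };

	assert((flags & ~SK_DMA_ALL) == 0);

	/* NOCACHE and WC name an explicit mapping the hardware requires. */
	if (flags & SK_DMA_WC)
		plan.attr = SK_ATTR_WRITE_COMBINING;
	else if (flags & SK_DMA_NOCACHE)
		plan.attr = SK_ATTR_UNCACHEABLE;
	else if ((flags & SK_DMA_COHERENT) && !dma_is_snooped)
		/* COHERENT needs UC only where caches do not snoop DMA. */
		plan.attr = SK_ATTR_UNCACHEABLE;

	/* A coherent map cannot bounce; NOCACHE and WC imply COHERENT. */
	if (flags & SK_DMA_ALL)
		plan.allow_bounce = false;

	/* Attributes apply to whole pages, so do not share a malloc() page. */
	plan.page_allocator = (plan.attr != SK_ATTR_DEFAULT);
	return (plan);
}
```

On a snooping platform, sk_plan_alloc(SK_DMA_COHERENT, true) keeps the default
attribute and only rules out bounce pages, which is the x86 case described
above.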

-- 
John Baldwin