On Wednesday, January 11, 2012 11:29:44 am Luigi Rizzo wrote:
> On Wed, Jan 11, 2012 at 10:05:28AM -0500, John Baldwin wrote:
> > On Tuesday, January 10, 2012 5:41:00 pm Luigi Rizzo wrote:
> > > On Tue, Jan 10, 2012 at 01:52:49PM -0800, Adrian Chadd wrote:
> > > > On 10 January 2012 13:37, Luigi Rizzo <rizzo_at_iet.unipi.it> wrote:
> > > > > I was glancing through manpages and implementations of bus_dma(9)
> > > > > and I am a bit unclear on what this API (in particular,
> > > > > bus_dmamap_sync()) does in terms of memory barriers.
> > > > >
> > > > > I see that the x86/amd64 and ia64 code only does the bounce buffers.
> > > >
> > > > That is because x86 in general does not need memory barriers. ...
>
> maybe they are not called memory barriers but for instance
> how do I make sure, even on the x86, that a write to the NIC ring
> is properly flushed before the write to the 'start' register occurs?
>
> Take for instance the following segment from
> head/sys/ixgbe/ixgbe.c::ixgbe_xmit():
>
>         txd->read.cmd_type_len |=
>             htole32(IXGBE_TXD_CMD_EOP | IXGBE_TXD_CMD_RS);
>         txr->tx_avail -= nsegs;
>         txr->next_avail_desc = i;
>
>         txbuf->m_head = m_head;
>         /* Swap the dma map between the first and last descriptor */
>         txr->tx_buffers[first].map = txbuf->map;
>         txbuf->map = map;
>         bus_dmamap_sync(txr->txtag, map, BUS_DMASYNC_PREWRITE);
>
>         /* Set the index of the descriptor that will be marked done */
>         txbuf = &txr->tx_buffers[first];
>         txbuf->eop_index = last;
>
>         bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
>             BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
>         /*
>          * Advance the Transmit Descriptor Tail (Tdt), this tells the
>          * hardware that this frame is available to transmit.
>          */
>         ++txr->total_packets;
>         IXGBE_WRITE_REG(&adapter->hw, IXGBE_TDT(txr->me), i);
>
> the descriptor is allocated without any caching constraint,
> the bus_dmamap_sync() calls are effectively NOPs on i386 and amd64,
> and IXGBE_WRITE_REG has no implicit guard.

x86 doesn't need a guard here, as its stores are ordered.  In this case the
bus_dmamap_sync() calls would be sufficient on platforms where stores can be
reordered, since those platforms should place memory barriers in their
implementations of bus_dmamap_sync().

> > We could use lfence/sfence on amd64, but on i386 not all processors
> > support those.
>
> ok then we can make machine-specific versions... this is kernel
> code so we do have a list of supported CPUs.

It is not worth adding that overhead on i386 when all modern x86 CPUs are
going to run amd64 anyway.

> > The broken drivers doing it by hand don't work on early i386 CPUs.
> > Also, I personally don't like using membars like rmb() and wmb() by
> > hand.  If you are operating on normal memory I think atomic_load_acq()
> > and atomic_store_rel() are better.
>
> is it just a matter of names?

For regular memory, when you are using memory barriers you often want to tie
the barrier to a specific operation (e.g. it is the store in
IXGBE_WRITE_REG() above that you want ordered after all earlier stores).
Having the load/store and the membar in the same call makes that
relationship explicit.

> My complaint was mostly about how many
> unused parameters you need to pass to bus_space_barrier().
> They make life hard for both the programmer and the
> compiler, which might become unable to optimize them out.

Yes, it seems overly abstracted.  In NetBSD, bus_dmamap_sync() actually
takes extra parameters to say which portion of the map should be sync'd.
We removed those in FreeBSD to make the API simpler.
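A few concrete sketches may help pin down the points above.  First, the
claim that the barrier belongs inside the platform's bus_dmamap_sync()
implementation rather than in each driver.  This is a minimal sketch under
that assumption, not actual FreeBSD source: weak_dmamap_sync() is an
invented name, the cache-maintenance steps are elided, and wmb()/rmb() are
the kernel's write/read barrier macros.

#include <sys/param.h>
#include <sys/bus.h>
#include <machine/atomic.h>
#include <machine/bus.h>

static void
weak_dmamap_sync(bus_dma_tag_t dmat, bus_dmamap_t map, bus_dmasync_op_t op)
{
        if (op & BUS_DMASYNC_PREWRITE) {
                /* ... write back cache lines covering the map ... */
                wmb();  /* CPU stores reach memory before any doorbell */
        }
        if (op & BUS_DMASYNC_POSTREAD) {
                /* ... invalidate cache lines covering the map ... */
                rmb();  /* later CPU loads observe fresh DMA data */
        }
}

On such a platform the ixgbe code quoted above would not need a hand-rolled
barrier: its PREWRITE sync orders the descriptor stores ahead of the later
tail-register write (assuming the register write itself is not reordered,
which may need a stronger sync on some platforms).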
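Second, the point about tying a barrier to a specific operation.  Another
hedged sketch, not taken from any driver: struct ring and the two publish
functions are invented, while wmb() and atomic_store_rel_32() are the real
primitives being contrasted.

#include <sys/types.h>
#include <machine/atomic.h>

struct ring {
        volatile uint32_t prod_idx;     /* index the device polls */
        uint32_t slot[256];             /* descriptor payload */
};

/* Free-standing barrier: correct, but detached from the store. */
static void
ring_publish_wmb(struct ring *r, uint32_t idx, uint32_t desc)
{
        r->slot[idx] = desc;
        wmb();                  /* orders the slot write before prod_idx */
        r->prod_idx = idx;      /* nothing ties the wmb() to this store */
}

/* Release store: the barrier and the store are a single call. */
static void
ring_publish_rel(struct ring *r, uint32_t idx, uint32_t desc)
{
        r->slot[idx] = desc;
        /* All earlier stores become visible before prod_idx changes. */
        atomic_store_rel_32(&r->prod_idx, idx);
}

Both versions publish correctly; the second simply cannot be broken by
someone later moving the store away from its barrier.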
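Third, for reference, the signature difference described above, as
documented in the two systems' bus_dma(9) man pages:

/* FreeBSD: the whole map is synced; one op argument. */
void bus_dmamap_sync(bus_dma_tag_t dmat, bus_dmamap_t map,
    bus_dmasync_op_t op);

/* NetBSD: offset/len select which portion of the map to sync. */
void bus_dmamap_sync(bus_dma_tag_t tag, bus_dmamap_t dmam,
    bus_addr_t offset, bus_size_t len, int ops);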
bus_space_barrier() could probably use similar simplification (I believe we
also adopted that API from NetBSD).

-- 
John Baldwin
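To see the complaint about unused parameters in context, here is the
current interface in use.  This is an illustrative sketch: hyp_kick() and
HYP_DOORBELL are invented, but the bus_space_barrier() and
bus_space_write_4() calls follow the documented bus_space(9) prototypes.

#include <sys/types.h>
#include <sys/bus.h>
#include <machine/bus.h>

#define HYP_DOORBELL    0x10    /* invented register offset */

static void
hyp_kick(bus_space_tag_t bst, bus_space_handle_t bsh, uint32_t val)
{
        /*
         * Order earlier writes through this handle before the doorbell
         * write.  The offset and length arguments (the "unused
         * parameters" above) must be supplied even where the
         * implementation ignores them.
         */
        bus_space_barrier(bst, bsh, 0, HYP_DOORBELL + 4,
            BUS_SPACE_BARRIER_WRITE);
        bus_space_write_4(bst, bsh, HYP_DOORBELL, val);
}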