Re: 4.7 vs 5.2.1 SMP/UP bridging performance

From: John Baldwin <jhb_at_FreeBSD.org>
Date: Thu, 6 May 2004 14:11:27 -0400
On Thursday 06 May 2004 06:18 am, Bruce Evans wrote:
> On Wed, 5 May 2004, Gerrit Nagelhout wrote:
> > Andrew Gallatin wrote:
> > > If it's really safe to remove the xchg* from non-SMP atomic_store_rel*,
> > > then I think you should do it.  Of course, that still leaves mutexes
> > > as very expensive on SMP (253 cycles on the 2.53GHz from above).
>
> See my other reply [1 memory barrier but not 2 seems to be needed for
> each lock/unlock pair in the !SMP case, and the xchgl accidentally (?)
> provides it; perhaps [lms]fence would give a faster memory barrier].
> More ideas on this:
> - compilers should probably now generate memory barrier instructions for
>   volatile variables (so volatile variables would be even slower :-).  I
>   haven't seen gcc on i386's do this.
> - jhb once tried changing mtx_lock_spin(mtx)/mtx_unlock_spin(mtx) to
>   critical_enter()/critical_exit().  This didn't work because it broke
>   mtx_assert().  It might also not work because it removes the memory
>   barrier.  critical_enter() only has the very weak memory barrier in
>   disable_intr() on i386's.

That was only for the UP case, in which case you don't need the membars.  A 
single CPU always consistently sees what it has written.  The only case when 
it doesn't is for memory that can be written to by device DMA, and that 
doesn't apply to kernel data structures, esp. not to ones for scheduling, 
etc.  I actually have (untested) patches in the smpng branch to remove the 
one use of mtx_owned() (mtx_assert is not as big of a deal, that one can work 
fine by checking td_critnest) on sched_lock (the TSS munging code).  The 
problem with the [lms]fence instructions is that sfence is only on PIII+, 
and lfence is only on PIV+.  I don't recall when mfence first appeared, 
perhaps PII?  If the lock is really expensive, then perhaps we could make 
atomic_cmpset() be actual functions (ugh) rather than inlines, and have them 
branch to use foofence on PIV rather than the default.  The branches would 
suck, but it might be faster than the lock.  Of course, this would greatly 
pessimize non-PIV.
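
To make the two ideas above concrete, here is a rough, untested-in-kernel 
sketch in portable C11 rather than i386 inline asm.  Everything named 
sketch_* and the cpu_has_fence_insns flag are made up for illustration and 
are not FreeBSD symbols; SMP is assumed to be the usual kernel option macro.  
The C11 calls only show the shape of the code (compile-time UP/SMP split, 
run-time branch inside an out-of-line function); they do not pick the actual 
instructions the way real inline asm would, and nothing here says the fence 
path is correct or faster.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical flag, imagined to be set once at boot from CPUID.
 * Not a real FreeBSD symbol.
 */
bool cpu_has_fence_insns = false;

/*
 * Store with release semantics.  On UP a single CPU always sees its own
 * writes in order, so only the compiler has to be kept from reordering;
 * on SMP a real release store is requested.
 */
void
sketch_store_rel(atomic_uint *p, unsigned int v)
{
#ifdef SMP
	atomic_store_explicit(p, v, memory_order_release);
#else
	atomic_signal_fence(memory_order_release);	/* compiler barrier only */
	atomic_store_explicit(p, v, memory_order_relaxed);
#endif
}

/*
 * Out-of-line compare-and-set that branches at run time on the CPU
 * feature flag.  Returns nonzero if *p was 'expect' and is now 'newval'.
 */
int
sketch_cmpset_acq(atomic_uint *p, unsigned int expect, unsigned int newval)
{
	if (cpu_has_fence_insns) {
		/* "foofence" path: relaxed CAS, then an explicit fence. */
		if (!atomic_compare_exchange_strong_explicit(p, &expect,
		    newval, memory_order_relaxed, memory_order_relaxed))
			return (0);
		atomic_thread_fence(memory_order_acquire);
		return (1);
	}
	/* Default path: fully ordered compare-and-set. */
	return (atomic_compare_exchange_strong(p, &expect, newval));
}
```

The cost being traded is visible here: every caller now pays a function call 
plus the branch on cpu_has_fence_insns, even on CPUs that always take the 
default path.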

-- 
John Baldwin <jhb_at_FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
Received on Thu May 06 2004 - 09:11:18 UTC