Re: FILEDESC_LOCK() implementation

From: Robert Watson <rwatson_at_FreeBSD.org>
Date: Wed, 21 Jun 2006 21:46:44 +0100 (BST)
On Wed, 21 Jun 2006, Paul Allen wrote:

> From Robert Watson <rwatson_at_freebsd.org>, Wed, Jun 21, 2006 at 07:46:33PM +0100:
>> I would optimize very carefully here; the trade-offs are tricky, and we may 
>> find that by making locking more complex, we cause cache problems, increase 
>> lock hold periods, etc., even if we decrease contention.  I've wondered a 
>> bit about a model where we loan fds to threads to optimize repeated access 
>> to the same fd by the same thread, but this mostly makes sense in the 
>> context of a 1:1 model rather than an M:N model.
> I apologize for not understanding all of the uses of the FILEDESC lock, but 
> isn't the more obvious partitioning per-CPU: each CPU may allocate from a 
> range of fds, and which CPU's cache is used depends on where the thread 
> happens to be running.  When closing an fd, it is returned to the local 
> (possibly different) CPU's cache.  A watermark is used to generate an IPI 
> to rebalance the caches as needed.

The issue is actually a bit different from that.  In effect, we already do the 
above using UMA.

The problem is this: when you have threads in the same process, file 
descriptor lookup is performed against a common file descriptor array.  That 
array is protected by a lock, the filedesc lock.  When lots of threads 
simultaneously perform file descriptor operations, they contend on the file 
descriptor array lock.  So if you have 30 threads all doing I/O, they are 
constantly looking up file descriptors and bumping into each other.  This is 
particularly noticeable for network workloads, where individual operations are 
very fast and therefore occur in large numbers.  The M:N threading library 
actually handles this quite well by bounding the number of threads trying to 
acquire the lock to the number of processors, but with libthr you get pretty 
bad performance.  This contention problem also affects MySQL, etc.
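
To make the shape of the problem concrete, here is a minimal userspace sketch 
(pthreads, not the actual kernel code; struct fdtable, fd_lookup and the other 
names are invented for illustration) of the pattern described above: one table 
shared by every thread in the process, one lock in front of it, and every I/O 
operation paying for a lookup under that lock.

    /*
     * Userspace sketch only: every thread shares one fd table protected
     * by a single lock, so every lookup serializes on fd_lock.
     */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct file;                            /* opaque per-open-file object */

    struct fdtable {
            pthread_mutex_t   fd_lock;      /* the "filedesc lock" */
            struct file     **fd_files;     /* maps fd number -> struct file * */
            int               fd_nfiles;
    };

    /*
     * Translate an fd number into its struct file.  Every read/write/
     * send/recv path goes through a lookup like this, so 30 I/O-bound
     * threads all contend on fd_lock even though the lookup is trivial.
     */
    static struct file *
    fd_lookup(struct fdtable *fdt, int fd)
    {
            struct file *fp = NULL;

            pthread_mutex_lock(&fdt->fd_lock);
            if (fd >= 0 && fd < fdt->fd_nfiles)
                    fp = fdt->fd_files[fd];
            pthread_mutex_unlock(&fdt->fd_lock);
            return (fp);
    }

    static struct fdtable table;

    static void *
    worker(void *arg)
    {
            (void)arg;

            /* Each thread hammers the shared table, as I/O threads do. */
            for (int i = 0; i < 1000000; i++)
                    (void)fd_lookup(&table, i % 128);
            return (NULL);
    }

    int
    main(void)
    {
            pthread_t threads[30];

            table.fd_nfiles = 128;
            table.fd_files = calloc(table.fd_nfiles, sizeof(struct file *));
            pthread_mutex_init(&table.fd_lock, NULL);
            for (int i = 0; i < 30; i++)
                    pthread_create(&threads[i], NULL, worker, NULL);
            for (int i = 0; i < 30; i++)
                    pthread_join(threads[i], NULL);
            printf("done\n");
            return (0);
    }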

You can imagine a number of ways to work on this, but it's a tricky problem 
that has to be looked at carefully.
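
As one example of the kind of change you can imagine (and of why it is not 
automatically a win), here is the same sketch with the lookup path under a 
reader/writer lock, so that lookups no longer exclude one another and only 
open/close-style updates take the lock exclusively.  This is purely 
illustrative, not a proposal: every reader still writes to the rwlock's shared 
cache line, so it can trade lock contention for cache-line bouncing and a more 
expensive lock path, which is exactly the kind of trade-off mentioned in the 
quoted paragraph above.

    #include <pthread.h>

    struct file;                            /* opaque, as in the sketch above */

    struct fdtable_rw {
            pthread_rwlock_t   fd_lock;     /* read-mostly "filedesc lock" */
            struct file      **fd_files;
            int                fd_nfiles;
    };

    /* Lookups take the lock shared, so they can proceed concurrently. */
    struct file *
    fd_lookup_shared(struct fdtable_rw *fdt, int fd)
    {
            struct file *fp = NULL;

            pthread_rwlock_rdlock(&fdt->fd_lock);
            if (fd >= 0 && fd < fdt->fd_nfiles)
                    fp = fdt->fd_files[fd];
            pthread_rwlock_unlock(&fdt->fd_lock);
            return (fp);
    }

    /* open()/close()-style updates still take the lock exclusively. */
    void
    fd_setfile(struct fdtable_rw *fdt, int fd, struct file *fp)
    {
            pthread_rwlock_wrlock(&fdt->fd_lock);
            if (fd >= 0 && fd < fdt->fd_nfiles)
                    fdt->fd_files[fd] = fp;
            pthread_rwlock_unlock(&fdt->fd_lock);
    }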

Robert N M Watson
Computer Laboratory
University of Cambridge