Re: location of bioq lock

From: Scott Long <scottl_at_samsco.org>
Date: Tue, 12 Jul 2005 13:28:02 -0600
Luigi Rizzo wrote:

> Scott, I probably did not provide enough context...
> 
> To reassure you, I don't intend to change the status quo - the
> queue is owned by the driver, and the locking scheme remains the same.
> 
> What we are trying to do is abstract the disk scheduler API so that
> if a user, a subsystem, or a piece of hardware could benefit from a
> scheduler other than the default, one can easily be plugged in.
> 
>     In RELENG_4 and RELENG_5 most drivers use bioq_disksort(), which
>     optimizes for throughput but may not be ideal when apps have
>     even soft real-time requirements (e.g. media players); there are
>     also different approaches (e.g. the anticipatory scheduling that
>     someone referenced) that could prove more effective.  Things like
>     RAID drivers might have different request-ordering requirements.
>     Even the ATA driver in HEAD uses a scheduler other than
>     bioq_disksort(), but for lack of a proper API it is hardwired
>     into the driver.
>     And then, as you say, the hardware might have an intelligent
>     controller, so it might be worthwhile to disable sorting - but
>     the same driver (e.g. SCSI or ATA) might talk to hardware of
>     varying 'smartness', so configurable per-device schedulers
>     might be useful.
> 
> Now, if one sees the disk scheduler as a boot-time option, then
> the bioq_*() functions are all one needs - the API calls assume
> that the subsystem is already locked so nobody except the driver
> needs to know about the lock.
> 
> However, if one wants to be able to switch schedulers at runtime
> (e.g. through a sysctl), then at the time of a switch you may need to
> move requests from the old scheduler to the new one, and in the 
> process you have to lock each queue before playing with it.
> 
> So the issue is _not_ changing the locking scheme, but just making
> the sysctl handler aware of the address (and maybe type) of the lock.
> 
> Here are the two ways that I suggested - 1) put the lock in the queue
> so its address is implicitly known, or 2) pass it as an additional
> argument to bioq_init(). Either way, when the sysctl handler needs
> to play with the bioq outside the normal requests issued by
> the driver, it knows which lock to grab.  
> 
> I am totally with you when you say that a single lock covering
> more than just the bioq is more efficient - this seems to push
> towards method #2, which overall is more flexible.
> 
> 	cheers
> 	luigi
> 
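
For concreteness, here is roughly how I read the two variants (just a
sketch -- the bioq_lock field, the extra bioq_init() argument and the
bioq_reschedule() helper below are invented for illustration, they
are not in the tree):

	/* Option 1: the queue itself carries a pointer to its lock. */
	struct bio_queue_head {
		TAILQ_HEAD(bio_queue, bio) queue;
		/* ...existing fields... */
		struct mtx	*bioq_lock;	/* hypothetical */
	};

	/* Option 2: the driver hands its lock to bioq_init(). */
	void	bioq_init(struct bio_queue_head *head, struct mtx *lock);

	/* Either way, a scheduler-switching sysctl handler knows which
	 * lock to grab before it touches the queue: */
	static void
	bioq_reschedule(struct bio_queue_head *bq)
	{
		mtx_lock(bq->bioq_lock);
		/* move pending bio's from the old scheduler to the new one */
		mtx_unlock(bq->bioq_lock);
	}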

Ah, now I understand.  The downside to exporting the lock is that
it opens the possibility of a layer outside of the driver holding
the lock long enough to do undesirable things like delay interrupt
processing.  Also, allowing an outside layer to peek at the bioq
contents breaks the assumption most drivers have that they own
the queue and can handle it as they see fit.  It's often
desirable to look at the head of the bioq but not dequeue the
bio object until you know for sure that it'll be delivered to the
hardware.  Also, what about in-flight bio's?  It's up to the driver
to decide whether to complete or requeue bio's that might have been
deferred due to lack of resources or a transient error.  How will
the action of re-ordering the queue from an outside layer handle
this?  And finally, if you do export a lock from the driver, you
cannot assume that it'll be a sleep mutex.  You'll need to handle
the possibility of a spinlock, sx lock, semaphore, etc.
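
The pattern I mean looks roughly like this in a driver's start routine
(simplified; foo_softc, foo_hw_slots_free() and foo_send_to_hw() are
made-up names, only bioq_first()/bioq_remove() are the real calls):

	static void
	foo_start(struct foo_softc *sc)
	{
		struct bio *bp;

		mtx_assert(&sc->sc_mtx, MA_OWNED);
		for (;;) {
			/* Peek at the head, but don't dequeue yet. */
			bp = bioq_first(&sc->sc_bioq);
			if (bp == NULL)
				break;
			/* Make sure the hardware can actually take it. */
			if (foo_hw_slots_free(sc) == 0)
				break;
			/* Only now do we know it will be delivered. */
			bioq_remove(&sc->sc_bioq, bp);
			foo_send_to_hw(sc, bp);
		}
	}

An outside layer grabbing the lock and re-sorting the queue between
the peek and the dequeue pulls the rug out from under that.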

An alternate approach that I would suggest is to have the disk scheduler
prevent the block layer from delivering any new bio's, wait for
all of the outstanding bio's to complete, then flip the scheduler
algorithm, and allow i/o delivery to resume.  That way there is no
need to play with driver locks, no need to rummage around in resources
that are private to the driver, and no need to worry about in-flight
bio's.  It also removes the need to touch every driver with an API
change.
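
In rough code it would be something like this (the freeze/unfreeze
hooks and the inflight counter are invented here, not an existing
interface):

	static int
	disk_sched_switch(struct disk_softc *sc, int new_sched)
	{
		/* Stop the block layer from delivering new bio's. */
		sched_freeze_upper_layer(sc);		/* hypothetical */

		/* Wait for everything already handed down to complete. */
		while (sc->inflight > 0)
			tsleep(&sc->inflight, PRIBIO, "schedsw", hz / 10);

		/* Nothing is queued or in flight, so the flip is trivial
		 * and no driver lock or queue needs to be touched. */
		sc->sched_algo = new_sched;

		/* Let i/o delivery resume. */
		sched_unfreeze_upper_layer(sc);		/* hypothetical */
		return (0);
	}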

Scott