Re: location of bioq lock

From: Poul-Henning Kamp <phk_at_phk.freebsd.dk>
Date: Tue, 12 Jul 2005 22:03:14 +0200

I must admit that I have often been tempted to move the queue+sorting
out of the drivers because they all, more or less, do the exact
same thing.

For one thing, that would simplify any ABI for changing the disksort
algorithm (which should be per drive and not per system).

This would also move us to an API more or less like the network
interfaces.
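
To make the idea above concrete, here is a minimal userland sketch of a
queue that lives outside the driver and carries its own sort policy, one
per drive. None of these names (struct req, struct reqq, policy_fifo,
policy_ascending, reqq_*) exist in FreeBSD; they are made up for
illustration, and the "ascending" policy is a deliberately simplified
stand-in for a real disksort, which would also track the current head
position.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request: only the fields this sketch needs. */
struct req {
	long		blkno;	/* starting block number */
	struct req	*next;	/* singly linked queue */
};

struct reqq;
typedef void (*sort_fn)(struct reqq *, struct req *);

/* One queue per drive, owned by the generic layer, not the driver. */
struct reqq {
	struct req	*head;
	struct req	*tail;
	sort_fn		insert;	/* per-drive policy, replaceable */
};

/* Policy: plain FIFO, e.g. for drives that reorder internally. */
static void
policy_fifo(struct reqq *q, struct req *r)
{
	r->next = NULL;
	if (q->tail != NULL)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
}

/* Policy: keep the queue in ascending block order (simplified). */
static void
policy_ascending(struct reqq *q, struct req *r)
{
	struct req **pp = &q->head;

	/* Walk past every request at or below the new block number. */
	while (*pp != NULL && (*pp)->blkno <= r->blkno)
		pp = &(*pp)->next;
	r->next = *pp;
	*pp = r;
	if (r->next == NULL)
		q->tail = r;
}

static void
reqq_init(struct reqq *q, sort_fn policy)
{
	assert(q != NULL && policy != NULL);
	q->head = q->tail = NULL;
	q->insert = policy;
}

/* Called by the layer above the driver. */
static void
reqq_enqueue(struct reqq *q, struct req *r)
{
	q->insert(q, r);
}

/* Called by the driver when it is ready for more work. */
static struct req *
reqq_dequeue(struct reqq *q)
{
	struct req *r = q->head;

	if (r != NULL) {
		q->head = r->next;
		if (q->head == NULL)
			q->tail = NULL;
		r->next = NULL;
	}
	return (r);
}
```

With this shape the driver only ever calls reqq_dequeue(), much as a
network driver pulls packets off its interface send queue, and changing
the algorithm for one drive means swapping one function pointer. Locking
is left out of the sketch entirely.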

What has held me back is that if the driver knows about an on-disk
queue-depth, it might be smart about when to queue and when not to.
Overall though, I don't see any evidence of this anywhere in practice
(in FreeBSD or otherwise).

The recent ATA changes have moved the queue further down (from bio
to ata requests) and preempted the discussion for now.

Scott's comments about locking are relevant, but I think they are
somewhat minor.  We have that model in the network interfaces
which often handle 10-100 times more requests per second than
disk drives, and I don't hear complaints about the model in that
context.

The last bit of this is that disksorting seldom does much for us
these days, apart from mitigating the lemming syncer.

Finally, I am still pretty convinced that if somebody sat down and
did some real-life measurements, they would find that disk-sorting
has a different task these days where the drives have much more and
much more detailed knowledge about the physics of the situation.

Over the years I have read quite a bit of IBM's mainframe docs and
research on this topic, and they have found a lot of interesting
things which all more or less are present in today's zSeries.

Much of the work in recent years has tended to move in the other
direction: instead of sorting the work before shipping it to the disk,
disk state is exported so it can shape the workload.  For instance,
average I/O time estimates are now used to affect block allocation
in DB2 databases.
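
As a rough illustration of "exported state shapes the workload", the
sketch below keeps a smoothed service-time estimate per device and lets
an allocator prefer the device that currently looks fastest. This is not
how DB2 or any IBM product does it; the names (struct dev_stat,
dev_stat_update, pick_fastest) and the 1/8 smoothing weight are
assumptions chosen only to keep the example small.

```c
#include <stddef.h>

/* Hypothetical per-device statistics exported to the allocator. */
struct dev_stat {
	long	avg_us;	/* smoothed I/O time in microseconds, 0 = no data */
};

/*
 * Fold one completed I/O into the estimate using an exponentially
 * weighted moving average with weight 1/8 (integer arithmetic).
 */
static void
dev_stat_update(struct dev_stat *d, long sample_us)
{
	if (d->avg_us == 0)
		d->avg_us = sample_us;
	else
		d->avg_us += (sample_us - d->avg_us) / 8;
}

/*
 * Return the index of the device with the lowest estimate.  The
 * caller must pass n >= 1.  A device with no samples (avg_us == 0)
 * wins, which has the side effect of probing unmeasured devices.
 */
static size_t
pick_fastest(const struct dev_stat *devs, size_t n)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (devs[i].avg_us < devs[best].avg_us)
			best = i;
	return (best);
}
```

An allocator would call pick_fastest() when placing new blocks, and the
completion path would call dev_stat_update() with each measured I/O
time.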

The one place where disk-sorting _really_ makes a huge impact
is RAID5, but very specialized sorting algorithms are necessary
there and they need intimate access to the internals of the
RAID5 engine.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk_at_FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.
Received on Tue Jul 12 2005 - 18:03:23 UTC