Re: umass(4)/uhci(4) REALLY slow

From: Bruce Evans <bde_at_zeta.org.au>
Date: Wed, 1 Oct 2003 15:00:34 +1000 (EST)

On Tue, 30 Sep 2003, Nate Lawson wrote:

> Here are "iostat 5" results for my USB thumb drive on a uhci(4) controller
> with 5.1-CURRENT.  On windows on the same box, it runs reasonably quickly.
> On FreeBSD, it really lags.  This is for a cp of a large file to a
> msdosfs-mounted flash drive.
>
>      da0
>   KB/t tps  MB/s
>   1.07  41  0.04
>   1.00  41  0.04
>   1.02  41  0.04
>
> Is there something we're doing on uhci(4) that makes each transfer only
> 1 KB?  If we upped it to 32 KB, it would be a more reasonable 1.2 MB/sec
> which is still well under the USB 1.1 max speed.

This is probably due to something we're not doing in msdosfs.  1K is
probably your msdosfs file system block size.  msdosfs is missing support
for clustering.  None of the lower levels (buffer cache, driver, usb)
in FreeBSD does clustering (the buffer cache has some support for it,
but this is mostly turned off because the file system doesn't ask for
it).  The lower levels not in FreeBSD (firmware and hardware) apparently
don't do clustering either.  This results in abysmal performance if
the msdosfs block size is small.  It would be twice as abysmal with
the minimum block size of 512.  The same applies to ffs with small block
sizes and lots of fragments, if write clustering is turned off and the
drive doesn't do clustering either.
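
As a rough sketch of the arithmetic behind this: if every transfer
carries exactly one file system block and the device completes a fixed
number of transfers per second, throughput is just tps times block size.
The function name below is made up for illustration, and 1 MB is assumed
to mean 1024 * 1024 bytes.

```python
def throughput_mb_per_s(tps: float, block_bytes: int) -> float:
    """Throughput in MB/s for `tps` transfers of `block_bytes` bytes each.

    Assumes one file system block per transfer (no clustering) and
    1 MB = 1024 * 1024 bytes.
    """
    return tps * block_bytes / (1024 * 1024)


TPS = 41  # transfers per second, from the quoted iostat output

for block_bytes in (512, 1024, 32 * 1024):
    rate = throughput_mb_per_s(TPS, block_bytes)
    print(f"{block_bytes:6d} bytes/transfer: {rate:.2f} MB/s")
```

With 41 tps this gives about 0.04 MB/s for 1K blocks and about half of
that for 512-byte blocks, which is the "twice as abysmal" case.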

My early model SCSI ZIP100 drive gave similar performance (command
overhead of about 25 msec = 40 tps).  My not so early model ATA ZIP100
drive now does 230 tps; its tps is almost independent of the block
size for block sizes <= 4K, so its performance is reduced by a factor
of "only" up to 8 by using small block sizes.
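
The two ZIP100 figures can be checked the same way.  This is only a
sketch under the assumptions stated in the comments: command overhead
is taken to be the whole cost of a transfer, and tps is taken to be
exactly constant up to 4K.  The function names are hypothetical.

```python
def tps_from_overhead(overhead_ms: float) -> float:
    """Transfers per second if each transfer costs only `overhead_ms`."""
    return 1000.0 / overhead_ms


def slowdown_vs_4k(block_bytes: int) -> float:
    """Throughput loss relative to 4K blocks.

    Assumes tps does not depend on block size for blocks <= 4K, so
    throughput is proportional to block size in that range.
    """
    if not 0 < block_bytes <= 4096:
        raise ValueError("model only covers block sizes up to 4K")
    return 4096 / block_bytes


print(tps_from_overhead(25))   # SCSI ZIP100: 25 msec per command
print(slowdown_vs_4k(512))     # ATA ZIP100 with 512-byte blocks
```

25 msec per command works out to 40 tps, and 512-byte blocks cost a
factor of 8 against 4K blocks when tps stays flat.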

The buffer cache also handles small block sizes poorly.  If nbuf is
2048, then a whole 1MB of data can be in the buffer cache for a file
system with a block size of 512.  Using such a file system will soon (*)
use most buffers for tinygrams and deplete the buffer cache for other
file systems.  However, disk data will normally stay cached in VMIO
buffers, so this only thrashes the disk caches in memory, and it is no
worse than copying all the data several times per access.

(*) Only RSN with 41 tps :-).
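
For a feel of the numbers, here is a small sketch.  It assumes that
each transfer uses up one buffer and that nothing is recycled in the
meantime, which is a simplification and not a description of how the
buffer cache actually behaves.  The function names are made up.

```python
def bytes_cached(nbuf: int, block_bytes: int) -> int:
    """Data held if every one of `nbuf` buffers holds one small block."""
    return nbuf * block_bytes


def seconds_to_use_all_buffers(nbuf: int, tps: float) -> float:
    """Time to touch `nbuf` buffers at one buffer per transfer.

    Assumption: one transfer consumes one buffer, no reuse.
    """
    return nbuf / tps


NBUF = 2048        # example value from the text
BLOCK_BYTES = 512  # minimum msdosfs block size

print(bytes_cached(NBUF, BLOCK_BYTES) // (1024 * 1024), "MB cached")
print(f"{seconds_to_use_all_buffers(NBUF, 41):.0f} seconds at 41 tps")
```

Under these assumptions all 2048 buffers hold just 1MB between them,
and at 41 tps it takes a little under a minute to go through them.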

Bruce