Re: speeding up ugen by an order of magnitude.

From: Bernd Walter <ticso_at_cicely12.cicely.de>
Date: Wed, 7 Jul 2004 22:12:17 +0200
On Wed, Jul 07, 2004 at 11:22:15AM -0700, Julian Elischer wrote:
> 
> 
> On Wed, 7 Jul 2004, Bernd Walter wrote:
> 
> > On Tue, Jul 06, 2004 at 04:32:28PM -0700, Julian Elischer wrote:
> > > 
> > > So, we had a device that we access through ugen.
> > > 
> > > The manufacturer said we should get the transaction in 3 seconds,
> > > and Windows and Linux did, but FreeBSD took 15 seconds.
> > > I suspect that, since the code is the same, NetBSD would get the same result.
> > > 
> > > Looking at it I noticed that ugen does everything in 1KB chunks,
> > > which is OK for USB1, but a bit silly for USB2.
> > 
> > Mmm - 128k is very big - consider that you may have hundreds of
> > ugen bulk pipes open - all with outstanding reads.
> > This would eat up kernel memory quite fast.
> 
> Yes, that's why I brought it up here.
> That was only a "proof-of-concept" that showed that the slowdown was
> coming from the fact that ugen was only pushing
> one KB of data per frame (1 ms) interrupt. (It was pinned at 1MB/sec.)
> The correct answer may be to do what Poul-Henning suggested and use the
> physio facility for this. I considered that originally, but there is
> overhead in that too, and it is also possible that the NetBSD and
> FreeBSD physio facilities have diverged enough to make this non-trivial
> as far as keeping diffs to a minimum. (It is, after all, NetBSD code.)

Well - as I already wrote - I can't argue with physio.
What are the pros?

> It would be possible to limit the malloc per xfer by only mallocing
> a buffer large enough for the transfer, but a malloc per transfer
> seems like a lot of overhead. Alternatively it may be possible to
> malloc once for every endpoint on the client device, but I'm not clear
> on whether it is possible to have multiple outstanding requests per
> endpoint. If so, then how many buffers DO we malloc, and how big? What
> if there are a lot of endpoints?

You can have multiple xfers per endpoint - each endpoint has a queue
for them.
In fact a 128k xfer may be split by the controller driver if the
memory is physically too fragmented.
The controller will keep scheduling requests to the bus as long as
there is any outstanding request.
More precisely, the controller driver only disables processing of an
endpoint while it modifies the queue.
Maybe you could reach a similar speed gain by interleaving two 8k
xfers.
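
Roughly what I mean - only a sketch, and queue_xfer()/wait_xfer() are
placeholders for whatever the driver would really call in usbdi, not
existing functions:

/*
 * Keep two 8k xfers in flight on one bulk endpoint so the host
 * controller never runs dry between completions.  Assumes the device
 * actually delivers the requested amount; data handling is left out.
 */
#include <stddef.h>

#define NXFER   2
#define XFERSZ  (8 * 1024)
#define MIN(a, b) ((a) < (b) ? (a) : (b))

void   queue_xfer(void *buf, size_t len);  /* placeholder: put on endpoint queue */
size_t wait_xfer(void *buf);               /* placeholder: wait, return bytes done */

static char buf[NXFER][XFERSZ];

size_t
read_interleaved(size_t want)
{
        size_t done = 0, queued = 0, n;
        int i;

        /* prime the endpoint queue with both buffers */
        for (i = 0; i < NXFER && queued < want; i++) {
                n = MIN(XFERSZ, want - queued);
                queue_xfer(buf[i], n);
                queued += n;
        }
        /* as each xfer completes, hand its buffer straight back */
        for (i = 0; done < want; i = (i + 1) % NXFER) {
                done += wait_xfer(buf[i]);
                if (queued < want) {
                        n = MIN(XFERSZ, want - queued);
                        queue_xfer(buf[i], n);
                        queued += n;
                }
        }
        return (done);
}

The point is simply that a second xfer is already sitting on the
endpoint queue when the first one completes, so no bus time is lost
on the round trip through the host.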

> > The problem is lost bus time between finishing an xfer and issuing
> > the next one - consider that part of this lost time is OS dependent
> > latency and in fact might be limited to FreeBSD.
> 
> No, the lost time is, I believe, due to something in the way that the
> ehci driver is setting things up. I believe from empirical evidence
> that we are getting one transfer per frame (8 uframes). I don't
> believe OS response time is a key factor.
> 
> 
> > 
> > What about these options:
> > - limit the allocated memory to the user request so we don't take the
> >   whole 128k if not required.
> 
> This requires a separate malloc per xfer. It's a valid option, but
> is it acceptable?

Why? If someone is doing 160k, then mallocing 128k can be done and the
buffer used twice - unless interleaving is done, of course.
The win is when an application only does 64 bytes or 8 bytes.
64 bytes is the maximum allowed packet size for full speed bulk
endpoints, and small requests are not uncommon if the application
needs control over every packet.
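
In code it's hardly more than this (again only a sketch - plain
malloc(3) stands in for the kernel allocator, and UGEN_BBSIZE_MAX is a
made-up name for the 128k cap):

#include <stddef.h>
#include <stdlib.h>

#define UGEN_BBSIZE_MAX (128 * 1024)    /* made-up name for the 128k cap */

/*
 * Size the bounce buffer to the actual request instead of always
 * taking the full 128k: a 160k read still gets 128k and loops twice,
 * while an 8 byte request only costs 8 bytes.
 */
static void *
get_bulk_buffer(size_t resid, size_t *bufsize)
{
        *bufsize = resid < UGEN_BBSIZE_MAX ? resid : UGEN_BBSIZE_MAX;
        return (malloc(*bufsize));
}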

I'm producing RS485 devices that do xfers in the 8-258 byte range.
Most xfers are only 10 bytes.
Well, I did write a special kernel driver for them.

> > - Do interleaving with 2 or more xfers if the read request is known to
> >   take more than one xfer.
> 
> You still need to malloc enough space for all the outstanding data.
> Better to do it with ONE request than with several.

Interleaving with two 8k xfers would only require 16k.
Why do you think that one xfer is better?
On the USB it's done with 64-byte or, depending on the device, even
much smaller packets - the difference is just the communication with
the host controller.
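
To put numbers on the 1MB/sec ceiling you saw (roughly the theoretical
maxima from the spec):

  one 1k xfer per 1ms frame:  1024 * 1000     = 1024000 B/s  (the observed ~1MB/s)
  full speed bulk maximum:    19 * 64 * 1000  = 1216000 B/s
  high speed bulk maximum:    13 * 512 * 8000 = 53248000 B/s  (8 microframes per ms)

So on full speed the one-xfer-per-frame behaviour hardly matters, but
on high speed it throws away nearly all of the bandwidth - whether you
fix that with one big xfer or with interleaved smaller ones.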

-- 
B.Walter                   BWCT                http://www.bwct.de
bernd_at_bwct.de                                  info_at_bwct.de
Received on Wed Jul 07 2004 - 18:13:33 UTC
