Re: speeding up ugen by an order of magnitude.

From: Julian Elischer <julian_at_elischer.org> Date: Wed, 7 Jul 2004 11:22:15 -0700 (PDT)

On Wed, 7 Jul 2004, Bernd Walter wrote:

> On Tue, Jul 06, 2004 at 04:32:28PM -0700, Julian Elischer wrote:
> > 
> > So, we had a device that we access through ugen.
> > 
> > The manufacturer said we should get the transaction in 3 seconds,
> > and Windows and Linux did, but FreeBSD took 15 seconds.
> > I suspect that, since the code is the same, NetBSD would get the same result.
> > 
> > Looking at it, I noticed that ugen does everything in 1K chunks,
> > which is ok for USB1, but a bit silly for USB2.
> 
> Mmm - 128k is very big - consider that you may have hundreds of
> ugen bulk pipes open - all with outstanding reads.
> This would eat up kernel memory quite fast.

Yes, that's why I brought it up here.
That was only a "proof-of-concept" that showed that the slowdown was
coming from the fact that ugen was only pushing
one KB of data per frame (1 ms) interrupt (it was pinned at 1MB/sec).
The correct answer may be to do what Poul-Henning suggested and use the
physio facility to do this. I considered that originally, but there is
overhead in that too, and it is also possible that the NetBSD and
FreeBSD physio facilities have diverged enough to make this non-trivial
as far as keeping diffs to a minimum. (It is, after all, NetBSD code.)
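
To put numbers on that ceiling, here is a rough sketch (plain userland C,
not taken from ugen.c or the ehci driver). It assumes exactly one transfer
completes per 1 ms frame and clamps the result at the raw 480 Mbit/s
high-speed signalling rate. The function and macro names are made up for
illustration.

```c
#include <stdint.h>

/*
 * Raw USB 2.0 high-speed signalling rate: 480 Mbit/s = 60,000,000 bytes/s.
 * Real bulk payload throughput is lower; this is only an upper bound.
 */
#define HS_RAW_BYTES_PER_SEC 60000000ULL

/* Full frames are 1 ms long, so there are 1000 of them per second. */
#define FRAMES_PER_SEC 1000ULL

/*
 * Hypothetical model, not driver code: assume exactly one transfer of
 * xfer_bytes completes per 1 ms frame, and clamp at the raw bus rate.
 */
uint64_t
one_xfer_per_frame_rate(uint64_t xfer_bytes)
{
	uint64_t rate = xfer_bytes * FRAMES_PER_SEC;

	return (rate < HS_RAW_BYTES_PER_SEC ? rate : HS_RAW_BYTES_PER_SEC);
}
```

With 1024-byte transfers the model gives 1,024,000 bytes/sec, which is the
~1MB/sec figure above. Under the same one-transfer-per-frame assumption,
the transfer size stops being the limit well before 128k, since 128k per
frame would already exceed the raw bus rate.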

It would be possible to scale the per-xfer malloc by only allocating
a buffer large enough for the transfer, but a malloc per transfer
seems like a lot of overhead. Alternatively, it may be possible to malloc
once for every endpoint on the client device, but I'm not clear on whether
it is possible to have multiple outstanding requests per endpoint. If so,
then how many buffers DO we malloc, and how big? What if there are a lot
of endpoints?
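
A minimal sketch of the "only as big as the transfer" sizing, again in
plain userland C and not ugen code. XFER_CAP and both function names are
hypothetical; 128k is just the proof-of-concept size from this thread.

```c
#include <stddef.h>

/* Hypothetical cap on a single transfer buffer (the 128k discussed here). */
#define XFER_CAP ((size_t)128 * 1024)

/* Buffer size for the next transfer: what remains, clamped to the cap. */
size_t
next_xfer_len(size_t resid, size_t cap)
{
	return (resid < cap ? resid : cap);
}

/*
 * Number of transfers needed for a request of `total` bytes, which is
 * also the number of mallocs if we allocate once per transfer.
 */
size_t
xfers_for_request(size_t total, size_t cap)
{
	if (cap == 0)
		return (0);
	/* Round up so a partial final chunk still counts as a transfer. */
	return (total / cap + (total % cap != 0));
}
```

For a 1MB read this is 1024 transfers with a 1K cap versus 8 with a 128k
cap, while a 512-byte read only ever asks for a 512-byte buffer.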

> 
> The problem is lost bus time between finishing an xfer and issuing
> the next one - consider that part of this lost time is OS-dependent
> latency and in fact might be limited to FreeBSD.

No, the lost time is, I believe, due to something in the way that the ehci
driver is setting things up. I believe from empirical evidence that
we are getting one transfer per frame (8 uframes). I don't believe OS
response time is a key factor.

> 
> What about these options:
> - limit the allocated memory to the user request so we don't take the
>   whole 128k if not required.

This requires a separate malloc per xfer. It's a valid option, but
is it acceptable?

> - Do interleaving with 2 or more xfers if the read request is known to
>   take more xfers.

You still need to malloc enough space for all outstanding data.
Better to do it with ONE request than multiple.
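
The memory side of this can be sketched the same way. This is a
worst-case estimate only, assuming every open pipe keeps its buffers
allocated while reads are outstanding; the function name and the pipe
count used in the note below are made up.

```c
#include <stddef.h>

/*
 * Worst-case bytes held if every open pipe has `outstanding` transfers
 * pending, each with its own buffer of buf_bytes.
 */
size_t
pinned_bytes(size_t pipes, size_t outstanding, size_t buf_bytes)
{
	return (pipes * outstanding * buf_bytes);
}
```

For 200 pipes with one 128k buffer each, this comes to 26,214,400 bytes
(25MB). Two interleaved 64k transfers per pipe hold exactly the same
amount, so interleaving by itself does not reduce the footprint.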

> 
> Naturally the situation with bulk writes is the same.

yes.