Re: Wow... (<-- blown away at performance)

From: Alexander Best <arundel_at_freebsd.org>
Date: Wed, 23 Feb 2011 10:24:32 +0000
On Tue Feb 22 11, Garrett Cooper wrote:
> On Tue, Feb 22, 2011 at 6:00 PM, Alexander Best <arundel_at_freebsd.org> wrote:
> > On Tue Feb 22 11, Alexander Best wrote:
> >> On Tue Feb 22 11, Alexander Best wrote:
> >> > On Tue Feb 22 11, Brandon Gooch wrote:
> >> > > On Tue, Feb 22, 2011 at 2:31 PM, Alexander Best <arundel_at_freebsd.org> wrote:
> >> > > > On Tue Feb 22 11, Garrett Cooper wrote:
> >> > > >> On Tue, Feb 22, 2011 at 2:10 AM, Eir Nym <eirnym_at_gmail.com> wrote:
> >> > > >> > On 22 February 2011 11:15, Garrett Cooper <yanegomi_at_gmail.com> wrote:
> >> > > >> >>    I don't know what to say, but r218938 screams with flash videos
> >> > > >> >> (native Linux speed). Not sure if it's the new binutils or if it's the
> >> > > >> >> new linuxulator patches, but I can run multiple instances of youtube
> >> > > >> >> in parallel (5 total, with other miscellaneous flash animations) without
> >> > > >> >> totally lagging out Firefox/X11, and it appears to close the
> >> > > >> >> instances of Firefox properly now. Hopefully this version fares better
> >> > > >> >> than r218113 did (I think I hit a kernel bug after 2 weeks of uptime,
> >> > > >> >> where my system just hardlocked for no apparent reason).
> >> > > >> >>    Anyhow, hope others have similar results.
> >> > > >> >> Cheers!
> >> > > >> >> -Garrett
> >> > > >> >>
> >> > > >> >> $ uname -a
> >> > > >> >> FreeBSD bayonetta.local 9.0-CURRENT FreeBSD 9.0-CURRENT #0 r218938M:
> >> > > >> >> Mon Feb 21 23:10:51 PST 2011
> >> > > >> >> gcooper_at_bayonetta.local:/usr/obj/usr/src/sys/BAYONETTA  amd64
> >> > > >> >
> >> > > >> > Which Flash Player version are you testing? Adobe made significant
> >> > > >> > performance changes in 10.2 (from 10.1). You can search for StageVideo
> >> > > >> > performance to learn more about it. YouTube has already been using it
> >> > > >> > since the 10.2 beta.
> >> > > >>
> >> > > >>     linux-f10-flashplugin-10.1r102.65. The performance increases are
> >> > > >> claimed to be "up to 85%" on the Stage Video site, but I'm seeing a
> >> > > >> more than 200% increase (now it actually scales across multiple
> >> > > >> instances, instead of croaking with one instance, tiling up and down
> >> > > >> the screen when moving the window slider or switching tabs, for instance).
> >> > > >> Besides, it seems like it needs external support from the video
> >> > > >> driver, and I'm not sure that that bridge exists in the linuxulator.
> >> > > >>     Also, it looks like npviewer.bin still hangs on to resources until
> >> > > >> Firefox closes (or I kill it :)..), so something still needs to be
> >> > > >> resolved there, but that isn't a regression (it's acted that way for
> >> > > >> ages), and it shouldn't be too hard to fix.
> >> > > >
> >> > > > i think the problem with npviewer.bin having to be killed by hand is a
> >> > > > futex issue in combination with a high number of threads. this is the
> >> > > > output of a test from darren hart's futex test suite under freebsd 9.0
> >> > > > (a rough sketch of what the test does follows the output):
> >> > > >
> >> > > > futex_wait: Measure FUTEX_WAIT operations per second
> >> > > >        Arguments: iterations=100000000 threads=1
> >> > > > Result: 18622 Kiter/s
> >> > > > futex_wait: Measure FUTEX_WAIT operations per second
> >> > > >        Arguments: iterations=100000000 threads=2
> >> > > > Result: 15469 Kiter/s
> >> > > > futex_wait: Measure FUTEX_WAIT operations per second
> >> > > >        Arguments: iterations=100000000 threads=3
> >> > > > Result: 12713 Kiter/s
> >> > > > futex_wait: Measure FUTEX_WAIT operations per second
> >> > > >        Arguments: iterations=100000000 threads=4
> >> > > > Result: 12247 Kiter/s
> >> > > > futex_wait: Measure FUTEX_WAIT operations per second
> >> > > >        Arguments: iterations=100000000 threads=5
> >> > > > Result: 11814 Kiter/s
> >> > > > futex_wait: Measure FUTEX_WAIT operations per second
> >> > > >        Arguments: iterations=100000000 threads=6
> >> > > > Result: 13468 Kiter/s
> >> > > > futex_wait: Measure FUTEX_WAIT operations per second
> >> > > >        Arguments: iterations=100000000 threads=8
> >> > > > Result: 12061 Kiter/s
> >> > > > futex_wait: Measure FUTEX_WAIT operations per second
> >> > > >        Arguments: iterations=100000000 threads=10
> >> > > > Result: 12854 Kiter/s
> >> > > > futex_wait: Measure FUTEX_WAIT operations per second
> >> > > >        Arguments: iterations=100000000 threads=12
> >> > > > Result: 12999 Kiter/s
> >> > > > futex_wait: Measure FUTEX_WAIT operations per second
> >> > > >        Arguments: iterations=100000000 threads=16
> >> > > > Result: 12402 Kiter/s
> >> > > > futex_wait: Measure FUTEX_WAIT operations per second
> >> > > >        Arguments: iterations=100000000 threads=24
> >> > > > Result: 9815 Kiter/s
> >> > > > futex_wait: Measure FUTEX_WAIT operations per second
> >> > > >        Arguments: iterations=100000000 threads=32
> >> > > > Result: 8518 Kiter/s
> >> > > > futex_wait: Measure FUTEX_WAIT operations per second
> >> > > >        Arguments: iterations=100000000 threads=64
> >> > > >        ERROR: Resource temporarily unavailable: pthread_create
> >> > > > Result: ERROR
> >> > > > futex_wait: Measure FUTEX_WAIT operations per second
> >> > > >        Arguments: iterations=100000000 threads=128
> >> > > >        ERROR: Resource temporarily unavailable: pthread_create
> >> > > > Result: ERROR
> >> > > > futex_wait: Measure FUTEX_WAIT operations per second
> >> > > >        Arguments: iterations=100000000 threads=256
> >> > > >        ERROR: Resource temporarily unavailable: pthread_create
> >> > > > Result: ERROR
> >> > > > futex_wait: Measure FUTEX_WAIT operations per second
> >> > > >        Arguments: iterations=100000000 threads=512
> >> > > >        ERROR: Resource temporarily unavailable: pthread_create
> >> > > > Result: ERROR
> >> > > > futex_wait: Measure FUTEX_WAIT operations per second
> >> > > >        Arguments: iterations=100000000 threads=1024
> >> > > >        ERROR: Resource temporarily unavailable: pthread_create
> >> > > > Result: ERROR
> >> > > >
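> >> > > > (for the curious, a minimal sketch of what the futex_wait test
> >> > > > measures -- this is my own reconstruction, not darren's actual code:
> >> > > > every thread calls FUTEX_WAIT with a value that doesn't match the
> >> > > > futex word, so each call returns immediately and the harness just
> >> > > > counts calls per second.)
> >> > > >
> >> > > > #include <linux/futex.h>
> >> > > > #include <sys/syscall.h>
> >> > > > #include <pthread.h>
> >> > > > #include <unistd.h>
> >> > > >
> >> > > > static int futex_word = 0;      /* nobody ever wakes this */
> >> > > >
> >> > > > static void *
> >> > > > waiter(void *arg)
> >> > > > {
> >> > > >         long i, iters = *(long *)arg;
> >> > > >
> >> > > >         for (i = 0; i < iters; i++)
> >> > > >                 /* expected value 1 != 0, so the kernel returns
> >> > > >                  * EWOULDBLOCK right away; we only measure the
> >> > > >                  * syscall round-trip, no thread ever sleeps */
> >> > > >                 syscall(SYS_futex, &futex_word, FUTEX_WAIT,
> >> > > >                     1, NULL, NULL, 0);
> >> > > >         return (NULL);
> >> > > > }
> >> > > >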
> >> > > > cheers.
> >> > > > alex
> >> > >
> >> > > Have you found any method or workaround to mitigate this issue?
> >>
> >> i did a ktrace -i and linux_kdump run, and the results are quite interesting.
> >> with only 1 thread, linux_sys_futex() *never* failed with EAGAIN. starting
> >> with 2 threads, however, i'm seeing an increasing number of linux_sys_futex()
> >> failures (all with EAGAIN).
> >>
> >> the overhead is quite massive: linux_kdump > /tmp/futex-failure produced 941
> >> megs of output, yet
> >> grep -v "futex_wait" futex-failure > futex-grep
> >> leaves only 192K!! so it's quite obvious that the futex_wait traffic is
> >> responsible for nearly all of that activity, due to the constant failures
> >> and retries.
> >
> > i ran kdump instead of linux_kdump, and it seems the freebsd syscall returning
> > EAGAIN is nanosleep(2):
> >
> >  9929 futex_wait CALL  nanosleep(0xffffdb5c,0x81,0x1,0,0,0)
> >  9929 futex_wait RET   nanosleep 0
> >  9929 futex_wait CALL  nanosleep(0xffffdb5c,0x81,0x1,0,0,0)
> >  9928 futex_wait CALL  nanosleep(0xffffdb5c,0x80,0x2,0,0,0)
> >  9929 futex_wait RET   nanosleep 0
> >  9928 futex_wait RET   nanosleep -1 errno 35 Resource temporarily unavailable
> >  9929 futex_wait CALL  nanosleep(0xffffdb5c,0x81,0x1,0,0,0)
> >  9928 futex_wait CALL  nanosleep(0xffffdb5c,0x80,0x2,0,0,0)
> >  9929 futex_wait RET   nanosleep 0
> >  9928 futex_wait RET   nanosleep -1 errno 35 Resource temporarily unavailable
> >  9929 futex_wait CALL  nanosleep(0xffffdb5c,0x80,0x2,0,0,0)
> >  9928 futex_wait CALL  nanosleep(0xffffdb5c,0x81,0x1,0,0,0)
> >  9929 futex_wait RET   nanosleep -1 errno 35 Resource temporarily unavailable
> >  9928 futex_wait RET   nanosleep 0
> >  9929 futex_wait CALL  nanosleep(0xffffdb5c,0x81,0x1,0,0,0)
> >  9929 futex_wait RET   nanosleep 0
> >  9929 futex_wait CALL  nanosleep(0xffffdb5c,0x80,0x2,0,0,0)
> >  9928 futex_wait CALL  nanosleep(0xffffdb5c,0x81,0x1,0,0,0)
> >  9929 futex_wait RET   nanosleep -1 errno 35 Resource temporarily unavailable
> >  9928 futex_wait RET   nanosleep 0
> >  9929 futex_wait CALL  nanosleep(0xffffdb5c,0x81,0x1,0,0,0)
> >  9929 futex_wait RET   nanosleep 0
> >  9928 futex_wait CALL  nanosleep(0xffffdb5c,0x81,0x1,0,0,0)
> >  9929 futex_wait CALL  nanosleep(0xffffdb5c,0x80,0x2,0,0,0)
> >  9928 futex_wait RET   nanosleep 0
> >  9929 futex_wait RET   nanosleep -1 errno 35 Resource temporarily unavailable
> >  9929 futex_wait CALL  nanosleep(0xffffdb5c,0x80,0x2,0,0,0)
> >  9928 futex_wait CALL  nanosleep(0xffffdb5c,0x81,0x1,0,0,0)
> >  9929 futex_wait RET   nanosleep -1 errno 35 Resource temporarily unavailable
> >
> > so this seems to be the location in linux_futex.c where error gets set to
> > EAGAIN:
> >
> > static int
> > futex_sleep(struct futex *f, struct waiting_proc *wp, int timeout)
> > {
> >        int error;
> >
> >        FUTEX_ASSERT_LOCKED(f);
> >        LINUX_CTR4(sys_futex, "futex_sleep enter uaddr %p wp %p timo %d ref %d",
> >            f->f_uaddr, wp, timeout, f->f_refcount);
> >        error = sx_sleep(wp, &f->f_lck, PCATCH, "futex", timeout);
> >        ^^^^
> >
> > according to nanosleep(2), however, EAGAIN is not a valid return
> > value... i'm lost. :(
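> >
> > (one possible explanation, though i haven't verified it: sx_sleep(9)
> > returns EWOULDBLOCK when its timeout expires, and in freebsd's errno.h
> > EWOULDBLOCK is simply an alias for EAGAIN:
> >
> > /* from sys/sys/errno.h */
> > #define EAGAIN          35              /* Resource temporarily unavailable */
> > #define EWOULDBLOCK     EAGAIN          /* Operation would block */
> >
> > so the EAGAIN might just mean the futex sleep timed out, rather than
> > coming from nanosleep(2)'s documented behaviour at all.)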
> 
>     Don't trust the manpage, young padawan (I wish one could, but not
> all manpages are updated in a timely manner, in particular the kernel
> ones, as I've discovered :(...). The one spot where it could be completely
> screwing things up is timer_create (it's the one logical spot in
> sys/kern/... that returns EAGAIN), and this would make sense if our
> clock values weren't being translated over properly for the
> linuxulator and clocks were getting screwed up somewhere.
>     Well, that and clock_nanosleep isn't implemented, and I wouldn't
> be so sure it's sane when compared to what Linux's clocks/timers do
> (sane meaning equivalent, not necessarily right).
>     Something I ranted about in another PR:
> http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/150095 .
>     So, that might be something worth looking into or not. It's up to
> you *shrugs*.

thanks a lot for the help. i'll definitely check that one out. i also found,
via google, what might be another cause of the problems: it could be that the
futex code, or maybe linux threads in general, use PTHREAD_SCOPE_SYSTEM rather
than PTHREAD_SCOPE_PROCESS. with PTHREAD_SCOPE_PROCESS no thread limits seem to
apply; PTHREAD_SCOPE_SYSTEM, on the other hand, seems to be limited by two
sysctls (a quick check from C is sketched after the list):

kern.threads.max_threads_per_proc
kern.threads.max_groups_per_proc
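
a quick sanity check of the surviving sysctl from C (just a sketch, i
haven't actually run this):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
        int max;
        size_t len = sizeof(max);

        /* query the per-process thread limit by name */
        if (sysctlbyname("kern.threads.max_threads_per_proc", &max, &len,
            NULL, 0) == 0)
                printf("max threads per proc: %d\n", max);
        return (0);
}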

the latter, however, doesn't exist anymore in 9.0-CURRENT, and i can't find
any hint about what happened to it.

this would also make sense: kern.threads.max_groups_per_proc used to default
to 50, and if you have a look at the testcase above you'll see that the test
succeeds with 32 threads but fails with 64.
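
if scope really is the culprit, forcing process scope explicitly when
creating threads might be a workaround (untested sketch on my part):

#include <pthread.h>

static void *
thread_fn(void *arg)
{
        return (NULL);          /* stand-in for the real thread body */
}

int
main(void)
{
        pthread_attr_t attr;
        pthread_t tid;

        pthread_attr_init(&attr);
        /* process scope: threads contend only within this process, so
         * the kernel's per-process thread limits shouldn't kick in */
        pthread_attr_setscope(&attr, PTHREAD_SCOPE_PROCESS);
        pthread_create(&tid, &attr, thread_fn, NULL);
        pthread_join(tid, NULL);
        return (0);
}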

cheers.
alex

> HTH,
> -Garrett

-- 
a13x