Re: CURRENT as gateway on not-so-fast hardware: where is a bottleneck?

From: Alexander Motin <mav_at_FreeBSD.org>
Date: Wed, 15 Aug 2012 13:18:05 +0300
On 15.08.2012 03:09, Doug Barton wrote:
> On 08/14/2012 12:20 PM, Adrian Chadd wrote:
>> Would you be willing to compile a kernel with KTR so you can capture
>> some KTR scheduler dumps?
>>
>> That way the scheduler peeps can feed this into schedgraph.py (and you
>> can too!) to figure out what's going on.
>>
>> Maybe things aren't being scheduled correctly and the added latency is
>> killing performance?
>
> You might also try switching to SCHED_ULE to see if it helps.
>
> Although, in the last few months as mav has been converging the 2 I've
> started to see the same problems I saw on my desktop systems previously
> re-appear even using ULE. For example, if I'm watching an AVI with VLC
> and start doing anything that generates a lot of interrupts (like moving
> large quantities of data from one disk to another) the video and sound
> start to skip. Also, various other desktop features (like menus, window
> switching, etc.) start to take measurable time to happen, sometimes
> seconds.
>
> ... and lest you think this is just a desktop problem, I've seen the
> same scenario on 8.x systems used as web servers. With ULE they were
> frequently getting into peak load situations that created what I called
> "mini thundering herd" problems where they could never quite get caught
> up. Whereas switching to 4BSD the same servers got into high-load
> situations less often, and they recovered on their own in minutes.

It is quite pointless to speculate without real data, such as the
KTR_SCHED traces mentioned above. The main thing I've learned about
schedulers is that things there never work as you expect. There are too
many factors and relations to predict the behavior in every case.

As for the Soekris and idle CPU measurement, let's start with what kind
of eventtimer is used there. Since it is a UP machine, I guess it uses
the i8254 timer in periodic mode. That means it, by definition, can't
properly measure the load from threads running from hardclock (dummynet,
polling, netisr threads, etc.), since CPU usage is then sampled on the
same timer ticks that wake those threads.
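To make the "by definition" part concrete, here is a toy Python model. It
is not FreeBSD code. It assumes that work triggered by each periodic tick
starts a few microseconds after the tick and finishes before the next
one, and that load is estimated by checking whether the CPU is busy at
each sample instant. All constants (hz=1000, a 5 microsecond delay, 400
microseconds of work per tick, a ~128 Hz independent sampling clock) are
hypothetical.

```python
# Toy model (not FreeBSD code): work triggered by each timer tick starts
# shortly after the tick and finishes before the next tick arrives.


def busy_at(t_us: int, period_us: int, delay_us: int, work_us: int) -> bool:
    """True if the CPU is running tick-triggered work at time t_us.

    Assumes delay_us + work_us < period_us, i.e. the work always
    finishes before the next tick.
    """
    phase = t_us % period_us
    return delay_us <= phase < delay_us + work_us


def measured_load(sample_times, period_us, delay_us, work_us) -> float:
    """Fraction of samples that find the CPU busy."""
    hits = sum(busy_at(t, period_us, delay_us, work_us) for t in sample_times)
    return hits / len(sample_times)


PERIOD_US = 1000  # hypothetical hz=1000, periodic timer ticks
DELAY_US = 5      # hypothetical latency from tick to the triggered thread
WORK_US = 400     # hypothetical work per tick, so the true load is 40%

true_load = WORK_US / PERIOD_US

# Samples taken on the very same ticks that trigger the work.
tick_synchronous = [k * PERIOD_US for k in range(1000)]

# Samples from an independent clock with a 7813 us period (~128 Hz),
# whose phase drifts relative to the 1000 us ticks.
independent = [k * 7813 for k in range(1000)]

print(f"true load:         {true_load:.2f}")
print(f"tick-synchronous:  "
      f"{measured_load(tick_synchronous, PERIOD_US, DELAY_US, WORK_US):.2f}")
print(f"independent clock: "
      f"{measured_load(independent, PERIOD_US, DELAY_US, WORK_US):.2f}")
```

Sampling on the same ticks reports an idle CPU even though the work
takes 40% of it, while a sampling clock whose phase drifts against the
ticks recovers the true figure. The model only illustrates the aliasing
effect; it does not describe how the kernel's accounting is implemented.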

As for playing AVIs and using other GUIs, the key word here, and for ULE
in general, is interactivity. ULE gives a huge boost to threads it
considers interactive. Disk I/O is a good candidate for that, since by
definition it does many voluntary sleeps while waiting for data. If it
were not considered interactive, it would suffer heavily from latency
while waiting for other threads. Modern heavy GUIs and video codecs, on
the other hand, may consume CPU time continuously for long periods. On
busy machines they may never sleep at all, trying to catch up with the
incoming data rate. That can make ULE classify them as batch and so less
preferred than I/O. As I said above, let's try to collect some real data
first.
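A rough way to see how that classification plays out is the simplified
Python model below, loosely based on the logic of sched_interact_score()
in sys/kern/sched_ule.c. Details may differ between versions: the model
omits an early-exit shortcut and the kernel's fixed-point scaling of run
and sleep time, and treats 30 as the default of kern.sched.interact,
which is an assumption here. A thread whose score falls below the
threshold is treated as interactive; the thread histories are made up.

```python
SCHED_INTERACT_MAX = 100
SCHED_INTERACT_HALF = SCHED_INTERACT_MAX // 2
SCHED_INTERACT_THRESH = 30  # assumed default of kern.sched.interact


def interact_score(runtime: int, slptime: int) -> int:
    """Score in 0..100; lower means more interactive.

    Mostly-sleeping threads land in 0..50, mostly-running ones in 50..100.
    Integer division mirrors the integer arithmetic of the C code.
    """
    if runtime > slptime:
        div = max(1, runtime // SCHED_INTERACT_HALF)
        return SCHED_INTERACT_HALF + (SCHED_INTERACT_HALF - slptime // div)
    if slptime > runtime:
        div = max(1, slptime // SCHED_INTERACT_HALF)
        return runtime // div
    # runtime == slptime: 50 if the thread has any history, else 0.
    return SCHED_INTERACT_HALF if runtime else 0


def is_interactive(runtime: int, slptime: int,
                   threshold: int = SCHED_INTERACT_THRESH) -> bool:
    return interact_score(runtime, slptime) < threshold


# Hypothetical recent history as (run time, sleep time) in equal units.
threads = {
    "disk copy, mostly sleeping on I/O": (10, 90),
    "video decoder, catching up, rarely sleeps": (90, 10),
    "video decoder on a busy box, never sleeps": (100, 0),
}
for name, (run, slp) in threads.items():
    kind = "interactive" if is_interactive(run, slp) else "batch"
    print(f"{name}: score={interact_score(run, slp)} -> {kind}")
```

The disk copy scores low and gets the interactive boost, while both
decoder cases score 50 or more and end up classified as batch, which is
the inversion described above.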

If somebody still wants room for experiments, there is always some:
  - if you don't want the video player to lag, give it a negative nice
value (ULE is not a magician that can guess the user's wishes);
  - the same, I guess, applies to the Xorg process;
  - there are a number of sysctls ULE provides:
    - kern.sched.interact -- a value in percent specifying how much run
time a thread may have and still be counted as interactive;
    - kern.sched.slice or the new kern.sched.quantum -- specifying the
interval between context switches for non-interactive threads,
historically set to 100ms. That may be too long now. Reducing it may
make the system run more smoothly, while the cost of those extra
switches is probably not very significant nowadays.
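The trade-off behind shortening the quantum is simple arithmetic. The
sketch below uses a plain round-robin model on one CPU (ignoring
priorities and ULE's interactivity boost) and a hypothetical
10-microsecond context-switch cost; both the model and the function
names are illustrative assumptions, not measurements or kernel APIs.

```python
def worst_case_wait_ms(quantum_ms: float, competing_threads: int) -> float:
    """Longest a runnable thread may wait for a single CPU if every
    competing CPU-bound thread ahead of it uses a full quantum
    (plain round-robin; ignores priorities and interactivity boosts)."""
    return quantum_ms * competing_threads


def switch_overhead(quantum_ms: float, switch_cost_us: float) -> float:
    """Fraction of CPU time lost to switching when every quantum ends
    in one context switch costing switch_cost_us microseconds."""
    switch_ms = switch_cost_us / 1000.0
    return switch_ms / (quantum_ms + switch_ms)


SWITCH_COST_US = 10  # hypothetical per-switch cost, not a measured value

for quantum in (100, 10):
    print(f"quantum {quantum} ms: "
          f"wait behind 3 batch threads up to "
          f"{worst_case_wait_ms(quantum, 3)} ms, "
          f"switch overhead {switch_overhead(quantum, SWITCH_COST_US):.3%}")
```

Under these assumptions, going from 100ms to 10ms cuts the worst-case
wait tenfold while the switching overhead stays around a tenth of a
percent.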

-- 
Alexander Motin
Received on Wed Aug 15 2012 - 08:18:11 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:29 UTC