Re: CURRENT as gateway on not-so-fast hardware: where is a bottleneck?

From: Alexander Motin <mav_at_FreeBSD.org>
Date: Mon, 20 Aug 2012 16:32:15 +0300

On 20.08.2012 13:25, Doug Barton wrote:
> On 08/20/2012 02:59, Alexander Motin wrote:
>> On 20.08.2012 11:32, Doug Barton wrote:
>>> On 08/15/2012 03:18, Alexander Motin wrote:
>>>> On 15.08.2012 03:09, Doug Barton wrote:
>>>>> On 08/14/2012 12:20 PM, Adrian Chadd wrote:
>>>>>> Would you be willing to compile a kernel with KTR so you can capture
>>>>>> some KTR scheduler dumps?
>>>>>>
>>>>>> That way the scheduler peeps can feed this into schedgraph.py (and you
>>>>>> can too!) to figure out what's going on.
>>>>>>
>>>>>> Maybe things aren't being scheduled correctly and the added latency is
>>>>>> killing performance?
>>>>>
>>>>> You might also try switching to SCHED_ULE to see if it helps.
>>>>>
>>>>> Although, in the last few months as mav has been converging the 2 I've
>>>>> started to see the same problems I saw on my desktop systems previously
>>>>> re-appear even using ULE. For example, if I'm watching an AVI with VLC
>>>>> and start doing anything that generates a lot of interrupts (like
>>>>> moving large quantities of data from one disk to another) the video
>>>>> and sound
>>>>> start to skip. Also, various other desktop features (like menus, window
>>>>> switching, etc.) start to take measurable time to happen, sometimes
>>>>> seconds.
>>>>>
>>>>> ... and lest you think this is just a desktop problem, I've seen the
>>>>> same scenario on 8.x systems used as web servers. With ULE they were
>>>>> frequently getting into peak load situations that created what I called
>>>>> "mini thundering herd" problems where they could never quite get caught
>>>>> up. Whereas switching to 4BSD the same servers got into high-load
>>>>> situations less often, and they recovered on their own in minutes.
>>>>
>>>> It is quite pointless to speculate without real info like mentioned
>>>> above KTR_SCHED traces.
>>>
>>> I'm sorry, you're quite wrong about that. In the cases I mentioned, and
>>> in about 2 out of 3 of the cases where users reported problems and I
>>> suggested that they try 4BSD, the results were clear. This obviously
>>> points out that there is a serious problem with ULE, and if I were the
>>> one who was responsible for that code I would be looking at ways of
>>> helping users figure out where the problems are. But that's just me.
>>
>> I am not telling anything bad about 4BSD.
>
> Yes, I get that, but thanks for making it clear.
>
>> Choice is provided because
>> they are indeed different and none is perfect.
>
> ... which is why I'm asking you to stop making them more the same until
> we get a better idea of what the issues are.

I have no plans to converge them. I just found a problem in ULE that
had been replicated into 4BSD, and it would be strange to fix one
without the other. But fixing it exposed another old problem specific
to 4BSD, which I fixed by reusing logically equivalent code from ULE.
I saw no reason to reinvent the wheel there, nor to leave an obvious
bug unfixed. Sure, it can change behavior in some way, but ULE is not
to blame.

-- 
Alexander Motin
Received on Mon Aug 20 2012 - 11:32:21 UTC