Re: [RFC] Start SMP subsystem earlier

From: John Baldwin <jhb_at_FreeBSD.org>
Date: Tue, 06 Jan 2015 11:09:59 -0500

On 1/6/15 10:55 AM, Ian Lepore wrote:
> On Tue, 2015-01-06 at 09:37 -0500, John Baldwin wrote:
>> On 1/5/15 8:18 AM, Hans Petter Selasky wrote:
>>> Hi,
>>>
>>> There is a limitation on the number of interrupt vectors available when
>>> only a single processor is running. To have more interrupts available we
>>> need to start SMP earlier when building a monolithic kernel and not
>>> loading drivers as modules. The driver in question is a network driver
>>> and because it cannot be started after SI_SUB_ROOT_CONF due to PXE
>>> support I see no other option than to move SI_SUB_SMP earlier.
>>>
>>> Suggested patch:
>>>
>>>> [...]
>>>
>>> This fixes a problem for Mellanox drivers in the OFED layer. Possibly we
>>> need to move the SMP even earlier to not miss the generic FreeBSD PCI
>>> device enumeration or maybe this is not possible. Does anyone know how
>>> early we can start SMP?
>>
>> We need a lot more work before this is ready.  This is one of the goals
>> of the multipass new-bus stuff.  In particular, we have to enumerate
>> enough devices to bring event timer hardware up so that timer interrupts
>> work so that tsleep() will actually sleep.  In addition, we also need
>> idle threads created and working before APs are started as otherwise
>> they will have no thread to run initially.  This is certainly a desired
>> feature, but it is not as simple as moving the sysinit up I'm afraid.
>>
> 
> Just an FYI, the ARM world is now using the multipass newbus stuff.  It
> works well, with some quirks...
> 
> The predefined pass names don't always make sense for the arm world.
> There aren't enough predefined pass names and even though the number
> space for them is 4 billion wide all the predefined names are in the
> range < 100 and separated by only 10 so it's tricky to wedge things
> between the existing names.
> 
> The strangest bit is when you have interdependent drivers at different
> early pass numbers.  Sometimes it's necessary to do almost nothing in
> the attach() routine and do all the real attach-time type stuff in a
> bus_new_pass() routine after the pass number becomes high enough that
> your co-dependent driver peers are available.

Yes, I almost want another downcall through the tree that is something
like 'bus_pass_completed', though the original design was to override
bus_new_pass as you have done.  And yes, in many cases the logic needs
to move out of attach.  The pci bus will end up only doing enumeration
but no resource assignment in its attach routine once things are fleshed
out more for example.  However, for now I've found that even on x86 I've
had to add a new pass level for ACPI and some other things like
acpi_sysresource. :(  It almost wants more of a provides-requires setup
than hardcoded pass levels, but that's more complicated to implement.
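
To illustrate the provides-requires idea, here is a minimal standalone
sketch in userland C. It is not new-bus code and does not reflect any
existing FreeBSD interface: struct drv, attach_all() and the capability
strings are all hypothetical names made up for this example. Each driver
names one capability it provides and the capabilities it needs, and the
loop keeps attaching whatever has become attachable until nothing more
can make progress.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NEEDS 4

/* Hypothetical driver record; not a real new-bus structure. */
struct drv {
	const char *name;
	const char *provides;          /* capability available once attached */
	const char *needs[MAX_NEEDS];  /* NULL-terminated unless completely full */
	int attached;
};

/* Return 1 if some already-attached driver provides 'cap'. */
static int
is_provided(const struct drv *d, int n, const char *cap)
{
	for (int i = 0; i < n; i++)
		if (d[i].attached && strcmp(d[i].provides, cap) == 0)
			return (1);
	return (0);
}

/* Return 1 if every capability 'cand' needs is already provided. */
static int
can_attach(const struct drv *d, int n, const struct drv *cand)
{
	for (int r = 0; r < MAX_NEEDS && cand->needs[r] != NULL; r++)
		if (!is_provided(d, n, cand->needs[r]))
			return (0);
	return (1);
}

/*
 * Attach drivers in dependency order.  The indices of the attached
 * drivers are stored in 'order' (which must hold n entries) and the
 * number attached is returned.  Drivers whose needs can never be met,
 * for example because of a dependency cycle, are left unattached.
 */
int
attach_all(struct drv *d, int n, int *order)
{
	int count = 0, progress = 1;

	assert(d != NULL && order != NULL);
	while (progress) {
		progress = 0;
		for (int i = 0; i < n; i++) {
			if (!d[i].attached && can_attach(d, n, &d[i])) {
				d[i].attached = 1;
				order[count++] = i;
				progress = 1;
			}
		}
	}
	return (count);
}
```

With a table listed in the order pci (needs sysres), sysres (needs
acpi), acpi (needs nothing), timer (needs acpi), the loop attaches acpi,
then timer, then sysres, then pci, no matter how the table is sorted.
The cost compared to fixed pass levels is the repeated scanning and the
need to report drivers whose requirements are never satisfied.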

-- 
John Baldwin
Received on Tue Jan 06 2015 - 15:10:04 UTC
