Re: [RFC] Start SMP subsystem earlier

From: Hans Petter Selasky <hps_at_selasky.org>
Date: Mon, 05 Jan 2015 15:08:49 +0100
On 01/05/15 14:43, Konstantin Belousov wrote:
> On Mon, Jan 05, 2015 at 02:18:17PM +0100, Hans Petter Selasky wrote:
>> Hi,
>>
>> There is a limitation on the number of interrupt vectors available while
>> only a single processor is running. To have more interrupts available, we
>> need to start SMP earlier when building a monolithic kernel rather than
>> loading drivers as modules. The driver in question is a network driver,
>> and because it cannot be started after SI_SUB_ROOT_CONF due to PXE
>> support, I see no option other than moving SI_SUB_SMP earlier.
>>
>> Suggested patch:
>>
>>> Index: sys/kernel.h
>>> ===================================================================
>>> --- sys/kernel.h	(revision 276691)
>>> +++ sys/kernel.h	(working copy)
>>> @@ -152,6 +152,7 @@
>>>   	SI_SUB_KPROF		= 0x9000000,	/* kernel profiling*/
>>>   	SI_SUB_KICK_SCHEDULER	= 0xa000000,	/* start the timeout events*/
>>>   	SI_SUB_INT_CONFIG_HOOKS	= 0xa800000,	/* Interrupts enabled config */
>>> +	SI_SUB_SMP		= 0xa850000,	/* start the APs*/
>>>   	SI_SUB_ROOT_CONF	= 0xb000000,	/* Find root devices */
>>>   	SI_SUB_DUMP_CONF	= 0xb200000,	/* Find dump devices */
>>>   	SI_SUB_RAID		= 0xb380000,	/* Configure GEOM classes */
>>> @@ -165,7 +166,6 @@
>>>   	SI_SUB_KTHREAD_BUF	= 0xea00000,	/* buffer daemon*/
>>>   	SI_SUB_KTHREAD_UPDATE	= 0xec00000,	/* update daemon*/
>>>   	SI_SUB_KTHREAD_IDLE	= 0xee00000,	/* idle procs*/
>>> -	SI_SUB_SMP		= 0xf000000,	/* start the APs*/
>>>   	SI_SUB_RACCTD		= 0xf100000,	/* start racctd*/
>>>   	SI_SUB_LAST		= 0xfffffff	/* final initialization */
>>>   };
> Did you inspect all reordered sysinit routines and ensure that the
> reordering is safe?  At the very least, SUB_SMP starts event timers,
> while KTHREAD_IDLE is about configuring some hardware which might
> be required for that, or not yet ready for it.

Hi,

I have not yet inspected everything affected by this change myself. That's
why I'm sending this e-mail out. The problem is simply that the total
number of interrupts appears to be limited by "APIC_NUM_IOINTS" and
"NUM_IO_INTS", which, from what I understand, are per-CPU limits. Until SMP
is activated, the newbus code simply distributes the interrupts over the
vectors available on the boot CPU; once SMP is up, it reshuffles them all.

I initially thought a hack might be possible, such as using
RF_SHAREABLE for the IRQ resource, but then noticed that we are using MSI
interrupts, which are not allocated in the same manner.

The other issue is that the IRQs must also be functional, so that PXE
boot can work.

--HPS

>
>>
>> This fixes a problem for the Mellanox drivers in the OFED layer. Possibly
>> we need to move SI_SUB_SMP even earlier so as not to miss the generic
>> FreeBSD PCI device enumeration, or maybe this is not possible. Does anyone
>> know how early we can start SMP?
>
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
>
Received on Mon Jan 05 2015 - 13:08:04 UTC