Re: [poll / rfc] kdb_stop_cpus

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Tue, 07 Jun 2011 17:14:29 +0300
on 05/06/2011 01:35 Attilio Rao said the following:
> 2011/6/4 Andriy Gapon <avg_at_freebsd.org>:
>> commit 458ebd9aca7e91fc6e0825c727c7220ab9f61016
>>
>>    generic_stop_cpus: move timeout detection code from under DIAGNOSTIC
>>
>>    ... and also increase it a bit.
>>    IMO it's better to detect and report the (rather serious) condition and
>>    allow a system to proceed somehow rather than be stuck in an endless
>>    loop.
>>
>> diff --git a/sys/kern/subr_smp.c b/sys/kern/subr_smp.c
>> index ae52f4b..4bd766b 100644
>> --- a/sys/kern/subr_smp.c
>> +++ b/sys/kern/subr_smp.c
>> @@ -232,12 +232,10 @@ generic_stop_cpus(cpumask_t map, u_int type)
>>                /* spin */
>>                cpu_spinwait();
>>                i++;
>> -#ifdef DIAGNOSTIC
>> -               if (i == 100000) {
>> +               if (i == 100000000) {
>>                        printf("timeout stopping cpus\n");
>>                        break;
>>                }
>> -#endif
>>        }
>>
>>        stopping_cpu = NOCPU;
> 
> I'd also add the ability, once the deadlock is detected, to break into
> KDB, and put that under DIAGNOSTIC.
> I had such a patch and I used it to debug some deadlocks on shutdown
> code, but now it seems I can't find it anymore.

I think that this could be useful.
Of course, it would have to honor KDB_UNATTENDED.
However, I am not sure how to implement it safely.  E.g., panic() should stop the other
CPUs before setting panicstr, so if some CPU is stuck for good, we would just keep
calling panic() recursively until a triple fault.  Ditto for kdb_trap().

So if you could dig up your code for implementing this, that would be useful.

-- 
Andriy Gapon
Received on Tue Jun 07 2011 - 12:14:33 UTC