Re: ATAng interrupt storm and bogus interrupt stormproofing

From: Bruce Evans <bde@zeta.org.au>
Date: Sat, 13 Sep 2003 17:29:57 +1000 (EST)

On Tue, 9 Sep 2003, Bruce Evans wrote:

> ATAng generates an interrupt storm deterministically on 1 system here.
> This should lock up the system, but bugs in interrupt handler scheduling
> prevent the lockup except in my version of -current where the bugs are
> different.
> ...
> Bugs in interrupt handler scheduling:
> Ignoring level-sensitive interrupts should result in their thread being
> switched to endlessly (except for switches to higher priority threads
> and possibly for switches to equal priority threads).  But there is
> some bug in interrupt handler scheduling that results in low priority
> threads running while the high priority interrupt storm thread is
> runnable.  This allows the thread that issued the ATA_ATAPI_IDENTIFY
> command to run eventually, and it or another thread clears the storm
> as a side effect of issuing another command.
>
> I suspect that the scheduling bug is related to the old one of just
> setting the TDF_NEEDRESCHED flag when a switch can't be done
> immediately.  TDF_NEEDRESCHED has no effect on kernel threads since
> it is only examined on return to user mode.
> ...
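
For reference, essentially the only consumer of TDF_NEEDRESCHED is
ast(), and ast() only runs on the path back to user mode.  A minimal
sketch of the relevant check (paraphrased from memory; the exact
-current code differs in detail):

%%%
	/*
	 * Sketch of the ast() check (not the exact -current source).
	 * ast() is reached from doreti only when returning to user
	 * mode, so a kernel thread never gets here no matter how long
	 * TDF_NEEDRESCHED has been set.
	 */
	void
	ast(struct trapframe *framep)
	{
		struct thread *td = curthread;

		/* ... signal handling, etc., elided ... */
		if (td->td_flags & TDF_NEEDRESCHED) {
			mtx_lock_spin(&sched_lock);
			td->td_proc->p_stats->p_ru.ru_nivcsw++;
			mi_switch();
			mtx_unlock_spin(&sched_lock);
		}
	}
%%%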

The scheduling bug is indeed related to that.  Here is a quick fix.  See
the too-large comments in the last hunk of it for details.  The early
return for the interrupts case is older code.  It is more needed, and
perhaps essential, with the main part of the patch.  The debugging code
should not report any problems now.  This depends on a recently committed
fix to stop ddb calling cpu_unpend() via DELAY().

%%%
Index: critical.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/critical.c,v
retrieving revision 1.11
diff -u -2 -r1.11 critical.c
--- critical.c	12 Aug 2003 23:24:04 -0000	1.11
+++ critical.c	12 Sep 2003 23:07:36 -0000
@@ -37,4 +37,5 @@
 #include <sys/pcpu.h>
 #include <sys/proc.h>
+#include <sys/resourcevar.h>
 #include <sys/sysctl.h>
 #include <sys/ucontext.h>
@@ -56,4 +57,6 @@
 #endif

+#include <ddb/ddb.h>
+
 void i386_unpend(void);		/* NOTE: not static, called from assembly */

@@ -67,8 +70,32 @@
 	register_t eflags;
 	struct thread *td;
+#ifdef DDB
+	static int traced;
+
+	if (db_active && !traced) {
+		traced = 1;
+		printf("cpu_unpend: naughty db\n");
+		backtrace();
+	}
+#endif
+
+	/*
+	 * CPU interrupts can be masked here (e.g., if we are called (nested)
+	 * from code that does "intr_disable(); critical_enter(); ...
+	 * critical_exit();"), and interrupts can even be pending then (e.g.,
+	 * if we are called (nested) from Xintr* as part of ICU locking in
+	 * the SMP case after Xintr* has interrupted critical_enter() after
+	 * the latter decremented td->td_intr_nesting_level to 0 but before it
+	 * called us).  If so, don't unpend interrupts now.
+	 */
+	if ((read_eflags() & PSL_I) == 0)
+		return;

 	td = curthread;
+more:
 	eflags = intr_disable();
 	if (PCPU_GET(int_pending)) {
+		if (td->td_intr_nesting_level != 0)
+			Debugger("cpu_unpend: intr_nesting_level != 0");
 		++td->td_intr_nesting_level;
 		i386_unpend();
@@ -76,4 +103,50 @@
 	}
 	intr_restore(eflags);
+
+	/*
+	 * We kept td_critnest > 0 in i386_unpend() to prevent context
+	 * switches away from us.  This results in ithread_schedule()
+	 * being called in an abnormal environment so it doesn't switch
+	 * context to hardware interrupt handlers in the normal way.  It
+	 * sets TDF_NEEDRESCHED, but that is only intended for use in
+	 * ast().  Use it here too to ensure running the highest priority
+	 * thread.
+	 *
+	 * Preventing context switches away from us is probably not needed
+	 * except possibly as an optimization.  We have to do it if we
+	 * use the int_pending lock since it prevents other threads
+	 * doing any unpending, but letting the other threads do the
+	 * unpending should be safe if we are careful enough.  The calls
+	 * to i386_unpend() from Xfastintr* need special care since they
+	 * also have interrupts masked in the CPU.  Our check of
+	 * TDF_NEEDRESCHED doesn't help for these calls.
+	 *
+	 * td_critnest > 0 also prevents swi_sched() actually switching
+	 * context.  It is normal and probably even correct for swi_sched()
+	 * to not switch immediately, but this shows that swi_sched()
+	 * is broken as designed.  It should just schedule the scheduling
+	 * of a SWI; actual scheduling should not occur until a switch is
+	 * possible.  swi_sched() inherits the scheduling bug from
+	 * ithread_schedule() -- in cases where a switch is not possible,
+	 * it sets TDF_NEEDRESCHED, but only ast() (and now this function)
+	 * checks that flag.
+	 *
+	 * XXX don't switch when cold -- can't get back then.  Interrupts
+	 * can happen when cold, so there is another problem getting them
+	 * handled as soon as we become warm.
+	 */
+	if (td->td_flags & TDF_NEEDRESCHED && !cold) {
+		mtx_lock_spin(&sched_lock);
+		if (td->td_flags & TDF_IDLETD)
+			td->td_state = TDS_CAN_RUN; /* XXXKSE */
+		td->td_proc->p_stats->p_ru.ru_nivcsw++;
+		mi_switch();
+		/*
+		 * Can't do mtx_unlock_spin() here due to recursion
+		 * problems.
+		 */
+		td->td_critnest = 0;
+		goto more;
+	}
 }

%%%

Please finish this.  My version of -current uses a completely different
version of unpending in which this bug was fixed accidentally a few
months ago.  It has different bugs (mainly incomplete support for the
SMP case, and a problem with nesting).  It doesn't need to handle fast
interrupts because fast interrupts are actually fast, so they are not
blocked by spinlocks or critical_enter() and never become pending.
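
In outline, the difference between the two vector types in my version
is as follows (pseudocode; the real stubs are mostly assembly and the
names are only approximate):

%%%
	/* Pseudocode, not the real vector stubs. */
	void
	fast_intr(int irq)			/* Xfastintr* */
	{
		/*
		 * Always runs to completion immediately.  Since it is
		 * never deferred by td_critnest or by spinlocks, it
		 * never needs to be recorded in ipending.
		 */
		(*fast_handler[irq])(fast_arg[irq]);
	}

	void
	normal_intr(int irq)			/* Xintr* */
	{
		if (curthread->td_critnest != 0) {
			/* Deferred: remember it for unpend(). */
			atomic_set_int(&ipending, 1 << irq);
			return;
		}
		sched_ithd((void *)irq);	/* may context switch */
	}
%%%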

A previous version of the above had an interesting problem with nesting.
It used mtx_unlock_spin() instead of "td->td_critnest = 0; goto more;".
This caused endless recursion if there was an interrupt storm.
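
The recursion went roughly like this:

%%%
	/*
	 * Sketch of the endless recursion in the mtx_unlock_spin()
	 * version under an interrupt storm:
	 *
	 * cpu_unpend()
	 *     mi_switch(); eventually we run again, then:
	 *     mtx_unlock_spin(&sched_lock)
	 *         critical_exit()
	 *             td_critnest drops to 0 with the storming
	 *             interrupt already pending again, so:
	 *             cpu_unpend()	<- recursion; never unwinds
	 *
	 * Setting td_critnest to 0 by hand and looping back with
	 * "goto more" handles the still-pending interrupt iteratively,
	 * so the stack doesn't grow.
	 */
%%%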

My version:

%%%
void
unpend(void)
{
	struct thread *td;
	static int nestcount;
	int bit, irq;

	if ((read_eflags() & PSL_I) == 0) {
#if 0
		static int warned;

		if ((ipending != 0 || spending != 0) && !warned) {
			/*
			 * XXX this is ifdefed out because it can happen,
			 * unfortunately, because ipending and spending are
			 * not per-CPU.  I think it is not incorrect for
			 * them to be global except for IPIs.  This allows
			 * unpending to be done on any CPU.  There may be
			 * advantages to limiting unpending to 1 CPU at a
			 * time, but per-CPU pending bits give the opposite
			 * of that.
			 */
			warned = 1;
			Debugger("non-null unpend() with interrupts disabled");
		}
#endif
		return;
	}
	disable_intr();
	if (++nestcount > 6) {
		/*
		 * This can happen, unfortunately, because threads doing the
		 * unpending can be preempted.  It also gives actual nested
		 * calls here in the same thread but is supposed not to.
		 */
		printf("unpend nestcount = %d\n", nestcount);
		Debugger("");
	}
unpend_hwis:
	if (ipending == 0)
		goto unpend_swis;
	while ((bit = ffs(ipending)) != 0) {
		iunpend_count++;
		irq = bit - 1;
		/* XXX need test-and-clear here, or per-cpu ipending. */
		atomic_clear_int(&ipending, 1 << irq);
		sched_ithd((void *)irq);
	}
unpend_swis:
	if (spending != 0) {
		swi_unpend();
		goto unpend_hwis;
	}
	td = curthread;
	if (td->td_flags & TDF_NEEDRESCHED && !cold) {
		iunpend_switch_count[1]++;
		td->td_critnest = 1;
		enable_intr();
		mtx_lock_spin(&sched_lock);
		td->td_critnest = 1;
		td->td_proc->p_stats->p_ru.ru_nivcsw++;
		mi_switch();
		td->td_critnest = 2;
		mtx_unlock_spin(&sched_lock);
		disable_intr();
		td->td_critnest = 0;
		goto unpend_hwis;
	}
	--nestcount;
	enable_intr();
}
%%%

I just noticed that my version handles TDF_NEEDRESCHED in almost the
same way (I wish I had noticed this before I forgot to put the !cold
check in the -current version and had to debug it again :-).  It is
more complicated because it wants to work with interrupts disabled,
i.e., in the same environment as i386_unpend().  It juggles hard
and soft interrupt disablement so that mi_switch() is always called
with CPU interrupts enabled and td_critnest == 1.  sched_ithd() does
similar things for its call to ithread_schedule().

		/*
		 * Swap from hard to soft interrupt disablement.  Need to
		 * acquire soft disable before releasing hard.  Could call
		 * critical_enter() instead of setting td_critnest; setting
		 * it directly is just an optimization.
		 */
		td->td_critnest = 1;
		enable_intr();

		/*
		 * Also need sched_lock for calling mi_switch().  This sets
		 * td_critnest to 2, but mi_switch() wants it to be 1.
		 * Adjust it back down.  Could use critical_exit() for this,
		 * but optimize it by doing it directly.  ithread_schedule()
		 * does similar things -- it starts with td_critnest == 1
		 * and increments it via acquiring sched_lock, then has to
		 * adjust it.
		 */
		mtx_lock_spin(&sched_lock);
		td->td_critnest = 1;

		/*
		 * Usual switch code.
		 */
		td->td_proc->p_stats->p_ru.ru_nivcsw++;
		mi_switch();

		/*
		 * Reverse above steps.  First set td_critnest to 2 so that
		 * it doesn't go to 0 and cause potentially unbounded
		 * recursion or let interrupts in when mtx_unlock_spin()
		 * drops it.
		 */
		td->td_critnest = 2;
		mtx_unlock_spin(&sched_lock);

		/*
		 * Swap back to hard interrupt disablement.  The direct
		 * setting of td_critnest here is _not_ just an optimization.
		 * See above.
		 */
		disable_intr();
		td->td_critnest = 0;
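
By the same pattern, the juggling in sched_ithd() around its call to
ithread_schedule() is roughly the following (a sketch inferred from the
pattern above, not a copy of my actual code; the ithread_schedule()
argument details are elided):

%%%
	/* Sketch only; "it" is the irq's ithread. */
	td->td_critnest = 1;	/* take soft disablement first... */
	enable_intr();		/* ...then drop hard disablement */
	ithread_schedule(it);	/* acquires sched_lock and adjusts
				 * td_critnest much as above */
	disable_intr();		/* reacquire hard before dropping soft */
	td->td_critnest = 0;
%%%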

Bruce