Re: sio: lots of silo overflows on Asus K8V with Moxa Smartio C104H/PCI solved

From: Bruce Evans <bde@zeta.org.au>
Date: Wed, 5 May 2004 00:04:55 +1000 (EST)
On Mon, 3 May 2004, Burkard Meyendriesch wrote:

> On Mon, 3 May 2004 15:00:15 +1000 (EST) Bruce Evans wrote:
> > ...
> >
> > You need to turn PUC_FASTINTR back off to test the patch.
> >
> I disabled "options PUC_FASTINTR" in my kernel configuration file
> and made a new kernel. The actual boot message is:
>
> May  3 09:18:59 Reineke kernel: puc0: <Moxa Technologies, Smartio C104H/PCI> port 0xa000-0xa00f,0xa400-0xa43f,0xa800-0xa87f irq 19 at device 14.0 on pci0
> May  3 09:18:59 Reineke kernel: puc0: Reserved 0x40 bytes for rid 0x18 type 4 at 0xa400
> May  3 09:18:59 Reineke kernel: sio4: <Moxa Technologies, Smartio C104H/PCI> on puc0
> May  3 09:18:59 Reineke kernel: sio4: type 16550A
> May  3 09:18:59 Reineke kernel: sio4: unable to activate interrupt in fast mode - using normal mode
>
> Doing my sio5 Palm backup test now I again get lots of silo overflows;
> the input rate is in a range from 100 to 10000 chars/s and the puc
> interrupt rate behaves similarly: from 12 to 1200 ints/s. After several
> minutes the Palm backup software gives up due to protocol errors
> (maybe as a result of massive character losses).
>
> If your patch only becomes effective without "options PUC_FASTINTR" it
> does not seem to solve the sio interrupt problem in my environment.

So much for my theory that the problem is contention with a low priority
thread.  Since holding a spin lock or otherwise disabling interrupts for
too long would also break the PUC_FASTINTR case, the problem must be that
the highest priority runnable thread (which with my patch can only be the
sio (puc) ithread if that thread is runnable) is not always run.  This
is quite likely just the old bug: the handling of interrupts that can't
be handled immediately may be delayed for too long.  From
ithread_schedule():

% 	mtx_lock_spin(&sched_lock);
% 	if (TD_AWAITING_INTR(td)) {
% 		CTR2(KTR_INTR, "%s: setrunqueue %d", __func__, p->p_pid);
% 		TD_CLR_IWAIT(td);
% 		setrunqueue(td);
% 		if (do_switch &&
% 		    (ctd->td_critnest == 1) ) {
% 			KASSERT((TD_IS_RUNNING(ctd)),
% 			    ("ithread_schedule: Bad state for curthread."));
% 			if (ctd->td_flags & TDF_IDLETD)
% 				ctd->td_state = TDS_CAN_RUN; /* XXXKSE */
% 			mi_switch(SW_INVOL);
% 		} else {
% 			curthread->td_flags |= TDF_NEEDRESCHED;
% 		}
% 	} else {

When the switch can't be done immediately (which is now always (!!)),
this just sets TDF_NEEDRESCHED, but TDF_NEEDRESCHED is not checked
until return to user mode, so the switch is indefinitely delayed.
critical_enter() now disables interrupts, so the thread that got
interrupted must not be in a critical section.  When we return to it
without checking TDF_NEEDRESCHED, it continues until it either gives
up control or perhaps until it is interrupted by an interrupt whose
handler is missing the bugs here (a clock interrupt perhaps).

(!!) For hardware interrupts, ithread_schedule() is now always called
from a critical section.  Interrupts are now disabled in critical
sections, so ctd->td_critnest is always 0 when
ithread_execute_handlers() is called, always 1 when ithread_schedule()
is called, and always 2 when it is checked above.  So the switch can
only occur when ithread_schedule() is called in contexts that are
missing this foot-shooting.  There may be a few such calls, but
software interrupts are often scheduled from critical regions too.

I've fixed the primary bug in 2 different versions of interrupt handling
(mine and last October's version in -current).  Rescheduling must
be considered whenever td_critnest is decremented to 0.  The following
patch worked last October, but its infrastructure went away last November:

%%%
--- critical.c~	Sat Aug 16 20:39:26 2003
+++ critical.c	Fri Oct 31 01:09:13 2003
@@ -37,4 +37,5 @@
 #include <sys/pcpu.h>
 #include <sys/proc.h>
+#include <sys/resourcevar.h>
 #include <sys/sysctl.h>
 #include <sys/ucontext.h>
@@ -56,4 +57,6 @@
 #endif

+#include <ddb/ddb.h>
+
 void i386_unpend(void);		/* NOTE: not static, called from assembly */

@@ -67,8 +70,40 @@
 	register_t eflags;
 	struct thread *td;
+#ifdef DDB
+	static int traced;
+
+	if (db_active && !traced) {
+		traced = 1;
+		printf("cpu_unpend: naughty db\n");
+		backtrace();
+	}
+#endif
+
+#define ALLOW_PSL_I_VIOLATION
+
+#ifndef ALLOW_PSL_I_VIOLATION
+	/*
+	 * CPU interrupts can be masked here (e.g., if we are called (nested)
+	 * from code that does "intr_disable(); critical_enter(); ...
+	 * critical_exit();"), and interrupts can even be pending then (e.g.,
+	 * if we are called (nested) from Xintr* as part of ICU locking in
+	 * the SMP case after Xintr* has interrupted critical_enter() after
+	 * the latter decremented td->td_intr_nesting_level to 0 but before it
+	 * called us).  If so, don't unpend interrupts now.
+	 */
+	if ((read_eflags() & PSL_I) == 0)
+		return;
+#endif

 	td = curthread;
+#ifdef RESCHED_HERE
+more:
+#endif
 	eflags = intr_disable();
 	if (PCPU_GET(int_pending)) {
+#ifndef ALLOW_PSL_I_VIOLATION
+		if (td->td_intr_nesting_level != 0)
+			Debugger("cpu_unpend: intr_nesting_level != 0");
+#endif
 		++td->td_intr_nesting_level;
 		i386_unpend();
@@ -76,4 +111,53 @@
 	}
 	intr_restore(eflags);
+
+#ifdef RESCHED_HERE
+	/*
+	 * We kept td_critnest > 0 in i386_unpend() to prevent context
+	 * switches away from us.  This results in ithread_schedule()
+	 * being called in an abnormal environment so it doesn't switch
+	 * context to hardware interrupt handlers in the normal way.  It
+	 * sets TDF_NEEDRESCHED, but that is only intended for use in
+	 * ast().  Use it here too to ensure running the highest priority
+	 * thread.
+	 *
+	 * Preventing context switches away from us is probably not needed
+	 * except possibly as an optimization.  We have to do it if we
+	 * use the int_pending lock since it prevents other threads
+	 * doing any unpending, but letting the other threads do the
+	 * unpending should be safe if we are careful enough.  The calls
+	 * to i386_unpend() from Xfastintr* need special care since they
+	 * also have interrupts masked in the CPU.  Our check of
+	 * TDF_NEEDRESCHED doesn't help for these calls.
+	 *
+	 * td_critnest > 0 also prevents swi_sched() actually switching
+	 * context.  It is normal and probably even correct for swi_sched()
+	 * to not switch immediately, but this shows that swi_sched()
+	 * is broken as designed.  It should just schedule the scheduling
+	 * of a SWI; actual scheduling should not occur until a switch is
+	 * possible.  swi_sched() inherits the scheduling bug from
+	 * ithread_schedule() -- in cases where a switch is not possible,
+	 * it sets TDF_NEEDRESCHED but only ast() and now here checks this
+	 * flag.
+	 *
+	 * XXX don't switch when cold -- can't get back then.  Interrupts
+	 * can happen when cold, so there is another problem getting them
+	 * handled as soon as we become warm.
+	 */
+	if (td->td_flags & TDF_NEEDRESCHED && !cold) {
+		mtx_lock_spin(&sched_lock);
+		if (td->td_flags & TDF_IDLETD)
+			td->td_state = TDS_CAN_RUN; /* XXXKSE */
+		td->td_proc->p_stats->p_ru.ru_nivcsw++;
+		mi_switch();
+		/*
+		 * XXX can't do mtx_unlock_spin() here due to recursion
+		 * problems.
+		 */
+		_release_lock_quick(&sched_lock);
+		td->td_critnest = 0;
+		goto more;
+	}
+#endif /* RESCHED_HERE */
 }

%%%

Don't try porting this to -current unless you understand all the above :-).

This bug is also popular in signal handlers.  E.g., top(1)'s signal
handler was changed from theoretically broken but practically working
to theoretically working but practically broken by changing it to just
set a flag in a signal handler.  Checking flags set by signal handlers
soon enough after return from the handlers is not easy and is not done
in top.

I've fixed variants of the problem of being nested 2 deep in a critical
region in my version of interrupt handling only.  In my version, it is
expected that ithread_schedule() is called with td_critnest == 1.
ithread_schedule() bumps td_critnest to 2 as in -current, and all
context switches are required to be done with td_critnest == 2.  This
requires minor adjustments elsewhere.  If TDF_NEEDRESCHED is set on
return from ithread_schedule(), then it is handled in the caller by
calling mi_switch() directly as above.  It can also get set while
the interrupt handler is running (but perhaps only if it doesn't get
handled right on leaving critical sections).  It can get set again
while the switched-to process is running, so there must be a loop
to keep handling it until it is clear.  The loop probably belongs in
MD code because there may be other interrupts to consider.

Bruce
Received on Tue May 04 2004 - 05:05:18 UTC