ATAng interrupt storm and bogus interrupt stormproofing

From: Bruce Evans <bde_at_zeta.org.au>
Date: Tue, 9 Sep 2003 22:56:32 +1000 (EST)
ATAng generates an interrupt storm deterministically on 1 system here.
This should lock up the system, but bugs in interrupt handler scheduling
prevent the lockup except in my version of -current where the bugs are
different.

ATAng interrupt storm:
The ATA_ATAPI_IDENTIFY command generates a second "spurious" interrupt.
This interrupt is not properly handled.  It is completely ignored, at
least on the system with the problem.  This system has only 1 atapi
slave on the channel that generates the interrupt, and no interrupt
sharing for this interrupt.  Ignoring the interrupt gives an interrupt
storm since the interrupt is level sensitive.  The interrupt really is
generated by the atapi slave, since reading the non-alternate status
register of that device clears it.  I use this read as a quick fix to
limit the storm to 10 interrupts.  This may be the correct non-quick fix
too.

Bugs in interrupt handler scheduling:
Ignoring level-sensitive interrupts should result in their thread being
switched to endlessly (except for switches to higher priority threads
and possibly for switches to equal priority threads).  But there is
some bug in interrupt handler scheduling that results in low priority
threads running while the high priority interrupt storm thread is
runnable.  This allows the thread that issued the ATA_ATAPI_IDENTIFY
command to run eventually, and it or another thread clears the storm
as a side effect of issuing another command.

I suspect that the scheduling bug is related to the old one of just
setting the TDF_NEEDRESCHED flag when a switch can't be done
immediately.  TDF_NEEDRESCHED has no effect on kernel threads since
it is only examined on return to user mode.

% int
% ithread_schedule(struct ithd *ithread, int do_switch)
% {
% 	...
% 	ithread->it_need = 1;
% 	mtx_lock_spin(&sched_lock);
% 	if (TD_AWAITING_INTR(td)) {
% 		CTR2(KTR_INTR, "%s: setrunqueue %d", __func__, p->p_pid);
% 		TD_CLR_IWAIT(td);
% 		setrunqueue(td);
% 		if (do_switch &&
% 		    (ctd->td_critnest == 1) ) {
  		    ^^^^^^^^^^^^^^^^^^^^^^^

Sometimes the interrupt will occur when ctd (curthread) is in a critical
region.  Then ctd->td_critnest >= 2 here and we don't switch.

% 			KASSERT((TD_IS_RUNNING(ctd)),
% 			    ("ithread_schedule: Bad state for curthread."));
% 			ctd->td_proc->p_stats->p_ru.ru_nivcsw++;
% 			if (ctd->td_flags & TDF_IDLETD)
% 				ctd->td_state = TDS_CAN_RUN; /* XXXKSE */
% 			mi_switch();
% 		} else {
% 			curthread->td_flags |= TDF_NEEDRESCHED;
  			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When we don't switch, we set this, but the setting has no effect on kernel
threads, so we will only switch to them as a side effect of handling another
interrupt.  This explains why we may keep running a lower priority thread
but now why we switch to a lower priority thread on switching back from
handling other interrupts.

Annotated debugging session:

% total=0x437cc8 entry point=0x129750
% WARNING: loader(8) metadata is missing!
% [ preserving 815752 bytes of kernel symbol table ]
% Debugger("Boot flags requested debugger")
% Stopped at      Debugger+0x45:  xchgl   %ebx,in_Debugger.0
% db> b Xintr11
% db> c

Set a breakpoit after booting with -dh.  I know from previous debugging
sessions that there will be no problems until soon after Xintr11 is hit.

% Copyright (c) 1992-2003 The FreeBSD Project.
% Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
% 	The Regents of the University of California. All rights reserved.
% FreeBSD 5.1-CURRENT #2863: Tue Sep  9 20:32:22 EST 2003
%     bde_at_besplex.bde.org:/c/sysc/i386/compile/BESPLEX
% Timecounter "i8254" frequency 1193182 Hz quality 0
% CPU: Pentium II/Pentium II Xeon/Celeron (400.91-MHz 686-class CPU)
%   Origin = "GenuineIntel"  Id = 0x665  Stepping = 5
%   Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
% real memory  = 268435456 (256 MB)
% avail memory = 256163840 (244 MB)
% Pentium Pro MTRR support enabled
% npx0: [FAST]
% npx0: <math processor> on motherboard
% npx0: flags 0x80 npx0: INT 16 interface
% pcibios: BIOS version 2.10
% Using $PIR table, 8 entries at 0xc00fdef0
% pcib0: <Intel 82443BX (440 BX) host to PCI bridge> at pcibus 0 on motherboard
% pci0: <PCI bus> on pcib0
% pci_cfgintr: 0:11 INTA BIOS irq 11
% pci_cfgintr: 0:11 INTA BIOS irq 11
% pci_cfgintr: 0:13 INTA BIOS irq 9
% pci_cfgintr: 0:17 INTA BIOS irq 5
% pci_cfgintr: 0:19 INTA BIOS irq 11
% pci_cfgintr: 0:19 INTB BIOS irq 11
% pcib1: <PCI-PCI bridge> at device 1.0 on pci0
% pci1: <PCI bus> on pcib1
% pci_cfgintr: 0:1 INTA routed to irq 5
% pcib1: slot 0 INTA is routed to irq 5
% pci1: <display, VGA> at device 0.0 (no driver attached)
% isab0: <PCI-ISA bridge> at device 7.0 on pci0
% isa0: <ISA bus> on isab0
% atapci0: <Intel PIIX4 UDMA33 controller> port 0xf000-0xf00f at device 7.1 on pci0
% ata0: at 0x1f0 irq 14 on atapci0
% ata0: [MPSAFE]
% ata1: at 0x170 irq 15 on atapci0
% ata1: [MPSAFE]
% pci0: <serial bus, USB> at device 7.2 (no driver attached)
% pci0: <bridge, PCI-unknown> at device 7.3 (no driver attached)
% pci0: <multimedia, video> at device 11.0 (no driver attached)
% pci0: <multimedia> at device 11.1 (no driver attached)
% fxp0: <Intel 82559 Pro/100 Ethernet> port 0xa400-0xa43f mem 0xea000000-0xea0fffff,0xea104000-0xea104fff irq 9 at device 13.0 on pci0
% fxp0: Ethernet address 00:90:27:99:02:99
% miibus0: <MII bus> on fxp0
% inphy0: <i82555 10/100 media interface> on miibus0
% inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
% puc0: <Titan VScom PCI-200HV2> port 0xb000-0xb01f,0xac00-0xac07,0xa800-0xa807 mem 0xea103000-0xea103fff,0xea102000-0xea102fff irq 5 at device 17.0 on pci0
% stray irq 7
% sio4: <Titan VScom PCI-200HV2> on puc0
% sio4: type 16550A
% stray irq 7
% sio5: <Titan VScom PCI-200HV2> on puc0
% sio5: type 16550A
% atapci1: <HighPoint HPT366 UDMA66 controller> port 0xbc00-0xbcff,0xb800-0xb803,0xb400-0xb407 irq 11 at device 19.0 on pci0
% atapci1: [MPSAFE]
% ata2: at 0xb400 on atapci1
% ata2: [MPSAFE]
% atapci2: <HighPoint HPT366 UDMA66 controller> port 0xc800-0xc8ff,0xc400-0xc403,0xc000-0xc007 irq 11 at device 19.1 on pci0
% atapci2: [MPSAFE]
% ata3: at 0xc000 on atapci2
% ata3: [MPSAFE]
% orm0: <Option ROMs> at iomem 0xc8000-0xcbfff,0xc0000-0xc7fff on isa0
% fdc0: <Enhanced floppy controller (i82077, NE72065 or clone)> at port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on isa0
% fdc0: FIFO enabled, 8 bytes threshold
% fd0: <1440-KB 3.5" drive> on fdc0 drive 0
% atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0
% atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
% kbd0 at atkbd0
% psm0: <PS/2 Mouse> irq 12 on atkbdc0
% psm0: model Generic PS/2 mouse, device ID 0
% vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
% sc0: <System console> at flags 0x100 on isa0
% sc0: VGA <16 virtual consoles, flags=0x100>
% sio0 at port 0x3f8-0x3ff irq 4 flags 0x90 on isa0
% sio0: type 16550A, console
% sio1 at port 0x2f8-0x2ff irq 3 on isa0
% sio1: type 16550A
% cy0 at iomem 0xd4000-0xd5fff irq 10 on isa0
% cy0: driver is using old-style compatibility shims
% ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
% ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
% ppc0: FIFO with 16/16/16 bytes threshold
% ppbus0: <Parallel port bus> on ppc0
% plip0: <PLIP network interface> on ppbus0
% lpt0: <Printer> on ppbus0
% lpt0: Interrupt-driven port
% ppi0: <Parallel I/O> on ppbus0
% unknown: <PNP0303> can't assign resources (port)
% speaker0: <PC speaker> at port 0x61 on isa0
% unknown: <PNP0a03> can't assign resources (port)
% unknown: <PNP0f13> can't assign resources (irq)
% unknown: <PNP0501> can't assign resources (port)
% unknown: <PNP0700> can't assign resources (port)
% unknown: <PNP0401> can't assign resources (port)
% unknown: <PNP0501> can't assign resources (port)
% Timecounter "TSC" frequency 400909093 Hz quality 800
% Timecounters tick every 10.000 msec
% ipfw2 initialized, divert enabled, rule-based forwarding enabled, default to accept, logging disabled
% BRIDGE 020214 loaded
% GEOM: create disk ad0 dp=0xc26cba70
% ad0: 6149MB <QUANTUM FIREBALL ST6.4A> [13328/15/63] at ata0-master UDMA33
% GEOM: create disk afd0 dp=0xc26cb97c
% afd0: REMOVABLE <IOMEGA ZIP 100 ATAPI> at ata0-slave PIO0
% acd0: CDRW <RICOH CD-RW MP7320A> at ata1-slave UDMA33
% Breakpoint at   Xintr11:        pushl   $0

Now display some state.  Nothing much happening yet.  Forgot to do
"t 0" to show the source of the interrupt.

% db> t
% Xintr11(8,80206,c0eb7d10,c0eba680,c0eb7d10) at Xintr11
% runq_add(c0386000,c0eb8b00,cd73fca0,c01d4e0d,c0eb8b00) at runq_add+0x4e
% sched_add(c0eb8b00,c036bda0,c0eba680,c0eb7d10,c0eb7d10) at sched_add+0x1e
% setrunqueue(c0eb7d10,c0eb7d10,4,cd73fcc0,c01d64b4) at setrunqueue+0x2d
% sched_wakeup(c0eb7d10) at sched_wakeup+0x29
% setrunnable(c0eb7d10) at setrunnable+0x80
% endtsleep(c0eb7d10) at endtsleep+0xd6
% softclock(0) at softclock+0x21f
% ithread_loop(c0eb4280,cd73fd48) at ithread_loop+0x169
% fork_exit(c01bdab0,c0eb4280,cd73fd48) at fork_exit+0xad
% fork_trampoline() at fork_trampoline+0x8
% --- trap 0x1, eip = 0, esp = 0xcd73fd7c, ebp = 0 ---
% db> ps
%   pid   proc     uarea   uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
%    36 c26bf000 ce1d3000    0     0     0 0000204 new [IWAIT] irq8: rtc
%    35 c26bf1e4 ce1d4000    0     0     0 0000204 new [IWAIT] irq0: clk
%    34 c253a5ac ce18d000    0     0     0 0000204 [IWAIT] irq7: ppc0
%    33 c253a790 ce18e000    0     0     0 0000204 new [IWAIT] irq10: cy0
%    32 c253a974 ce18f000    0     0     0 0000204 new [IWAIT] irq3: sio1
%    31 c253ab58 ce190000    0     0     0 0000204 new [IWAIT] irq4: sio0
%    30 c253ad3c ce191000    0     0     0 0000204 new [IWAIT] irq12: psm0
%    29 c25b8000 ce19c000    0     0     0 0000204 new [IWAIT] irq1: atkbd0
%    28 c25b81e4 ce19d000    0     0     0 0000204 [IWAIT] irq6: fdc0
%    27 c25b83c8 ce19e000    0     0     0 0000204 new [IWAIT] irq11: atapci1+
%    26 c25b85ac ce19f000    0     0     0 0000204 [IWAIT] swi0: tty:sio+
%    25 c25b8790 ce1a0000    0     0     0 0000204 new [IWAIT] irq5: puc0
%    24 c25b8974 ce1a1000    0     0     0 0000204 new [IWAIT] irq9: fxp0
%    23 c0ebc1e4 cd77b000    0     0     0 0000204 [IWAIT] irq15: ata1
%    22 c0ebc3c8 cd77c000    0     0     0 0000204 [IWAIT] irq14: ata0
%    21 c0ebc5ac cd77d000    0     0     0 0000204 new [IWAIT] irq13:
%    20 c0ebc790 cd77e000    0     0     0 0000204 [RUNQ] swi7: task queue
%    19 c0ebc974 cd77f000    0     0     0 0000204 new [IWAIT] swi3: cambio
%    18 c0ebcb58 cd780000    0     0     0 0000204 new [IWAIT] swi2: camnet
% --More--
   17 c0ebcd3c cd7a8000    0     0     0 0000204 new [IWAIT] swi5:+
%     5 c253a000 ce18a000    0     1     0 0000204 [SLP]tqthr 0xc0386848] taskqueue
%    16 c253a1e4 ce18b000    0     0     0 0000204 new [IWAIT] swi6:+
%    15 c253a3c8 ce18c000    0     0     0 0000204 [RUNQ] random
%     4 c0eb5000 cd724000    0     0     0 0000204 [SLP]- 0xc037e9e0] g_down
%     3 c0eb51e4 cd773000    0     0     0 0000204 [SLP]- 0xc037e9dc] g_up
%     2 c0eb53c8 cd774000    0     0     0 0000204 [SLP]gread 0xc26cc6c0] g_event
%    14 c0eb55ac cd775000    0     0     0 0000204 new [IWAIT] swi1: net
%    13 c0eb5790 cd776000    0     0     0 0000204 new [IWAIT] swi4: vm
%    12 c0eb5974 cd777000    0     0     0 000020c [CPU 0] swi8: tty:sio clock
%    11 c0eb5b58 cd778000    0     0     0 000020c [Can run] idle
%     1 c0eb5d3c cd779000    0     0     0 0000200 new [INACTIVE] swapper
%    10 c0ebc000 cd77a000    0     0     0 0000204 [CV]ktrace 0xc03820b4] ktrace
%     0 c037eba0 c0457000    0     0     0 0000200 [SLP]atareq 0xc0eb3d00] swapper
% db> b ata_interrupt
% db> c

Set another breakpoit.  I know from previous debugging sessions that it
will be hit soon with nothing much happening in between.

% Breakpoint at   ata_interrupt:  pushl   %ebp
% db> t
% ata_interrupt(c259a600) at ata_interrupt
% ata_highpoint_intr(c25b7780) at ata_highpoint_intr+0x163
% ithread_loop(c25b7700,cd757d48) at ithread_loop+0x169
% fork_exit(c01bdab0,c25b7700,cd757d48) at fork_exit+0xad
% fork_trampoline() at fork_trampoline+0x8
% --- trap 0x1, eip = 0, esp = 0xcd757d7c, ebp = 0 ---
% db> ps
%   pid   proc     uarea   uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
%    36 c26bf000 ce1d3000    0     0     0 0000204 new [IWAIT] irq8: rtc
%    35 c26bf1e4 ce1d4000    0     0     0 0000204 new [IWAIT] irq0: clk
%    34 c253a5ac ce18d000    0     0     0 0000204 [IWAIT] irq7: ppc0
%    33 c253a790 ce18e000    0     0     0 0000204 new [IWAIT] irq10: cy0
%    32 c253a974 ce18f000    0     0     0 0000204 new [IWAIT] irq3: sio1
%    31 c253ab58 ce190000    0     0     0 0000204 new [IWAIT] irq4: sio0
%    30 c253ad3c ce191000    0     0     0 0000204 new [IWAIT] irq12: psm0
%    29 c25b8000 ce19c000    0     0     0 0000204 new [IWAIT] irq1: atkbd0
%    28 c25b81e4 ce19d000    0     0     0 0000204 [IWAIT] irq6: fdc0
%    27 c25b83c8 ce19e000    0     0     0 0000204 [CPU 0] irq11: atapci1+
%    26 c25b85ac ce19f000    0     0     0 0000204 [IWAIT] swi0: tty:sio+
%    25 c25b8790 ce1a0000    0     0     0 0000204 new [IWAIT] irq5: puc0
%    24 c25b8974 ce1a1000    0     0     0 0000204 new [IWAIT] irq9: fxp0
%    23 c0ebc1e4 cd77b000    0     0     0 0000204 [IWAIT] irq15: ata1
%    22 c0ebc3c8 cd77c000    0     0     0 0000204 [IWAIT] irq14: ata0
%    21 c0ebc5ac cd77d000    0     0     0 0000204 new [IWAIT] irq13:
%    20 c0ebc790 cd77e000    0     0     0 0000204 [RUNQ] swi7: task queue
%    19 c0ebc974 cd77f000    0     0     0 0000204 new [IWAIT] swi3: cambio
%    18 c0ebcb58 cd780000    0     0     0 0000204 new [IWAIT] swi2: camnet
% --More--
   17 c0ebcd3c cd7a8000    0     0     0 0000204 new [IWAIT] swi5:+
%     5 c253a000 ce18a000    0     1     0 0000204 [SLP]tqthr 0xc0386848] taskqueue
%    16 c253a1e4 ce18b000    0     0     0 0000204 new [IWAIT] swi6:+
%    15 c253a3c8 ce18c000    0     0     0 0000204 [RUNQ] random
%     4 c0eb5000 cd724000    0     0     0 0000204 [SLP]- 0xc037e9e0] g_down
%     3 c0eb51e4 cd773000    0     0     0 0000204 [SLP]- 0xc037e9dc] g_up
%     2 c0eb53c8 cd774000    0     0     0 0000204 [SLP]gread 0xc26cc6c0] g_event
%    14 c0eb55ac cd775000    0     0     0 0000204 new [IWAIT] swi1: net
%    13 c0eb5790 cd776000    0     0     0 0000204 new [IWAIT] swi4: vm
%    12 c0eb5974 cd777000    0     0     0 000020c [IWAIT] swi8: tty:sio clock
%    11 c0eb5b58 cd778000    0     0     0 000020c [Can run] idle
%     1 c0eb5d3c cd779000    0     0     0 0000200 new [INACTIVE] swapper
%    10 c0ebc000 cd77a000    0     0     0 0000204 [CV]ktrace 0xc03820b4] ktrace
%     0 c037eba0 c0457000    0     0     0 0000200 [SLP]atareq 0xc0eb3d00] swapper
% db> t 0
% mi_switch(1,0,0,0,c0eb3d00) at mi_switch+0x189
% msleep(c0eb3d00,0,4c,c0321ffe,ea60) at msleep+0x475
% ata_queue_request(c0eb3d00,a115349d,c259a600,2,45e000) at ata_queue_request+0x151
% ata_getparam(c259a6d4,a1) at ata_getparam+0x82
% ata_identify_devices(c259a600) at ata_identify_devices+0x5a
% ata_boot_attach(0) at ata_boot_attach+0x27
% run_interrupt_driven_config_hooks(0,456c00,456000,0,c01297e5) at run_interrupt_driven_config_hooks+0x18
% mi_startup() at mi_startup+0x96
% begin() at begin+0x2c

Remembered to show the source of the interrupt this time.  Begin detailed
trace since now since this is the easiest way to reach the critical insl.

% db> s
% Stopped at      ata_interrupt+0x1:      movl    %esp,%ebp
% db>
% Stopped at      ata_interrupt+0x3:      pushl   %edi
% db>
% Stopped at      ata_interrupt+0x4:      pushl   %esi
% db>
% Stopped at      ata_interrupt+0x5:      pushl   %ebx
% db>
% Stopped at      ata_interrupt+0x6:      subl    $0x18,%esp
% db>
% Stopped at      ata_interrupt+0x9:      movl    0x8(%ebp),%esi
% db>
% Stopped at      ata_interrupt+0xc:      movl    0x14c(%esi),%edi
% db>
% Stopped at      ata_interrupt+0x12:     cmpl    $0,0x108(%esi)
% db>
% Stopped at      ata_interrupt+0x19:     jz      ata_interrupt+0x939
% db>
% Stopped at      ata_interrupt+0x1f:     cmpl    $0,0x48(%esi)
% db>
% Stopped at      ata_interrupt+0x23:     jz      ata_interrupt+0x44
% db>
% Stopped at      ata_interrupt+0x25:     pushl   0x48(%esi)
% db>
% Stopped at      ata_interrupt+0x28:     call    rman_get_bustag
% db>
% Stopped at      rman_get_bustag:        pushl   %ebp
% db>
% Stopped at      rman_get_bustag+0x1:    movl    %esp,%ebp
% db>
% Stopped at      rman_get_bustag+0x3:    movl    0x8(%ebp),%eax
% db>
% Stopped at      rman_get_bustag+0x6:    movl    0x24(%eax),%eax
% db>
% Stopped at      rman_get_bustag+0x9:    leave
% db>
% Stopped at      rman_get_bustag+0xa:    ret
% db>
% Stopped at      ata_interrupt+0x2d:     addl    $0x4,%esp
% db>
% Stopped at      ata_interrupt+0x30:     movl    %eax,%ebx
% db>
% Stopped at      ata_interrupt+0x32:     pushl   0x48(%esi)
% db>
% Stopped at      ata_interrupt+0x35:     call    rman_get_bushandle
% db>
% Stopped at      rman_get_bushandle:     pushl   %ebp
% db>
% Stopped at      rman_get_bushandle+0x1: movl    %esp,%ebp
% db>
% Stopped at      rman_get_bushandle+0x3: movl    0x8(%ebp),%eax
% db>
% Stopped at      rman_get_bushandle+0x6: movl    0x28(%eax),%eax
% db>
% Stopped at      rman_get_bushandle+0x9: leave
% db>
% Stopped at      rman_get_bushandle+0xa: ret
% db>
% Stopped at      ata_interrupt+0x3a:     addl    $0x4,%esp
% db>
% Stopped at      ata_interrupt+0x3d:     movl    %eax,%edx
% db>
% Stopped at      ata_interrupt+0x3f:     movl    0x4c(%esi),%eax
% db>
% Stopped at      ata_interrupt+0x42:     jmp     ata_interrupt+0x99
% db>
% Stopped at      ata_interrupt+0x99:     testl   %ebx,%ebx
% db>
% Stopped at      ata_interrupt+0x9b:     jnz     ata_interrupt+0xa4
% db>
% Stopped at      ata_interrupt+0x9d:     addl    %eax,%edx
% db>
% Stopped at      ata_interrupt+0x9f:     inb     %dx,%al
% db>
% Stopped at      ata_interrupt+0xa0:     jmp     ata_interrupt+0xa7
% db>
% Stopped at      ata_interrupt+0xa7:     movzbl  %al,%eax
% db>
% Stopped at      ata_interrupt+0xaa:     testb   %al,%al
% db>
% Stopped at      ata_interrupt+0xac:     jns     ata_interrupt+0x14c
% db>
% Stopped at      ata_interrupt+0x14c:    cmpl    $0,0x40(%esi)
% db>
% Stopped at      ata_interrupt+0x150:    jz      ata_interrupt+0x174
% db>
% Stopped at      ata_interrupt+0x152:    pushl   0x40(%esi)
% db>
% Stopped at      ata_interrupt+0x155:    call    rman_get_bustag
% db>
% Stopped at      rman_get_bustag:        pushl   %ebp
% db>
% Stopped at      rman_get_bustag+0x1:    movl    %esp,%ebp
% db>
% Stopped at      rman_get_bustag+0x3:    movl    0x8(%ebp),%eax
% db>
% Stopped at      rman_get_bustag+0x6:    movl    0x24(%eax),%eax
% db>
% Stopped at      rman_get_bustag+0x9:    leave
% db>
% Stopped at      rman_get_bustag+0xa:    ret
% db>
% Stopped at      ata_interrupt+0x15a:    addl    $0x4,%esp
% db>
% Stopped at      ata_interrupt+0x15d:    movl    %eax,%ebx
% db>
% Stopped at      ata_interrupt+0x15f:    pushl   0x40(%esi)
% db>
% Stopped at      ata_interrupt+0x162:    call    rman_get_bushandle
% db>
% Stopped at      rman_get_bushandle:     pushl   %ebp
% db>
% Stopped at      rman_get_bushandle+0x1: movl    %esp,%ebp
% db>
% Stopped at      rman_get_bushandle+0x3: movl    0x8(%ebp),%eax
% db>
% Stopped at      rman_get_bushandle+0x6: movl    0x28(%eax),%eax
% db>
% Stopped at      rman_get_bushandle+0x9: leave
% db>
% Stopped at      rman_get_bushandle+0xa: ret
% db>
% Stopped at      ata_interrupt+0x167:    addl    $0x4,%esp
% db>
% Stopped at      ata_interrupt+0x16a:    movl    %eax,%edx
% db>
% Stopped at      ata_interrupt+0x16c:    movl    0x44(%esi),%eax
% db>
% Stopped at      ata_interrupt+0x16f:    jmp     ata_interrupt+0x1c9
% db>
% Stopped at      ata_interrupt+0x1c9:    testl   %ebx,%ebx
% db>
% Stopped at      ata_interrupt+0x1cb:    jnz     ata_interrupt+0x1d4
% db>
% Stopped at      ata_interrupt+0x1cd:    addl    %eax,%edx
% db>
% Stopped at      ata_interrupt+0x1cf:    inb     %dx,%al
% db>
% Stopped at      ata_interrupt+0x1d0:    jmp     ata_interrupt+0x1d7
% db>
% Stopped at      ata_interrupt+0x1d7:    movzbl  %al,%eax
% db>
% Stopped at      ata_interrupt+0x1da:    movb    %al,0xfffffff3(%ebp)
% db>
% Stopped at      ata_interrupt+0x1dd:    testl   %edi,%edi
% db>
% Stopped at      ata_interrupt+0x1df:    jnz     ata_interrupt+0x32c
% db>
% Stopped at      ata_interrupt+0x32c:    movb    0xfffffff3(%ebp),%al
% db>
% Stopped at      ata_interrupt+0x32f:    movb    %al,0x18(%edi)
% db>
% Stopped at      ata_interrupt+0x332:    movl    0x2c(%edi),%eax
% db>
% Stopped at      ata_interrupt+0x335:    andl    $0x50,%eax
% db>
% Stopped at      ata_interrupt+0x338:    cmpl    $0x40,%eax
% db>
% Stopped at      ata_interrupt+0x33b:    jz      ata_interrupt+0x488
% db>
% Stopped at      ata_interrupt+0x341:    cmpl    $0x40,%eax
% db>
% Stopped at      ata_interrupt+0x344:    jnle    ata_interrupt+0x354
% db>
% Stopped at      ata_interrupt+0x346:    cmpl    $0x10,%eax
% db>
% Stopped at      ata_interrupt+0x349:    jz      ata_interrupt+0x4a1
% db>
% Stopped at      ata_interrupt+0x34f:    jmp     ata_interrupt+0x35d
% db>
% Stopped at      ata_interrupt+0x35d:    testb   $0x1,0x18(%edi)
% db>
% Stopped at      ata_interrupt+0x361:    jnz     ata_interrupt+0x870
% db>
% Stopped at      ata_interrupt+0x367:    testb   $0x4,0x2c(%edi)
% db>
% Stopped at      ata_interrupt+0x36b:    jz      ata_interrupt+0x37c
% db>
% Stopped at      ata_interrupt+0x36d:    pushl   0x20(%edi)
% db>
% Stopped at      ata_interrupt+0x370:    pushl   %edi
% db>
% Stopped at      ata_interrupt+0x371:    call    ata_pio_read
% db>
% Stopped at      ata_pio_read:   pushl   %ebp
% db>
% Stopped at      ata_pio_read+0x1:       movl    %esp,%ebp
% db>
% Stopped at      ata_pio_read+0x3:       pushl   %edi
% db>
% Stopped at      ata_pio_read+0x4:       pushl   %esi
% db>
% Stopped at      ata_pio_read+0x5:       pushl   %ebx
% db>
% Stopped at      ata_pio_read+0x6:       subl    $0x14,%esp
% db>
% Stopped at      ata_pio_read+0x9:       movl    0x8(%ebp),%eax
% db>
% Stopped at      ata_pio_read+0xc:       movl    0x20(%eax),%edx
% db>
% Stopped at      ata_pio_read+0xf:       movl    0xc(%ebp),%eax
% db>
% Stopped at      ata_pio_read+0x12:      cmpl    %edx,%eax
% db>
% Stopped at      ata_pio_read+0x14:      jbe     ata_pio_read+0x18
% db>
% Stopped at      ata_pio_read+0x18:      movl    %eax,0xfffffff0(%ebp)
% db>
% Stopped at      ata_pio_read+0x1b:      movl    0x8(%ebp),%edx
% db>
% Stopped at      ata_pio_read+0x1e:      movl    0(%edx),%eax
% db>
% Stopped at      ata_pio_read+0x20:      movl    0(%eax),%esi
% db>
% Stopped at      ata_pio_read+0x22:      testb   $0x2,0xa0(%esi)
% db>
% Stopped at      ata_pio_read+0x29:      jnz     ata_pio_read+0x35
% db>
% Stopped at      ata_pio_read+0x2b:      testb   $0x3,0xfffffff0(%ebp)
% db>
% Stopped at      ata_pio_read+0x2f:      jz      ata_pio_read+0x104
% db>
% Stopped at      ata_pio_read+0x104:     cmpl    $0,0x8(%esi)
% db>
% Stopped at      ata_pio_read+0x108:     jz      ata_pio_read+0x147
% db>
% Stopped at      ata_pio_read+0x10a:     pushl   0x8(%esi)
% db>
% Stopped at      ata_pio_read+0x10d:     call    rman_get_bustag
% db>
% Stopped at      rman_get_bustag:        pushl   %ebp
% db>
% Stopped at      rman_get_bustag+0x1:    movl    %esp,%ebp
% db>
% Stopped at      rman_get_bustag+0x3:    movl    0x8(%ebp),%eax
% db>
% Stopped at      rman_get_bustag+0x6:    movl    0x24(%eax),%eax
% db>
% Stopped at      rman_get_bustag+0x9:    leave
% db>
% Stopped at      rman_get_bustag+0xa:    ret
% db>
% Stopped at      ata_pio_read+0x112:     addl    $0x4,%esp
% db>
% Stopped at      ata_pio_read+0x115:     movl    %eax,%ebx
% db>
% Stopped at      ata_pio_read+0x117:     pushl   0x8(%esi)
% db>
% Stopped at      ata_pio_read+0x11a:     call    rman_get_bushandle
% db>
% Stopped at      rman_get_bushandle:     pushl   %ebp
% db>
% Stopped at      rman_get_bushandle+0x1: movl    %esp,%ebp
% db>
% Stopped at      rman_get_bushandle+0x3: movl    0x8(%ebp),%eax
% db>
% Stopped at      rman_get_bushandle+0x6: movl    0x28(%eax),%eax
% db>
% Stopped at      rman_get_bushandle+0x9: leave
% db>
% Stopped at      rman_get_bushandle+0xa: ret
% db>
% Stopped at      ata_pio_read+0x11f:     addl    $0x4,%esp
% db>
% Stopped at      ata_pio_read+0x122:     movl    %eax,0xffffffe4(%ebp)
% db>
% Stopped at      ata_pio_read+0x125:     movl    0xc(%esi),%edx
% db>
% Stopped at      ata_pio_read+0x128:     movl    0x8(%ebp),%ecx
% db>
% Stopped at      ata_pio_read+0x12b:     movl    0x28(%ecx),%eax
% db>
% Stopped at      ata_pio_read+0x12e:     movl    %eax,%edi
% db>
% Stopped at      ata_pio_read+0x130:     addl    0x24(%ecx),%edi
% db>
% Stopped at      ata_pio_read+0x133:     movl    0xfffffff0(%ebp),%ecx
% db>
% Stopped at      ata_pio_read+0x136:     shrl    $0x2,%ecx
% db>
% Stopped at      ata_pio_read+0x139:     testl   %ebx,%ebx
% db>
% Stopped at      ata_pio_read+0x13b:     jnz     ata_pio_read+0x142
% db>
% Stopped at      ata_pio_read+0x13d:     addl    0xffffffe4(%ebp),%edx
% db>
% Stopped at      ata_pio_read+0x140:     jmp     ata_pio_read+0x1b3
% db>
% Stopped at      ata_pio_read+0x1b3:     cld
% db>
% Stopped at      ata_pio_read+0x1b4:     repe insl       %dx,%es:(%edi)
% db>
% Stopped at      ata_pio_read+0x1b4:     repe insl       %dx,%es:(%edi)
% db>
% Stopped at      ata_pio_read+0x1b4:     repe insl       %dx,%es:(%edi)
% db>
% Stopped at      ata_pio_read+0x1b4:     repe insl       %dx,%es:(%edi)
% db>
% Stopped at      ata_pio_read+0x1b4:     repe insl       %dx,%es:(%edi)

We just started reading the data returned by ATA_ATAPI_IDENTIFY.  Show
the registers to get a count of how many insl's to skip.

% db> sh r
% cs                 0x8
% ds                0x10
% es                0x10
% fs                0x18
% ss                0x10
% eax         0xc25eaa00
% ecx               0x7c
% edx             0xb400
% ebx                  0
% esp         0xcd757c88
% ebp         0xcd757ca8
% esi         0xc259a600
% edi         0xc25eaa10
% eip         0xc0142fb8  ata_pio_read+0x1b4
% efl            0x90206
% dr0                  0
% dr1                  0
% dr2                  0
% dr3                  0
% dr4         0xffff4ff0
% dr5              0x400
% dr6         0xffff4ff0
% dr7              0x400
% ata_pio_read+0x1b4:     repe insl       %dx,%es:(%edi)
% db> s,0x7b
% Stopped at      ata_pio_read+0x1b4:     repe insl       %dx,%es:(%edi)

Skip all except the last insl.

% db> call inb(0xa0)
% 0x1

This verifies that there is no irq11 pending now (bit 2^(11-8) = 0x8).

% db> call inb(0xa1)
% 0x28

This verifies that irq11 is masked now (bit 2^(11-8) = 0x8).  It needs
to be masked for the previous read of the active interrupts to be valid.

% db> s
% Stopped at      ata_pio_read+0x1b6:     jmp     ata_pio_read+0x1c4

Do the last insl.

% db> call inb(0xa0)
% 0x9

Now there is an irq11 pending.  The device apparently has more to say.
(I forgot to read its status here.  IIRC it doesn't set DRQ.)

% db> call inb(0xa1)
% 0x28

This verifies that irq11 is still masked.

% db> ps
%   pid   proc     uarea   uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
%    36 c26bf000 ce1d3000    0     0     0 0000204 new [IWAIT] irq8: rtc
%    35 c26bf1e4 ce1d4000    0     0     0 0000204 new [IWAIT] irq0: clk
%    34 c253a5ac ce18d000    0     0     0 0000204 [IWAIT] irq7: ppc0
%    33 c253a790 ce18e000    0     0     0 0000204 new [IWAIT] irq10: cy0
%    32 c253a974 ce18f000    0     0     0 0000204 new [IWAIT] irq3: sio1
%    31 c253ab58 ce190000    0     0     0 0000204 new [IWAIT] irq4: sio0
%    30 c253ad3c ce191000    0     0     0 0000204 new [IWAIT] irq12: psm0
%    29 c25b8000 ce19c000    0     0     0 0000204 new [IWAIT] irq1: atkbd0
%    28 c25b81e4 ce19d000    0     0     0 0000204 [IWAIT] irq6: fdc0
%    27 c25b83c8 ce19e000    0     0     0 0000204 [CPU 0] irq11: atapci1+

Pid 27 is the irq11 handler.  It is running.

%    26 c25b85ac ce19f000    0     0     0 0000204 [IWAIT] swi0: tty:sio+
%    25 c25b8790 ce1a0000    0     0     0 0000204 new [IWAIT] irq5: puc0
%    24 c25b8974 ce1a1000    0     0     0 0000204 new [IWAIT] irq9: fxp0
%    23 c0ebc1e4 cd77b000    0     0     0 0000204 [IWAIT] irq15: ata1
%    22 c0ebc3c8 cd77c000    0     0     0 0000204 [IWAIT] irq14: ata0
%    21 c0ebc5ac cd77d000    0     0     0 0000204 new [IWAIT] irq13:
%    20 c0ebc790 cd77e000    0     0     0 0000204 [RUNQ] swi7: task queue
%    19 c0ebc974 cd77f000    0     0     0 0000204 new [IWAIT] swi3: cambio
%    18 c0ebcb58 cd780000    0     0     0 0000204 new [IWAIT] swi2: camnet
% --More--
   17 c0ebcd3c cd7a8000    0     0     0 0000204 new [IWAIT] swi5:+
%     5 c253a000 ce18a000    0     1     0 0000204 [SLP]tqthr 0xc0386848] taskqueue
%    16 c253a1e4 ce18b000    0     0     0 0000204 new [IWAIT] swi6:+
%    15 c253a3c8 ce18c000    0     0     0 0000204 [RUNQ] random
%     4 c0eb5000 cd724000    0     0     0 0000204 [SLP]- 0xc037e9e0] g_down
%     3 c0eb51e4 cd773000    0     0     0 0000204 [SLP]- 0xc037e9dc] g_up
%     2 c0eb53c8 cd774000    0     0     0 0000204 [SLP]gread 0xc26cc6c0] g_event
%    14 c0eb55ac cd775000    0     0     0 0000204 new [IWAIT] swi1: net
%    13 c0eb5790 cd776000    0     0     0 0000204 new [IWAIT] swi4: vm
%    12 c0eb5974 cd777000    0     0     0 000020c [RUNQ] swi8: tty:sio clock
%    11 c0eb5b58 cd778000    0     0     0 000020c [Can run] idle
%     1 c0eb5d3c cd779000    0     0     0 0000200 new [INACTIVE] swapper
%    10 c0ebc000 cd77a000    0     0     0 0000204 [CV]ktrace 0xc03820b4] ktrace
%     0 c037eba0 c0457000    0     0     0 0000200 [SLP]atareq 0xc0eb3d00] swapper
% db> t 0
% mi_switch(1,0,0,0,c0eb3d00) at mi_switch+0x189
% msleep(c0eb3d00,0,4c,c0321ffe,ea60) at msleep+0x475
% ata_queue_request(c0eb3d00,a115349d,c259a600,2,45e000) at ata_queue_request+0x151
% ata_getparam(c259a6d4,a1) at ata_getparam+0x82
% ata_identify_devices(c259a600) at ata_identify_devices+0x5a
% ata_boot_attach(0) at ata_boot_attach+0x27
% run_interrupt_driven_config_hooks(0,456c00,456000,0,c01297e5) at run_interrupt_driven_config_hooks+0x18
% mi_startup() at mi_startup+0x96
% begin() at begin+0x2c

Pid 0 is the thread that issued the ATA_ATAPI_IDENTIFY command.  It is still
sleeping.  It will be woken up by an ata_finish() soon (not shown).

% db> b ata_getparam+0x82
% db> c

Prepare to look at completion of the ATA_ATAPI_IDENTIFY command (from the
viewpoint of the thread that issued it).

% Breakpoint at   Xintr11:        pushl   $0
% db>
% Breakpoint at   ata_interrupt:  pushl   %ebp
% db>
% Breakpoint at   Xintr11:        pushl   $0
% db>
% Breakpoint at   ata_interrupt:  pushl   %ebp
% db>
% Breakpoint at   Xintr11:        pushl   $0

These are all for the "spurious" interrupt.  They are completely
unhandled.  Other debugging sessions showed that ata_interrupt returns
almost immediately after finding that ch->state == ATA_IDLE.
ata_interru[t's handling regressed in ata-lowlevel.c rev.1.9.  In
rev.1.8 it at least printed a message about this problem in the
bootverbose case.

% db> d
% db> c
% Breakpoint at   ata_interrupt:  pushl   %ebp
% db> d
% db> c
% Breakpoint at   ata_getparam+0x82:      movl    0x40(%ebx),%edi

Clear breakpoints in interrupt handlers so as to get to the thread that
issued the command as soon as possible, and continue to that thread.
Other debugging sessions showed no suprises except for the "spurious"
interrupt.

% db> t
% ata_getparam(c259a6d4,a1) at ata_getparam+0x82
% ata_identify_devices(c259a600) at ata_identify_devices+0x5a
% ata_boot_attach(0) at ata_boot_attach+0x27
% run_interrupt_driven_config_hooks(0,456c00,456000,0,c01297e5) at run_interrupt_driven_config_hooks+0x18
% mi_startup() at mi_startup+0x96
% begin() at begin+0x2c
% db> ps

Now running the main initialization thread.  Show the scheduling bugs that
let us get to it.

%   pid   proc     uarea   uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
%    36 c26bf000 ce1d3000    0     0     0 0000204 new [IWAIT] irq8: rtc
%    35 c26bf1e4 ce1d4000    0     0     0 0000204 new [IWAIT] irq0: clk
%    34 c253a5ac ce18d000    0     0     0 0000204 [IWAIT] irq7: ppc0
%    33 c253a790 ce18e000    0     0     0 0000204 new [IWAIT] irq10: cy0
%    32 c253a974 ce18f000    0     0     0 0000204 new [IWAIT] irq3: sio1
%    31 c253ab58 ce190000    0     0     0 0000204 new [IWAIT] irq4: sio0
%    30 c253ad3c ce191000    0     0     0 0000204 new [IWAIT] irq12: psm0
%    29 c25b8000 ce19c000    0     0     0 0000204 new [IWAIT] irq1: atkbd0
%    28 c25b81e4 ce19d000    0     0     0 0000204 [IWAIT] irq6: fdc0
%    27 c25b83c8 ce19e000    0     0     0 0000204 [RUNQ] irq11: atapci1+

The irq11 thread wants to run, but for some reason it isn't running.  Other
debugging sessions shows that all priorities are reasonable, with the
irq11 thread having a much higher priority than thread 0.

%    26 c25b85ac ce19f000    0     0     0 0000204 [IWAIT] swi0: tty:sio+
%    25 c25b8790 ce1a0000    0     0     0 0000204 new [IWAIT] irq5: puc0
%    24 c25b8974 ce1a1000    0     0     0 0000204 new [IWAIT] irq9: fxp0
%    23 c0ebc1e4 cd77b000    0     0     0 0000204 [IWAIT] irq15: ata1
%    22 c0ebc3c8 cd77c000    0     0     0 0000204 [IWAIT] irq14: ata0
%    21 c0ebc5ac cd77d000    0     0     0 0000204 new [IWAIT] irq13:
%    20 c0ebc790 cd77e000    0     0     0 0000204 [IWAIT] swi7: task queue
%    19 c0ebc974 cd77f000    0     0     0 0000204 [IWAIT] swi3: cambio
%    18 c0ebcb58 cd780000    0     0     0 0000204 new [IWAIT] swi2: camnet
% --More--
   17 c0ebcd3c cd7a8000    0     0     0 0000204 new [IWAIT] swi5:+
%     5 c253a000 ce18a000    0     1     0 0000204 [SLP]tqthr 0xc0386848] taskqueue
%    16 c253a1e4 ce18b000    0     0     0 0000204 new [IWAIT] swi6:+
%    15 c253a3c8 ce18c000    0     0     0 0000204 [RUNQ] random

This is a kernel thread, but it has a user priority so its non-runningness
is correct.

%     4 c0eb5000 cd724000    0     0     0 0000204 [SLP]- 0xc037e9e0] g_down
%     3 c0eb51e4 cd773000    0     0     0 0000204 [RUNQ] g_up

May be incorrect.

%     2 c0eb53c8 cd774000    0     0     0 0000204 [SLP]ggetattr 0xc26cc510] g_event
%    14 c0eb55ac cd775000    0     0     0 0000204 new [IWAIT] swi1: net
%    13 c0eb5790 cd776000    0     0     0 0000204 new [IWAIT] swi4: vm
%    12 c0eb5974 cd777000    0     0     0 000020c [IWAIT] swi8: tty:sio clock
%    11 c0eb5b58 cd778000    0     0     0 000020c [Can run] idle
%     1 c0eb5d3c cd779000    0     0     0 0000200 new [INACTIVE] swapper
%    10 c0ebc000 cd77a000    0     0     0 0000204 [CV]ktrace 0xc03820b4] ktrace
%     0 c037eba0 c0457000    0     0     0 0000200 [CPU 0] swapper

This thread is running, but it should not be running because it has much
lower priority than the irq11 thread.

% db> c
% acd1: CDROM <ATAPI 44X CDROM> at ata2-slave PIO4

The system hung here (not completely -- at least keyboard and sio interrupts
still worked).  I suspect another aspect of the bugs and not a debugging
error.  Shorter debugging sessions in this area mostly don't cause problems.

Bruce
Received on Tue Sep 09 2003 - 03:56:48 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:21 UTC