ATAng generates an interrupt storm deterministically on 1 system here. This should lock up the system, but bugs in interrupt handler scheduling prevent the lockup except in my version of -current where the bugs are different. ATAng interrupt storm: The ATA_ATAPI_IDENTIFY command generates a second "spurious" interrupt. This interrupt is not properly handled. It is completely ignored, at least on the system with the problem. This system has only 1 atapi slave on the channel that generates the interrupt, and no interrupt sharing for this interrupt. Ignoring the interrupt gives an interrupt storm since the interrupt is level sensitive. The interrupt really is generated by the atapi slave, since reading the non-alternate status register of that device clears it. I use this read as a quick fix to limit the storm to 10 interrupts. This may be the correct non-quick fix too. Bugs in interrupt handler scheduling: Ignoring level-sensitive interrupts should result in their thread being switched to endlessly (except for switches to higher priority threads and possibly for switches to equal priority threads). But there is some bug in interrupt handler scheduling that results in low priority threads running while the high priority interrupt storm thread is runnable. This allows the thread that issued the ATA_ATAPI_IDENTIFY command to run eventually, and it or another thread clears the storm as a side effect of issuing another command. I suspect that the scheduling bug is related to the old one of just setting the TDF_NEEDRESCHED flag when a switch can't be done immediately. TDF_NEEDRESCHED has no effect on kernel threads since it is only examined on return to user mode. % int % ithread_schedule(struct ithd *ithread, int do_switch) % { % ... % ithread->it_need = 1; % mtx_lock_spin(&sched_lock); % if (TD_AWAITING_INTR(td)) { % CTR2(KTR_INTR, "%s: setrunqueue %d", __func__, p->p_pid); % TD_CLR_IWAIT(td); % setrunqueue(td); % if (do_switch && % (ctd->td_critnest == 1) ) { ^^^^^^^^^^^^^^^^^^^^^^^ Sometimes the interrupt will occur when ctd (curthread) is in a critical region. Then ctd->td_critnest >= 2 here and we don't switch. % KASSERT((TD_IS_RUNNING(ctd)), % ("ithread_schedule: Bad state for curthread.")); % ctd->td_proc->p_stats->p_ru.ru_nivcsw++; % if (ctd->td_flags & TDF_IDLETD) % ctd->td_state = TDS_CAN_RUN; /* XXXKSE */ % mi_switch(); % } else { % curthread->td_flags |= TDF_NEEDRESCHED; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ When we don't switch, we set this, but the setting has no effect on kernel threads, so we will only switch to them as a side effect of handling another interrupt. This explains why we may keep running a lower priority thread but now why we switch to a lower priority thread on switching back from handling other interrupts. Annotated debugging session: % total=0x437cc8 entry point=0x129750 % WARNING: loader(8) metadata is missing! % [ preserving 815752 bytes of kernel symbol table ] % Debugger("Boot flags requested debugger") % Stopped at Debugger+0x45: xchgl %ebx,in_Debugger.0 % db> b Xintr11 % db> c Set a breakpoit after booting with -dh. I know from previous debugging sessions that there will be no problems until soon after Xintr11 is hit. % Copyright (c) 1992-2003 The FreeBSD Project. % Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 % The Regents of the University of California. All rights reserved. % FreeBSD 5.1-CURRENT #2863: Tue Sep 9 20:32:22 EST 2003 % bde_at_besplex.bde.org:/c/sysc/i386/compile/BESPLEX % Timecounter "i8254" frequency 1193182 Hz quality 0 % CPU: Pentium II/Pentium II Xeon/Celeron (400.91-MHz 686-class CPU) % Origin = "GenuineIntel" Id = 0x665 Stepping = 5 % Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR> % real memory = 268435456 (256 MB) % avail memory = 256163840 (244 MB) % Pentium Pro MTRR support enabled % npx0: [FAST] % npx0: <math processor> on motherboard % npx0: flags 0x80 npx0: INT 16 interface % pcibios: BIOS version 2.10 % Using $PIR table, 8 entries at 0xc00fdef0 % pcib0: <Intel 82443BX (440 BX) host to PCI bridge> at pcibus 0 on motherboard % pci0: <PCI bus> on pcib0 % pci_cfgintr: 0:11 INTA BIOS irq 11 % pci_cfgintr: 0:11 INTA BIOS irq 11 % pci_cfgintr: 0:13 INTA BIOS irq 9 % pci_cfgintr: 0:17 INTA BIOS irq 5 % pci_cfgintr: 0:19 INTA BIOS irq 11 % pci_cfgintr: 0:19 INTB BIOS irq 11 % pcib1: <PCI-PCI bridge> at device 1.0 on pci0 % pci1: <PCI bus> on pcib1 % pci_cfgintr: 0:1 INTA routed to irq 5 % pcib1: slot 0 INTA is routed to irq 5 % pci1: <display, VGA> at device 0.0 (no driver attached) % isab0: <PCI-ISA bridge> at device 7.0 on pci0 % isa0: <ISA bus> on isab0 % atapci0: <Intel PIIX4 UDMA33 controller> port 0xf000-0xf00f at device 7.1 on pci0 % ata0: at 0x1f0 irq 14 on atapci0 % ata0: [MPSAFE] % ata1: at 0x170 irq 15 on atapci0 % ata1: [MPSAFE] % pci0: <serial bus, USB> at device 7.2 (no driver attached) % pci0: <bridge, PCI-unknown> at device 7.3 (no driver attached) % pci0: <multimedia, video> at device 11.0 (no driver attached) % pci0: <multimedia> at device 11.1 (no driver attached) % fxp0: <Intel 82559 Pro/100 Ethernet> port 0xa400-0xa43f mem 0xea000000-0xea0fffff,0xea104000-0xea104fff irq 9 at device 13.0 on pci0 % fxp0: Ethernet address 00:90:27:99:02:99 % miibus0: <MII bus> on fxp0 % inphy0: <i82555 10/100 media interface> on miibus0 % inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto % puc0: <Titan VScom PCI-200HV2> port 0xb000-0xb01f,0xac00-0xac07,0xa800-0xa807 mem 0xea103000-0xea103fff,0xea102000-0xea102fff irq 5 at device 17.0 on pci0 % stray irq 7 % sio4: <Titan VScom PCI-200HV2> on puc0 % sio4: type 16550A % stray irq 7 % sio5: <Titan VScom PCI-200HV2> on puc0 % sio5: type 16550A % atapci1: <HighPoint HPT366 UDMA66 controller> port 0xbc00-0xbcff,0xb800-0xb803,0xb400-0xb407 irq 11 at device 19.0 on pci0 % atapci1: [MPSAFE] % ata2: at 0xb400 on atapci1 % ata2: [MPSAFE] % atapci2: <HighPoint HPT366 UDMA66 controller> port 0xc800-0xc8ff,0xc400-0xc403,0xc000-0xc007 irq 11 at device 19.1 on pci0 % atapci2: [MPSAFE] % ata3: at 0xc000 on atapci2 % ata3: [MPSAFE] % orm0: <Option ROMs> at iomem 0xc8000-0xcbfff,0xc0000-0xc7fff on isa0 % fdc0: <Enhanced floppy controller (i82077, NE72065 or clone)> at port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on isa0 % fdc0: FIFO enabled, 8 bytes threshold % fd0: <1440-KB 3.5" drive> on fdc0 drive 0 % atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0 % atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0 % kbd0 at atkbd0 % psm0: <PS/2 Mouse> irq 12 on atkbdc0 % psm0: model Generic PS/2 mouse, device ID 0 % vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 % sc0: <System console> at flags 0x100 on isa0 % sc0: VGA <16 virtual consoles, flags=0x100> % sio0 at port 0x3f8-0x3ff irq 4 flags 0x90 on isa0 % sio0: type 16550A, console % sio1 at port 0x2f8-0x2ff irq 3 on isa0 % sio1: type 16550A % cy0 at iomem 0xd4000-0xd5fff irq 10 on isa0 % cy0: driver is using old-style compatibility shims % ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0 % ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode % ppc0: FIFO with 16/16/16 bytes threshold % ppbus0: <Parallel port bus> on ppc0 % plip0: <PLIP network interface> on ppbus0 % lpt0: <Printer> on ppbus0 % lpt0: Interrupt-driven port % ppi0: <Parallel I/O> on ppbus0 % unknown: <PNP0303> can't assign resources (port) % speaker0: <PC speaker> at port 0x61 on isa0 % unknown: <PNP0a03> can't assign resources (port) % unknown: <PNP0f13> can't assign resources (irq) % unknown: <PNP0501> can't assign resources (port) % unknown: <PNP0700> can't assign resources (port) % unknown: <PNP0401> can't assign resources (port) % unknown: <PNP0501> can't assign resources (port) % Timecounter "TSC" frequency 400909093 Hz quality 800 % Timecounters tick every 10.000 msec % ipfw2 initialized, divert enabled, rule-based forwarding enabled, default to accept, logging disabled % BRIDGE 020214 loaded % GEOM: create disk ad0 dp=0xc26cba70 % ad0: 6149MB <QUANTUM FIREBALL ST6.4A> [13328/15/63] at ata0-master UDMA33 % GEOM: create disk afd0 dp=0xc26cb97c % afd0: REMOVABLE <IOMEGA ZIP 100 ATAPI> at ata0-slave PIO0 % acd0: CDRW <RICOH CD-RW MP7320A> at ata1-slave UDMA33 % Breakpoint at Xintr11: pushl $0 Now display some state. Nothing much happening yet. Forgot to do "t 0" to show the source of the interrupt. % db> t % Xintr11(8,80206,c0eb7d10,c0eba680,c0eb7d10) at Xintr11 % runq_add(c0386000,c0eb8b00,cd73fca0,c01d4e0d,c0eb8b00) at runq_add+0x4e % sched_add(c0eb8b00,c036bda0,c0eba680,c0eb7d10,c0eb7d10) at sched_add+0x1e % setrunqueue(c0eb7d10,c0eb7d10,4,cd73fcc0,c01d64b4) at setrunqueue+0x2d % sched_wakeup(c0eb7d10) at sched_wakeup+0x29 % setrunnable(c0eb7d10) at setrunnable+0x80 % endtsleep(c0eb7d10) at endtsleep+0xd6 % softclock(0) at softclock+0x21f % ithread_loop(c0eb4280,cd73fd48) at ithread_loop+0x169 % fork_exit(c01bdab0,c0eb4280,cd73fd48) at fork_exit+0xad % fork_trampoline() at fork_trampoline+0x8 % --- trap 0x1, eip = 0, esp = 0xcd73fd7c, ebp = 0 --- % db> ps % pid proc uarea uid ppid pgrp flag stat wmesg wchan cmd % 36 c26bf000 ce1d3000 0 0 0 0000204 new [IWAIT] irq8: rtc % 35 c26bf1e4 ce1d4000 0 0 0 0000204 new [IWAIT] irq0: clk % 34 c253a5ac ce18d000 0 0 0 0000204 [IWAIT] irq7: ppc0 % 33 c253a790 ce18e000 0 0 0 0000204 new [IWAIT] irq10: cy0 % 32 c253a974 ce18f000 0 0 0 0000204 new [IWAIT] irq3: sio1 % 31 c253ab58 ce190000 0 0 0 0000204 new [IWAIT] irq4: sio0 % 30 c253ad3c ce191000 0 0 0 0000204 new [IWAIT] irq12: psm0 % 29 c25b8000 ce19c000 0 0 0 0000204 new [IWAIT] irq1: atkbd0 % 28 c25b81e4 ce19d000 0 0 0 0000204 [IWAIT] irq6: fdc0 % 27 c25b83c8 ce19e000 0 0 0 0000204 new [IWAIT] irq11: atapci1+ % 26 c25b85ac ce19f000 0 0 0 0000204 [IWAIT] swi0: tty:sio+ % 25 c25b8790 ce1a0000 0 0 0 0000204 new [IWAIT] irq5: puc0 % 24 c25b8974 ce1a1000 0 0 0 0000204 new [IWAIT] irq9: fxp0 % 23 c0ebc1e4 cd77b000 0 0 0 0000204 [IWAIT] irq15: ata1 % 22 c0ebc3c8 cd77c000 0 0 0 0000204 [IWAIT] irq14: ata0 % 21 c0ebc5ac cd77d000 0 0 0 0000204 new [IWAIT] irq13: % 20 c0ebc790 cd77e000 0 0 0 0000204 [RUNQ] swi7: task queue % 19 c0ebc974 cd77f000 0 0 0 0000204 new [IWAIT] swi3: cambio % 18 c0ebcb58 cd780000 0 0 0 0000204 new [IWAIT] swi2: camnet % --More-- 17 c0ebcd3c cd7a8000 0 0 0 0000204 new [IWAIT] swi5:+ % 5 c253a000 ce18a000 0 1 0 0000204 [SLP]tqthr 0xc0386848] taskqueue % 16 c253a1e4 ce18b000 0 0 0 0000204 new [IWAIT] swi6:+ % 15 c253a3c8 ce18c000 0 0 0 0000204 [RUNQ] random % 4 c0eb5000 cd724000 0 0 0 0000204 [SLP]- 0xc037e9e0] g_down % 3 c0eb51e4 cd773000 0 0 0 0000204 [SLP]- 0xc037e9dc] g_up % 2 c0eb53c8 cd774000 0 0 0 0000204 [SLP]gread 0xc26cc6c0] g_event % 14 c0eb55ac cd775000 0 0 0 0000204 new [IWAIT] swi1: net % 13 c0eb5790 cd776000 0 0 0 0000204 new [IWAIT] swi4: vm % 12 c0eb5974 cd777000 0 0 0 000020c [CPU 0] swi8: tty:sio clock % 11 c0eb5b58 cd778000 0 0 0 000020c [Can run] idle % 1 c0eb5d3c cd779000 0 0 0 0000200 new [INACTIVE] swapper % 10 c0ebc000 cd77a000 0 0 0 0000204 [CV]ktrace 0xc03820b4] ktrace % 0 c037eba0 c0457000 0 0 0 0000200 [SLP]atareq 0xc0eb3d00] swapper % db> b ata_interrupt % db> c Set another breakpoit. I know from previous debugging sessions that it will be hit soon with nothing much happening in between. % Breakpoint at ata_interrupt: pushl %ebp % db> t % ata_interrupt(c259a600) at ata_interrupt % ata_highpoint_intr(c25b7780) at ata_highpoint_intr+0x163 % ithread_loop(c25b7700,cd757d48) at ithread_loop+0x169 % fork_exit(c01bdab0,c25b7700,cd757d48) at fork_exit+0xad % fork_trampoline() at fork_trampoline+0x8 % --- trap 0x1, eip = 0, esp = 0xcd757d7c, ebp = 0 --- % db> ps % pid proc uarea uid ppid pgrp flag stat wmesg wchan cmd % 36 c26bf000 ce1d3000 0 0 0 0000204 new [IWAIT] irq8: rtc % 35 c26bf1e4 ce1d4000 0 0 0 0000204 new [IWAIT] irq0: clk % 34 c253a5ac ce18d000 0 0 0 0000204 [IWAIT] irq7: ppc0 % 33 c253a790 ce18e000 0 0 0 0000204 new [IWAIT] irq10: cy0 % 32 c253a974 ce18f000 0 0 0 0000204 new [IWAIT] irq3: sio1 % 31 c253ab58 ce190000 0 0 0 0000204 new [IWAIT] irq4: sio0 % 30 c253ad3c ce191000 0 0 0 0000204 new [IWAIT] irq12: psm0 % 29 c25b8000 ce19c000 0 0 0 0000204 new [IWAIT] irq1: atkbd0 % 28 c25b81e4 ce19d000 0 0 0 0000204 [IWAIT] irq6: fdc0 % 27 c25b83c8 ce19e000 0 0 0 0000204 [CPU 0] irq11: atapci1+ % 26 c25b85ac ce19f000 0 0 0 0000204 [IWAIT] swi0: tty:sio+ % 25 c25b8790 ce1a0000 0 0 0 0000204 new [IWAIT] irq5: puc0 % 24 c25b8974 ce1a1000 0 0 0 0000204 new [IWAIT] irq9: fxp0 % 23 c0ebc1e4 cd77b000 0 0 0 0000204 [IWAIT] irq15: ata1 % 22 c0ebc3c8 cd77c000 0 0 0 0000204 [IWAIT] irq14: ata0 % 21 c0ebc5ac cd77d000 0 0 0 0000204 new [IWAIT] irq13: % 20 c0ebc790 cd77e000 0 0 0 0000204 [RUNQ] swi7: task queue % 19 c0ebc974 cd77f000 0 0 0 0000204 new [IWAIT] swi3: cambio % 18 c0ebcb58 cd780000 0 0 0 0000204 new [IWAIT] swi2: camnet % --More-- 17 c0ebcd3c cd7a8000 0 0 0 0000204 new [IWAIT] swi5:+ % 5 c253a000 ce18a000 0 1 0 0000204 [SLP]tqthr 0xc0386848] taskqueue % 16 c253a1e4 ce18b000 0 0 0 0000204 new [IWAIT] swi6:+ % 15 c253a3c8 ce18c000 0 0 0 0000204 [RUNQ] random % 4 c0eb5000 cd724000 0 0 0 0000204 [SLP]- 0xc037e9e0] g_down % 3 c0eb51e4 cd773000 0 0 0 0000204 [SLP]- 0xc037e9dc] g_up % 2 c0eb53c8 cd774000 0 0 0 0000204 [SLP]gread 0xc26cc6c0] g_event % 14 c0eb55ac cd775000 0 0 0 0000204 new [IWAIT] swi1: net % 13 c0eb5790 cd776000 0 0 0 0000204 new [IWAIT] swi4: vm % 12 c0eb5974 cd777000 0 0 0 000020c [IWAIT] swi8: tty:sio clock % 11 c0eb5b58 cd778000 0 0 0 000020c [Can run] idle % 1 c0eb5d3c cd779000 0 0 0 0000200 new [INACTIVE] swapper % 10 c0ebc000 cd77a000 0 0 0 0000204 [CV]ktrace 0xc03820b4] ktrace % 0 c037eba0 c0457000 0 0 0 0000200 [SLP]atareq 0xc0eb3d00] swapper % db> t 0 % mi_switch(1,0,0,0,c0eb3d00) at mi_switch+0x189 % msleep(c0eb3d00,0,4c,c0321ffe,ea60) at msleep+0x475 % ata_queue_request(c0eb3d00,a115349d,c259a600,2,45e000) at ata_queue_request+0x151 % ata_getparam(c259a6d4,a1) at ata_getparam+0x82 % ata_identify_devices(c259a600) at ata_identify_devices+0x5a % ata_boot_attach(0) at ata_boot_attach+0x27 % run_interrupt_driven_config_hooks(0,456c00,456000,0,c01297e5) at run_interrupt_driven_config_hooks+0x18 % mi_startup() at mi_startup+0x96 % begin() at begin+0x2c Remembered to show the source of the interrupt this time. Begin detailed trace since now since this is the easiest way to reach the critical insl. % db> s % Stopped at ata_interrupt+0x1: movl %esp,%ebp % db> % Stopped at ata_interrupt+0x3: pushl %edi % db> % Stopped at ata_interrupt+0x4: pushl %esi % db> % Stopped at ata_interrupt+0x5: pushl %ebx % db> % Stopped at ata_interrupt+0x6: subl $0x18,%esp % db> % Stopped at ata_interrupt+0x9: movl 0x8(%ebp),%esi % db> % Stopped at ata_interrupt+0xc: movl 0x14c(%esi),%edi % db> % Stopped at ata_interrupt+0x12: cmpl $0,0x108(%esi) % db> % Stopped at ata_interrupt+0x19: jz ata_interrupt+0x939 % db> % Stopped at ata_interrupt+0x1f: cmpl $0,0x48(%esi) % db> % Stopped at ata_interrupt+0x23: jz ata_interrupt+0x44 % db> % Stopped at ata_interrupt+0x25: pushl 0x48(%esi) % db> % Stopped at ata_interrupt+0x28: call rman_get_bustag % db> % Stopped at rman_get_bustag: pushl %ebp % db> % Stopped at rman_get_bustag+0x1: movl %esp,%ebp % db> % Stopped at rman_get_bustag+0x3: movl 0x8(%ebp),%eax % db> % Stopped at rman_get_bustag+0x6: movl 0x24(%eax),%eax % db> % Stopped at rman_get_bustag+0x9: leave % db> % Stopped at rman_get_bustag+0xa: ret % db> % Stopped at ata_interrupt+0x2d: addl $0x4,%esp % db> % Stopped at ata_interrupt+0x30: movl %eax,%ebx % db> % Stopped at ata_interrupt+0x32: pushl 0x48(%esi) % db> % Stopped at ata_interrupt+0x35: call rman_get_bushandle % db> % Stopped at rman_get_bushandle: pushl %ebp % db> % Stopped at rman_get_bushandle+0x1: movl %esp,%ebp % db> % Stopped at rman_get_bushandle+0x3: movl 0x8(%ebp),%eax % db> % Stopped at rman_get_bushandle+0x6: movl 0x28(%eax),%eax % db> % Stopped at rman_get_bushandle+0x9: leave % db> % Stopped at rman_get_bushandle+0xa: ret % db> % Stopped at ata_interrupt+0x3a: addl $0x4,%esp % db> % Stopped at ata_interrupt+0x3d: movl %eax,%edx % db> % Stopped at ata_interrupt+0x3f: movl 0x4c(%esi),%eax % db> % Stopped at ata_interrupt+0x42: jmp ata_interrupt+0x99 % db> % Stopped at ata_interrupt+0x99: testl %ebx,%ebx % db> % Stopped at ata_interrupt+0x9b: jnz ata_interrupt+0xa4 % db> % Stopped at ata_interrupt+0x9d: addl %eax,%edx % db> % Stopped at ata_interrupt+0x9f: inb %dx,%al % db> % Stopped at ata_interrupt+0xa0: jmp ata_interrupt+0xa7 % db> % Stopped at ata_interrupt+0xa7: movzbl %al,%eax % db> % Stopped at ata_interrupt+0xaa: testb %al,%al % db> % Stopped at ata_interrupt+0xac: jns ata_interrupt+0x14c % db> % Stopped at ata_interrupt+0x14c: cmpl $0,0x40(%esi) % db> % Stopped at ata_interrupt+0x150: jz ata_interrupt+0x174 % db> % Stopped at ata_interrupt+0x152: pushl 0x40(%esi) % db> % Stopped at ata_interrupt+0x155: call rman_get_bustag % db> % Stopped at rman_get_bustag: pushl %ebp % db> % Stopped at rman_get_bustag+0x1: movl %esp,%ebp % db> % Stopped at rman_get_bustag+0x3: movl 0x8(%ebp),%eax % db> % Stopped at rman_get_bustag+0x6: movl 0x24(%eax),%eax % db> % Stopped at rman_get_bustag+0x9: leave % db> % Stopped at rman_get_bustag+0xa: ret % db> % Stopped at ata_interrupt+0x15a: addl $0x4,%esp % db> % Stopped at ata_interrupt+0x15d: movl %eax,%ebx % db> % Stopped at ata_interrupt+0x15f: pushl 0x40(%esi) % db> % Stopped at ata_interrupt+0x162: call rman_get_bushandle % db> % Stopped at rman_get_bushandle: pushl %ebp % db> % Stopped at rman_get_bushandle+0x1: movl %esp,%ebp % db> % Stopped at rman_get_bushandle+0x3: movl 0x8(%ebp),%eax % db> % Stopped at rman_get_bushandle+0x6: movl 0x28(%eax),%eax % db> % Stopped at rman_get_bushandle+0x9: leave % db> % Stopped at rman_get_bushandle+0xa: ret % db> % Stopped at ata_interrupt+0x167: addl $0x4,%esp % db> % Stopped at ata_interrupt+0x16a: movl %eax,%edx % db> % Stopped at ata_interrupt+0x16c: movl 0x44(%esi),%eax % db> % Stopped at ata_interrupt+0x16f: jmp ata_interrupt+0x1c9 % db> % Stopped at ata_interrupt+0x1c9: testl %ebx,%ebx % db> % Stopped at ata_interrupt+0x1cb: jnz ata_interrupt+0x1d4 % db> % Stopped at ata_interrupt+0x1cd: addl %eax,%edx % db> % Stopped at ata_interrupt+0x1cf: inb %dx,%al % db> % Stopped at ata_interrupt+0x1d0: jmp ata_interrupt+0x1d7 % db> % Stopped at ata_interrupt+0x1d7: movzbl %al,%eax % db> % Stopped at ata_interrupt+0x1da: movb %al,0xfffffff3(%ebp) % db> % Stopped at ata_interrupt+0x1dd: testl %edi,%edi % db> % Stopped at ata_interrupt+0x1df: jnz ata_interrupt+0x32c % db> % Stopped at ata_interrupt+0x32c: movb 0xfffffff3(%ebp),%al % db> % Stopped at ata_interrupt+0x32f: movb %al,0x18(%edi) % db> % Stopped at ata_interrupt+0x332: movl 0x2c(%edi),%eax % db> % Stopped at ata_interrupt+0x335: andl $0x50,%eax % db> % Stopped at ata_interrupt+0x338: cmpl $0x40,%eax % db> % Stopped at ata_interrupt+0x33b: jz ata_interrupt+0x488 % db> % Stopped at ata_interrupt+0x341: cmpl $0x40,%eax % db> % Stopped at ata_interrupt+0x344: jnle ata_interrupt+0x354 % db> % Stopped at ata_interrupt+0x346: cmpl $0x10,%eax % db> % Stopped at ata_interrupt+0x349: jz ata_interrupt+0x4a1 % db> % Stopped at ata_interrupt+0x34f: jmp ata_interrupt+0x35d % db> % Stopped at ata_interrupt+0x35d: testb $0x1,0x18(%edi) % db> % Stopped at ata_interrupt+0x361: jnz ata_interrupt+0x870 % db> % Stopped at ata_interrupt+0x367: testb $0x4,0x2c(%edi) % db> % Stopped at ata_interrupt+0x36b: jz ata_interrupt+0x37c % db> % Stopped at ata_interrupt+0x36d: pushl 0x20(%edi) % db> % Stopped at ata_interrupt+0x370: pushl %edi % db> % Stopped at ata_interrupt+0x371: call ata_pio_read % db> % Stopped at ata_pio_read: pushl %ebp % db> % Stopped at ata_pio_read+0x1: movl %esp,%ebp % db> % Stopped at ata_pio_read+0x3: pushl %edi % db> % Stopped at ata_pio_read+0x4: pushl %esi % db> % Stopped at ata_pio_read+0x5: pushl %ebx % db> % Stopped at ata_pio_read+0x6: subl $0x14,%esp % db> % Stopped at ata_pio_read+0x9: movl 0x8(%ebp),%eax % db> % Stopped at ata_pio_read+0xc: movl 0x20(%eax),%edx % db> % Stopped at ata_pio_read+0xf: movl 0xc(%ebp),%eax % db> % Stopped at ata_pio_read+0x12: cmpl %edx,%eax % db> % Stopped at ata_pio_read+0x14: jbe ata_pio_read+0x18 % db> % Stopped at ata_pio_read+0x18: movl %eax,0xfffffff0(%ebp) % db> % Stopped at ata_pio_read+0x1b: movl 0x8(%ebp),%edx % db> % Stopped at ata_pio_read+0x1e: movl 0(%edx),%eax % db> % Stopped at ata_pio_read+0x20: movl 0(%eax),%esi % db> % Stopped at ata_pio_read+0x22: testb $0x2,0xa0(%esi) % db> % Stopped at ata_pio_read+0x29: jnz ata_pio_read+0x35 % db> % Stopped at ata_pio_read+0x2b: testb $0x3,0xfffffff0(%ebp) % db> % Stopped at ata_pio_read+0x2f: jz ata_pio_read+0x104 % db> % Stopped at ata_pio_read+0x104: cmpl $0,0x8(%esi) % db> % Stopped at ata_pio_read+0x108: jz ata_pio_read+0x147 % db> % Stopped at ata_pio_read+0x10a: pushl 0x8(%esi) % db> % Stopped at ata_pio_read+0x10d: call rman_get_bustag % db> % Stopped at rman_get_bustag: pushl %ebp % db> % Stopped at rman_get_bustag+0x1: movl %esp,%ebp % db> % Stopped at rman_get_bustag+0x3: movl 0x8(%ebp),%eax % db> % Stopped at rman_get_bustag+0x6: movl 0x24(%eax),%eax % db> % Stopped at rman_get_bustag+0x9: leave % db> % Stopped at rman_get_bustag+0xa: ret % db> % Stopped at ata_pio_read+0x112: addl $0x4,%esp % db> % Stopped at ata_pio_read+0x115: movl %eax,%ebx % db> % Stopped at ata_pio_read+0x117: pushl 0x8(%esi) % db> % Stopped at ata_pio_read+0x11a: call rman_get_bushandle % db> % Stopped at rman_get_bushandle: pushl %ebp % db> % Stopped at rman_get_bushandle+0x1: movl %esp,%ebp % db> % Stopped at rman_get_bushandle+0x3: movl 0x8(%ebp),%eax % db> % Stopped at rman_get_bushandle+0x6: movl 0x28(%eax),%eax % db> % Stopped at rman_get_bushandle+0x9: leave % db> % Stopped at rman_get_bushandle+0xa: ret % db> % Stopped at ata_pio_read+0x11f: addl $0x4,%esp % db> % Stopped at ata_pio_read+0x122: movl %eax,0xffffffe4(%ebp) % db> % Stopped at ata_pio_read+0x125: movl 0xc(%esi),%edx % db> % Stopped at ata_pio_read+0x128: movl 0x8(%ebp),%ecx % db> % Stopped at ata_pio_read+0x12b: movl 0x28(%ecx),%eax % db> % Stopped at ata_pio_read+0x12e: movl %eax,%edi % db> % Stopped at ata_pio_read+0x130: addl 0x24(%ecx),%edi % db> % Stopped at ata_pio_read+0x133: movl 0xfffffff0(%ebp),%ecx % db> % Stopped at ata_pio_read+0x136: shrl $0x2,%ecx % db> % Stopped at ata_pio_read+0x139: testl %ebx,%ebx % db> % Stopped at ata_pio_read+0x13b: jnz ata_pio_read+0x142 % db> % Stopped at ata_pio_read+0x13d: addl 0xffffffe4(%ebp),%edx % db> % Stopped at ata_pio_read+0x140: jmp ata_pio_read+0x1b3 % db> % Stopped at ata_pio_read+0x1b3: cld % db> % Stopped at ata_pio_read+0x1b4: repe insl %dx,%es:(%edi) % db> % Stopped at ata_pio_read+0x1b4: repe insl %dx,%es:(%edi) % db> % Stopped at ata_pio_read+0x1b4: repe insl %dx,%es:(%edi) % db> % Stopped at ata_pio_read+0x1b4: repe insl %dx,%es:(%edi) % db> % Stopped at ata_pio_read+0x1b4: repe insl %dx,%es:(%edi) We just started reading the data returned by ATA_ATAPI_IDENTIFY. Show the registers to get a count of how many insl's to skip. % db> sh r % cs 0x8 % ds 0x10 % es 0x10 % fs 0x18 % ss 0x10 % eax 0xc25eaa00 % ecx 0x7c % edx 0xb400 % ebx 0 % esp 0xcd757c88 % ebp 0xcd757ca8 % esi 0xc259a600 % edi 0xc25eaa10 % eip 0xc0142fb8 ata_pio_read+0x1b4 % efl 0x90206 % dr0 0 % dr1 0 % dr2 0 % dr3 0 % dr4 0xffff4ff0 % dr5 0x400 % dr6 0xffff4ff0 % dr7 0x400 % ata_pio_read+0x1b4: repe insl %dx,%es:(%edi) % db> s,0x7b % Stopped at ata_pio_read+0x1b4: repe insl %dx,%es:(%edi) Skip all except the last insl. % db> call inb(0xa0) % 0x1 This verifies that there is no irq11 pending now (bit 2^(11-8) = 0x8). % db> call inb(0xa1) % 0x28 This verifies that irq11 is masked now (bit 2^(11-8) = 0x8). It needs to be masked for the previous read of the active interrupts to be valid. % db> s % Stopped at ata_pio_read+0x1b6: jmp ata_pio_read+0x1c4 Do the last insl. % db> call inb(0xa0) % 0x9 Now there is an irq11 pending. The device apparently has more to say. (I forgot to read its status here. IIRC it doesn't set DRQ.) % db> call inb(0xa1) % 0x28 This verifies that irq11 is still masked. % db> ps % pid proc uarea uid ppid pgrp flag stat wmesg wchan cmd % 36 c26bf000 ce1d3000 0 0 0 0000204 new [IWAIT] irq8: rtc % 35 c26bf1e4 ce1d4000 0 0 0 0000204 new [IWAIT] irq0: clk % 34 c253a5ac ce18d000 0 0 0 0000204 [IWAIT] irq7: ppc0 % 33 c253a790 ce18e000 0 0 0 0000204 new [IWAIT] irq10: cy0 % 32 c253a974 ce18f000 0 0 0 0000204 new [IWAIT] irq3: sio1 % 31 c253ab58 ce190000 0 0 0 0000204 new [IWAIT] irq4: sio0 % 30 c253ad3c ce191000 0 0 0 0000204 new [IWAIT] irq12: psm0 % 29 c25b8000 ce19c000 0 0 0 0000204 new [IWAIT] irq1: atkbd0 % 28 c25b81e4 ce19d000 0 0 0 0000204 [IWAIT] irq6: fdc0 % 27 c25b83c8 ce19e000 0 0 0 0000204 [CPU 0] irq11: atapci1+ Pid 27 is the irq11 handler. It is running. % 26 c25b85ac ce19f000 0 0 0 0000204 [IWAIT] swi0: tty:sio+ % 25 c25b8790 ce1a0000 0 0 0 0000204 new [IWAIT] irq5: puc0 % 24 c25b8974 ce1a1000 0 0 0 0000204 new [IWAIT] irq9: fxp0 % 23 c0ebc1e4 cd77b000 0 0 0 0000204 [IWAIT] irq15: ata1 % 22 c0ebc3c8 cd77c000 0 0 0 0000204 [IWAIT] irq14: ata0 % 21 c0ebc5ac cd77d000 0 0 0 0000204 new [IWAIT] irq13: % 20 c0ebc790 cd77e000 0 0 0 0000204 [RUNQ] swi7: task queue % 19 c0ebc974 cd77f000 0 0 0 0000204 new [IWAIT] swi3: cambio % 18 c0ebcb58 cd780000 0 0 0 0000204 new [IWAIT] swi2: camnet % --More-- 17 c0ebcd3c cd7a8000 0 0 0 0000204 new [IWAIT] swi5:+ % 5 c253a000 ce18a000 0 1 0 0000204 [SLP]tqthr 0xc0386848] taskqueue % 16 c253a1e4 ce18b000 0 0 0 0000204 new [IWAIT] swi6:+ % 15 c253a3c8 ce18c000 0 0 0 0000204 [RUNQ] random % 4 c0eb5000 cd724000 0 0 0 0000204 [SLP]- 0xc037e9e0] g_down % 3 c0eb51e4 cd773000 0 0 0 0000204 [SLP]- 0xc037e9dc] g_up % 2 c0eb53c8 cd774000 0 0 0 0000204 [SLP]gread 0xc26cc6c0] g_event % 14 c0eb55ac cd775000 0 0 0 0000204 new [IWAIT] swi1: net % 13 c0eb5790 cd776000 0 0 0 0000204 new [IWAIT] swi4: vm % 12 c0eb5974 cd777000 0 0 0 000020c [RUNQ] swi8: tty:sio clock % 11 c0eb5b58 cd778000 0 0 0 000020c [Can run] idle % 1 c0eb5d3c cd779000 0 0 0 0000200 new [INACTIVE] swapper % 10 c0ebc000 cd77a000 0 0 0 0000204 [CV]ktrace 0xc03820b4] ktrace % 0 c037eba0 c0457000 0 0 0 0000200 [SLP]atareq 0xc0eb3d00] swapper % db> t 0 % mi_switch(1,0,0,0,c0eb3d00) at mi_switch+0x189 % msleep(c0eb3d00,0,4c,c0321ffe,ea60) at msleep+0x475 % ata_queue_request(c0eb3d00,a115349d,c259a600,2,45e000) at ata_queue_request+0x151 % ata_getparam(c259a6d4,a1) at ata_getparam+0x82 % ata_identify_devices(c259a600) at ata_identify_devices+0x5a % ata_boot_attach(0) at ata_boot_attach+0x27 % run_interrupt_driven_config_hooks(0,456c00,456000,0,c01297e5) at run_interrupt_driven_config_hooks+0x18 % mi_startup() at mi_startup+0x96 % begin() at begin+0x2c Pid 0 is the thread that issued the ATA_ATAPI_IDENTIFY command. It is still sleeping. It will be woken up by an ata_finish() soon (not shown). % db> b ata_getparam+0x82 % db> c Prepare to look at completion of the ATA_ATAPI_IDENTIFY command (from the viewpoint of the thread that issued it). % Breakpoint at Xintr11: pushl $0 % db> % Breakpoint at ata_interrupt: pushl %ebp % db> % Breakpoint at Xintr11: pushl $0 % db> % Breakpoint at ata_interrupt: pushl %ebp % db> % Breakpoint at Xintr11: pushl $0 These are all for the "spurious" interrupt. They are completely unhandled. Other debugging sessions showed that ata_interrupt returns almost immediately after finding that ch->state == ATA_IDLE. ata_interru[t's handling regressed in ata-lowlevel.c rev.1.9. In rev.1.8 it at least printed a message about this problem in the bootverbose case. % db> d % db> c % Breakpoint at ata_interrupt: pushl %ebp % db> d % db> c % Breakpoint at ata_getparam+0x82: movl 0x40(%ebx),%edi Clear breakpoints in interrupt handlers so as to get to the thread that issued the command as soon as possible, and continue to that thread. Other debugging sessions showed no suprises except for the "spurious" interrupt. % db> t % ata_getparam(c259a6d4,a1) at ata_getparam+0x82 % ata_identify_devices(c259a600) at ata_identify_devices+0x5a % ata_boot_attach(0) at ata_boot_attach+0x27 % run_interrupt_driven_config_hooks(0,456c00,456000,0,c01297e5) at run_interrupt_driven_config_hooks+0x18 % mi_startup() at mi_startup+0x96 % begin() at begin+0x2c % db> ps Now running the main initialization thread. Show the scheduling bugs that let us get to it. % pid proc uarea uid ppid pgrp flag stat wmesg wchan cmd % 36 c26bf000 ce1d3000 0 0 0 0000204 new [IWAIT] irq8: rtc % 35 c26bf1e4 ce1d4000 0 0 0 0000204 new [IWAIT] irq0: clk % 34 c253a5ac ce18d000 0 0 0 0000204 [IWAIT] irq7: ppc0 % 33 c253a790 ce18e000 0 0 0 0000204 new [IWAIT] irq10: cy0 % 32 c253a974 ce18f000 0 0 0 0000204 new [IWAIT] irq3: sio1 % 31 c253ab58 ce190000 0 0 0 0000204 new [IWAIT] irq4: sio0 % 30 c253ad3c ce191000 0 0 0 0000204 new [IWAIT] irq12: psm0 % 29 c25b8000 ce19c000 0 0 0 0000204 new [IWAIT] irq1: atkbd0 % 28 c25b81e4 ce19d000 0 0 0 0000204 [IWAIT] irq6: fdc0 % 27 c25b83c8 ce19e000 0 0 0 0000204 [RUNQ] irq11: atapci1+ The irq11 thread wants to run, but for some reason it isn't running. Other debugging sessions shows that all priorities are reasonable, with the irq11 thread having a much higher priority than thread 0. % 26 c25b85ac ce19f000 0 0 0 0000204 [IWAIT] swi0: tty:sio+ % 25 c25b8790 ce1a0000 0 0 0 0000204 new [IWAIT] irq5: puc0 % 24 c25b8974 ce1a1000 0 0 0 0000204 new [IWAIT] irq9: fxp0 % 23 c0ebc1e4 cd77b000 0 0 0 0000204 [IWAIT] irq15: ata1 % 22 c0ebc3c8 cd77c000 0 0 0 0000204 [IWAIT] irq14: ata0 % 21 c0ebc5ac cd77d000 0 0 0 0000204 new [IWAIT] irq13: % 20 c0ebc790 cd77e000 0 0 0 0000204 [IWAIT] swi7: task queue % 19 c0ebc974 cd77f000 0 0 0 0000204 [IWAIT] swi3: cambio % 18 c0ebcb58 cd780000 0 0 0 0000204 new [IWAIT] swi2: camnet % --More-- 17 c0ebcd3c cd7a8000 0 0 0 0000204 new [IWAIT] swi5:+ % 5 c253a000 ce18a000 0 1 0 0000204 [SLP]tqthr 0xc0386848] taskqueue % 16 c253a1e4 ce18b000 0 0 0 0000204 new [IWAIT] swi6:+ % 15 c253a3c8 ce18c000 0 0 0 0000204 [RUNQ] random This is a kernel thread, but it has a user priority so its non-runningness is correct. % 4 c0eb5000 cd724000 0 0 0 0000204 [SLP]- 0xc037e9e0] g_down % 3 c0eb51e4 cd773000 0 0 0 0000204 [RUNQ] g_up May be incorrect. % 2 c0eb53c8 cd774000 0 0 0 0000204 [SLP]ggetattr 0xc26cc510] g_event % 14 c0eb55ac cd775000 0 0 0 0000204 new [IWAIT] swi1: net % 13 c0eb5790 cd776000 0 0 0 0000204 new [IWAIT] swi4: vm % 12 c0eb5974 cd777000 0 0 0 000020c [IWAIT] swi8: tty:sio clock % 11 c0eb5b58 cd778000 0 0 0 000020c [Can run] idle % 1 c0eb5d3c cd779000 0 0 0 0000200 new [INACTIVE] swapper % 10 c0ebc000 cd77a000 0 0 0 0000204 [CV]ktrace 0xc03820b4] ktrace % 0 c037eba0 c0457000 0 0 0 0000200 [CPU 0] swapper This thread is running, but it should not be running because it has much lower priority than the irq11 thread. % db> c % acd1: CDROM <ATAPI 44X CDROM> at ata2-slave PIO4 The system hung here (not completely -- at least keyboard and sio interrupts still worked). I suspect another aspect of the bugs and not a debugging error. Shorter debugging sessions in this area mostly don't cause problems. BruceReceived on Tue Sep 09 2003 - 03:56:48 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:21 UTC