reliable panics in arstrategy

From: Brooks Davis <brooks_at_one-eyed-alien.net>
Date: Thu, 16 Oct 2003 15:58:52 -0700

I've got four dual Xeon servers that I can reliably panic under disk
load.  All of them have Promise ATA RAID controllers running in RAID1
mode.  They consistently panic in arstrategy if I run something like a
CVS checkout of ports.  The panic message and ddb backtrace are below
as is the dmesg.  The kernel is the SMP kernel.  I've tried to obtain a
crash dump, but "call dumpsys" just dumps me right back into the same
panic so I'm hoping this is something you can reproduce.  Please let me
know if you need more information or if I need to try and figure out
some way to run gdb on these boxes.

Thanks,
Brooks

Fatal trap 12: page fault while in kernel mode
cpuid = 0; lapic.id = 00000000
fault virtual address   = 0xa6ea70f4
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc04dc3b9
stack pointer           = 0x10:0xe0469bc4
frame pointer           = 0x10:0xe0469c54
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 4 (g_down)
kernel: type 12 trap, code=0
Stopped at      arstrategy+0x939:       movl    %eax,0x24(%ebx,%ecx,8)
db> where
arstrategy(c7480750,0,c084365f,5c,0) at arstrategy+0x939
g_disk_start(c7470a20,0,c0843c0f,164,a) at g_disk_start+0x1a6
g_io_schedule_down(c29b1000,2,c0843e31,6e,c06036b0) at g_io_schedule_down+0x1ac
g_down_procbody(0,e0469d48,c0845bd4,314,ffffffff) at g_down_procbody+0x48
fork_exit(c06036b0,0,e0469d48) at fork_exit+0xcf
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xe0469d7c, ebp = 0 ---
db>
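
For anyone reading the trap report, the arithmetic behind two of its numbers can be sketched as follows. This is only an illustration, not a diagnosis: the function names are hypothetical, the register values are not in the report, and KERNBASE = 0xc0000000 is an assumption (the usual i386 default, not confirmed for this kernel). It recovers the start address of arstrategy from the instruction pointer and symbol offset, and the value of %ebx + 8*%ecx implied by the faulting store.

```python
# Values copied from the trap report and ddb output above.
FAULT_ADDR = 0xa6ea70f4     # "fault virtual address"
EIP = 0xc04dc3b9            # "instruction pointer" (offset part)
SYMBOL_OFFSET = 0x939       # from "arstrategy+0x939"
DISPLACEMENT = 0x24         # from "movl %eax,0x24(%ebx,%ecx,8)"

# Assumption: default i386 kernel base; not stated anywhere in the report.
KERNBASE = 0xc0000000


def function_start(eip, offset):
    """Start address of the function containing eip (hypothetical helper)."""
    return eip - offset


def base_plus_scaled_index(fault_addr, displacement):
    """Value of %ebx + 8*%ecx implied by the faulting effective address.

    The store targets displacement + base + index*8, so subtracting the
    displacement leaves base + index*8. The individual registers cannot
    be separated without a register dump.
    """
    return fault_addr - displacement


print(hex(function_start(EIP, SYMBOL_OFFSET)))                # 0xc04dba80
print(hex(base_plus_scaled_index(FAULT_ADDR, DISPLACEMENT)))  # 0xa6ea70d0
print(FAULT_ADDR < KERNBASE)                                  # True
```

If the KERNBASE assumption holds, the store targeted an address below the kernel's address range, which would be consistent with a bad pointer or an out-of-range index in that array access, though the report alone does not show which.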


Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 5.1-CURRENT #2: Wed Oct 15 05:44:42 PDT 2003
    root_at_nbboard.aero.org:/usr/obj/usr/src/sys/SMP
Preloaded elf kernel "/boot/kernel/kernel" at 0xc0a77000.
Preloaded elf module "/boot/kernel/acpi.ko" at 0xc0a770a8.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.40GHz (2392.95-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf27  Stepping = 7
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory  = 1073676288 (1023 MB)
avail memory = 1033580544 (985 MB)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 24 pins in IOAPIC #1
Programming 24 pins in IOAPIC #2
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): apic id:  0, version: 0x00050014, at 0xfee00000
 cpu1 (AP):  apic id:  1, version: 0x00050014, at 0xfee00000
 cpu2 (AP):  apic id:  6, version: 0x00050014, at 0xfee00000
 cpu3 (AP):  apic id:  7, version: 0x00050014, at 0xfee00000
 io0 (APIC): apic id:  8, version: 0x00178020, at 0xfec00000
 io1 (APIC): apic id:  9, version: 0x00178020, at 0xfec81000
 io2 (APIC): apic id: 10, version: 0x00178020, at 0xfec81400
Pentium Pro MTRR support enabled
    ACPI-0660: *** Warning: Type override - [DEB_] had invalid type (Integer) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [MLIB] had invalid type (Integer) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [DATA] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [SIO_] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [LEDP] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [GPEN] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [GPST] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [WUES] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [WUSE] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [SBID] had invalid type (String) for Scope operator, changed to (Scope)
    ACPI-0660: *** Warning: Type override - [SWCE] had invalid type (String) for Scope operator, changed to (Scope)
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <INTEL  SWV20   > on motherboard
    ACPI-1287: *** Error: Method execution failed [\\_SB_.PCI0.SBRG.EC0_._REG] (Node 0xc6993c20), AE_NOT_EXIST
acpi0: Could not initialise SystemIO handler: AE_NOT_EXIST
device_probe_and_attach: acpi0 attach returned 6
pcibios: BIOS version 2.10
Using $PIR table, 19 entries at 0xc00f3060
pcib0: <Host to PCI bridge> at pcibus 0 on motherboard
pci0: <PCI bus> on pcib0
IOAPIC #0 intpin 16 -> irq 2
IOAPIC #0 intpin 19 -> irq 16
pci0: <unknown> at device 0.1 (no driver attached)
pcib1: <PCIBIOS PCI-PCI bridge> at device 3.0 on pci0
pci2: <PCI bus> on pcib1
pci2: <base peripheral, interrupt controller> at device 28.0 (no driver attached)
pcib2: <PCIBIOS PCI-PCI bridge> at device 29.0 on pci2
pci4: <PCI bus> on pcib2
pci2: <base peripheral, interrupt controller> at device 30.0 (no driver attached)
pcib3: <PCIBIOS PCI-PCI bridge> at device 31.0 on pci2
pci3: <PCI bus> on pcib3
IOAPIC #1 intpin 6 -> irq 18
IOAPIC #1 intpin 7 -> irq 19
em0: <Intel(R) PRO/1000 Network Connection, Version - 1.7.16> port 0x2040-0x207f mem 0xfeac0000-0xfeadffff irq 18 at device 7.0 on pci3
em0:  Speed:N/A  Duplex:N/A
em1: <Intel(R) PRO/1000 Network Connection, Version - 1.7.16> port 0x2000-0x203f mem 0xfeae0000-0xfeafffff irq 19 at device 7.1 on pci3
em1:  Speed:N/A  Duplex:N/A
pci0: <unknown> at device 3.1 (no driver attached)
uhci0: <Intel 82801CA/CAM (ICH3) USB controller USB-A> port 0x3020-0x303f irq 2 at device 29.0 on pci0
usb0: <Intel 82801CA/CAM (ICH3) USB controller USB-A> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <Intel 82801CA/CAM (ICH3) USB controller USB-B> port 0x3000-0x301f irq 16 at device 29.1 on pci0
usb1: <Intel 82801CA/CAM (ICH3) USB controller USB-B> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
pcib4: <PCIBIOS PCI-PCI bridge> at device 30.0 on pci0
pci1: <PCI bus> on pcib4
IOAPIC #0 intpin 18 -> irq 20
atapci0: <Promise PDC20277 UDMA133 controller> port 0x1420-0x142f,0x140c-0x140f,0x1410-0x1417,0x1408-0x140b,0x1400-0x1407 mem 0xfe6e0000-0xfe6e3fff irq 20 at device 2.0 on pci1
atapci0: [MPSAFE]
ata2: at 0x1400 on atapci0
ata2: [MPSAFE]
ata3: at 0x1410 on atapci0
ata3: [MPSAFE]
pci1: <display, VGA> at device 12.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <Intel ICH3 UDMA100 controller> port 0x3a0-0x3af,0-0x3,0-0x7,0-0x3,0-0x7 irq 0 at device 31.1 on pci0
ata0: at 0x1f0 irq 14 on atapci1
ata0: [MPSAFE]
ata1: at 0x170 irq 15 on atapci1
ata1: [MPSAFE]
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
orm0: <Option ROMs> at iomem 0xd3000-0xd47ff,0xd1800-0xd2fff,0xc8000-0xd17ff,0xc0000-0xc7fff on isa0
pmtimer0 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0
fdc0: <Enhanced floppy controller (i82077, NE72065 or clone)> at port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
ppc0: parallel port not found.
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
sio0 at port 0x3f8-0x3ff irq 4 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 flags 0x30 on isa0
sio1: type 16550A, console
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
unknown: <PNP0303> can't assign resources (port)
unknown: <PNP0c02> can't assign resources (port)
unknown: <PNP0501> can't assign resources (port)
unknown: <PNP0501> can't assign resources (port)
unknown: <PNP0700> can't assign resources (port)
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: routing 8254 via IOAPIC #0 intpin 2

Timecounters tick every 10.000 msec
acd0: CDROM <SAMSUNG CD-ROM SN-124> at ata1-master PIO4
GEOM: create disk ad4 dp=0xc6a02a70
ad4: 114473MB <ST3120023A> [232581/16/63] at ata2-master UDMA100
GEOM: create disk ad6 dp=0xc6a02070
ad6: 114473MB <ST3120023A> [232581/16/63] at ata3-master UDMA100
GEOM: create disk ar0 dp=0xc6a05de0
ar0: 114473MB <ATA RAID1 array> [14593/255/63] status: READY subdisks:
 disk0 READY on ad4 at ata2-master
 disk1 READY on ad6 at ata3-master
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
Mounting root from ufs:/dev/ar0s1a
module_register: module pci/em already exists!
Module pci/em failed to register: 17
em0: Link is up 1000 Mbps Full Duplex

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4

Received on Thu Oct 16 2003 - 13:58:58 UTC
