kernel panic in sbflush_internal

From: Steven G. Kargl <kargl_at_troutmask.apl.washington.edu>
Date: Mon, 21 May 2007 17:15:07 -0700 (PDT)
One of my colleagues brought down a node on my cluster
while running an MPI job.  The kernel coredump shows:

Script started on Mon May 21 17:02:53 2007
node12:root[201] kgdb kernel.debug vmcore.0
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]

Unread portion of the kernel message buffer:
panic: sbflush_internal: cc 4294965848 || mb 0 || mbcnt 0
cpuid = 0
Uptime: 7h6m34s
Physical memory: 16119 MB
Dumping 631 MB: 616 600 584 568 552 536 520 504 488 472 456 440 424 408 392 376 360 344 328 312 296 280 264 248 232 216 200 184 168 152 136 120 104 88 72 56 40 24 8

#0  doadump () at pcpu.h:171
171     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:171
#1  0xffffffff802a01eb in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xffffffff802a08cc in panic (fmt=0xffffff03157e0d20 "")
    at /usr/src/sys/kern/kern_shutdown.c:563
#3  0xffffffff802f4d23 in sbflush_internal (sb=0xffffff031243ab68)
    at /usr/src/sys/kern/uipc_sockbuf.c:808
#4  0xffffffff802f50cb in sbflush (sb=0xffffff031243ab68)
    at /usr/src/sys/kern/uipc_sockbuf.c:825
#5  0xffffffff803b7246 in tcp_disconnect (tp=0xffffff03101f73e0)
    at /usr/src/sys/netinet/tcp_usrreq.c:1496
#6  0xffffffff803b7539 in tcp_usr_disconnect (so=0xffffff0311a04690)
    at /usr/src/sys/netinet/tcp_usrreq.c:584
#7  0xffffffff802f67f2 in soclose (so=0xffffff031243aae0)
    at /usr/src/sys/kern/uipc_socket.c:642
#8  0xffffffff802de133 in soo_close (fp=0xffffff0312402258, td=0x0)
    at /usr/src/sys/kern/sys_socket.c:296
#9  0xffffffff8027479f in fdrop (fp=0xffffff0312402258, td=0xffffff03157e0d20)
    at file.h:297
#10 0xffffffff80274aaf in closef (fp=0xffffff0312402258, td=0xffffff03157e0d20)
    at /usr/src/sys/kern/kern_descrip.c:1928
#11 0xffffffff80275f54 in fdfree (td=0xffffff03157e0d20)
    at /usr/src/sys/kern/kern_descrip.c:1638
#12 0xffffffff8027f537 in exit1 (td=0xffffff03157e0d20, rv=9)
    at /usr/src/sys/kern/kern_exit.c:271
#13 0xffffffff802a578f in sigexit (td=0xffffff03157e0d20, sig=9)
    at /usr/src/sys/kern/kern_sig.c:2862
#14 0xffffffff802a63ac in postsig (sig=9) at /usr/src/sys/kern/kern_sig.c:2741
#15 0xffffffff802d3547 in ast (framep=0xffffffffb0580c70)
    at /usr/src/sys/kern/subr_trap.c:271
#16 0xffffffff804787f0 in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:283
#17 0x00000003c0c7294c in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb) quit

I have the debug kernel and vmcore file and can make them available.

The dmesg for the node that panicked is:

Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.0-CURRENT #6: Fri May 18 10:19:43 PDT 2007
    kargl_at_node10.cimu.org:/usr/obj/usr/src/sys/HPC
ACPI APIC Table: <A M I  OEMAPIC >
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Dual Core AMD Opteron(tm) Processor 280 (2391.55-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x20f12  Stepping = 2
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x1<SSE3>
  AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
  AMD Features2=0x3<LAHF,CMP>
  Cores per package: 2
usable memory = 16902705152 (16119 MB)
avail memory  = 16387166208 (15628 MB)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-23 on motherboard
ioapic1 <Version 1.1> irqs 24-27 on motherboard
ioapic2 <Version 1.1> irqs 28-31 on motherboard
acpi0: <A M I OEMXSDT> on motherboard
acpi0: [ITHREAD]
acpi_hpet0: <High Precision Event Timer> iomem 0xfec01000-0xfec013ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 2000
acpi0: Power Button (fixed)
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, eff00000 (3) failed
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_throttle0: <ACPI CPU Throttling> on cpu0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci3: <ACPI PCI bus> on pcib1
ohci0: <OHCI (generic) USB controller> mem 0xfeafc000-0xfeafcfff irq 19 at device 0.0 on pci3
ohci0: [GIANT-LOCKED]
ohci0: [ITHREAD]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: <AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
device_attach: uhub0 attach returned 6
usb0: port 0, set config at addr 1 failed
usb0: root hub problem, error=4
ohci1: <OHCI (generic) USB controller> mem 0xfeafd000-0xfeafdfff irq 19 at device 0.1 on pci3
ohci1: [GIANT-LOCKED]
ohci1: [ITHREAD]
usb1: OHCI version 1.0, legacy support
usb1: SMM does not respond, resetting
usb1: <OHCI (generic) USB controller> on ohci1
usb1: USB revision 1.0
uhub1: <AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb1
uhub1: 3 ports with 3 removable, self powered
atapci0: <SiI 3114 SATA150 controller> port 0xbc00-0xbc07,0xb400-0xb403,0xb000-0xb007,0xac00-0xac03,0xa800-0xa80f mem 0xfeafec00-0xfeafefff irq 17 at device 5.0 on pci3
atapci0: [ITHREAD]
ata2: <ATA channel 0> on atapci0
ata2: [ITHREAD]
ata3: <ATA channel 1> on atapci0
ata3: [ITHREAD]
ata4: <ATA channel 2> on atapci0
ata4: [ITHREAD]
ata5: <ATA channel 3> on atapci0
ata5: [ITHREAD]
vgapci0: <VGA-compatible display> port 0xb800-0xb8ff mem 0xfd000000-0xfdffffff,0xfeaff000-0xfeafffff irq 18 at device 6.0 on pci3
isab0: <PCI-ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <AMD 8111 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0
ata0: <ATA channel 0> on atapci1
ata0: [ITHREAD]
ata1: <ATA channel 1> on atapci1
ata1: [ITHREAD]
amdsmb0: <AMD-8111 SMBus 2.0 Controller> port 0xcc00-0xcc1f irq 19 at device 7.2 on pci0
smbus0: <System Management Bus> on amdsmb0
smb0: <SMBus generic I/O> on smbus0
amdpm0: <AMD 756/766/768/8111 Power Management Controller> port 0x10e0-0x10ff at device 7.3 on pci0
smbus1: <System Management Bus> on amdpm0
smb1: <SMBus generic I/O> on smbus1
pcib2: <ACPI PCI-PCI bridge> at device 10.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pci2:9:0: bad VPD cksum, remain 72
bge0: <Broadcom Gigabit Ethernet Controller, ASIC rev. 0x2003> mem 0xfc8c0000-0xfc8cffff,0xfc8b0000-0xfc8bffff irq 24 at device 9.0 on pci2
miibus0: <MII bus> on bge0
brgphy0: <BCM5704 10/100/1000baseTX PHY> PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge0: Ethernet address: 00:e0:81:34:e1:4c
bge0: [ITHREAD]
pci2:9:1: bad VPD cksum, remain 72
bge1: <Broadcom Gigabit Ethernet Controller, ASIC rev. 0x2003> mem 0xfc8f0000-0xfc8fffff,0xfc8e0000-0xfc8effff irq 25 at device 9.1 on pci2
miibus1: <MII bus> on bge1
brgphy1: <BCM5704 10/100/1000baseTX PHY> PHY 1 on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge1: Ethernet address: 00:e0:81:34:e1:4d
bge1: [ITHREAD]
pcib3: <ACPI PCI-PCI bridge> at device 11.0 on pci0
pci1: <ACPI PCI bus> on pcib3
acpi_button0: <Power Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
sio1: [FILTER]
fdc0: <floppy drive controller (FDE)> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
ppc0: <Parallel port> port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
fdc0: <floppy drive controller (FDE)> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcc7ff,0xcc800-0xcdfff,0xce000-0xcf7ff,0xcf800-0xd07ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <8 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
ad4: 239372MB <WDC WD2500YD-01NVB1 10.02E01> at ata2-master SATA150
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
hwpmc: TSC/1/0x20<REA> K8/4/0x1ff<INT,USR,SYS,EDG,THR,REA,WRI,INV,QUA>
Trying to mount root from ufs:/dev/ad4s1a
WARNING: / was not properly dismounted

-- 
Steve
http://troutmask.apl.washington.edu/~kargl/
Received on Mon May 21 2007 - 22:15:47 UTC