Re: kernel panic in sbflush_internal

From: Robert Watson <rwatson_at_FreeBSD.org>
Date: Tue, 22 May 2007 08:21:13 -0400 (EDT)
On Mon, 21 May 2007, Steven G. Kargl wrote:

> One of my colleagues brought down a node on my cluster while running an MPI 
> job.  The kernel coredump shows
>
> Script started on Mon May 21 17:02:53 2007
> node12:root[201] kgdb kernel.debug vmcore.0
> [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
>
> Unread portion of the kernel message buffer:
> panic: sbflush_internal: cc 4294965848 || mb 0 || mbcnt 0
> cpuid = 0
> Uptime: 7h6m34s
> Physical memory: 16119 MB
> Dumping 631 MB: 616 600 584 568 552 536 520 504 488 472 456 440 424 408 392 376 360 344 328 312 296 280 264 248 232 216 200 184 168 152 136 120 104 88 72 56 40 24 8
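
A note on reading the panic value, offered as a sketch rather than a diagnosis: 
if the socket buffer byte count is a 32-bit unsigned field (an assumption here, 
not something confirmed in this thread), then cc 4294965848 is what a counter 
looks like after it has been decremented below zero.  The helper name 
as_signed32 is made up for illustration.

```python
# Reinterpret the "cc" value from the panic message as a wrapped counter.
# Assumption: the byte count is stored in a 32-bit unsigned field.

def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

cc = 4294965848              # value printed in the panic message
print(as_signed32(cc))       # prints -1448
```

Under that assumption the count is 1448 bytes short, i.e. more bytes appear to 
have been subtracted from the buffer than were ever added to it.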

Is the kernel build date an accurate reflection of the source code version it 
is being used with?  Could you let me know what file revisions are in use for 
uipc_socket.c, uipc_sockbuf.c, uipc_syscalls.c, tcp_usrreq.c, tcp_input.c, 
tcp_subr.c?  Could you print *sb in frame #4, *so in frame #7, *tp in frame 
#5, and *inp in #5 (if defined) -- otherwise, (struct inpcb *)so->so_pcb, if 
non-NULL, in frame #6.

Robert N M Watson
Computer Laboratory
University of Cambridge

>
> #0  doadump () at pcpu.h:171
> 171     pcpu.h: No such file or directory.
>        in pcpu.h
> (kgdb) bt
> #0  doadump () at pcpu.h:171
> #1  0xffffffff802a01eb in boot (howto=260)
>    at /usr/src/sys/kern/kern_shutdown.c:409
> #2  0xffffffff802a08cc in panic (fmt=0xffffff03157e0d20 "")
>    at /usr/src/sys/kern/kern_shutdown.c:563
> #3  0xffffffff802f4d23 in sbflush_internal (sb=0xffffff031243ab68)
>    at /usr/src/sys/kern/uipc_sockbuf.c:808
> #4  0xffffffff802f50cb in sbflush (sb=0xffffff031243ab68)
>    at /usr/src/sys/kern/uipc_sockbuf.c:825
> #5  0xffffffff803b7246 in tcp_disconnect (tp=0xffffff03101f73e0)
>    at /usr/src/sys/netinet/tcp_usrreq.c:1496
> #6  0xffffffff803b7539 in tcp_usr_disconnect (so=0xffffff0311a04690)
>    at /usr/src/sys/netinet/tcp_usrreq.c:584
> #7  0xffffffff802f67f2 in soclose (so=0xffffff031243aae0)
>    at /usr/src/sys/kern/uipc_socket.c:642
> #8  0xffffffff802de133 in soo_close (fp=0xffffff0312402258, td=0x0)
>    at /usr/src/sys/kern/sys_socket.c:296
> #9  0xffffffff8027479f in fdrop (fp=0xffffff0312402258, td=0xffffff03157e0d20)
>    at file.h:297
> #10 0xffffffff80274aaf in closef (fp=0xffffff0312402258, td=0xffffff03157e0d20)
>    at /usr/src/sys/kern/kern_descrip.c:1928
> #11 0xffffffff80275f54 in fdfree (td=0xffffff03157e0d20)
>    at /usr/src/sys/kern/kern_descrip.c:1638
> #12 0xffffffff8027f537 in exit1 (td=0xffffff03157e0d20, rv=9)
>    at /usr/src/sys/kern/kern_exit.c:271
> #13 0xffffffff802a578f in sigexit (td=0xffffff03157e0d20, sig=9)
>    at /usr/src/sys/kern/kern_sig.c:2862
> #14 0xffffffff802a63ac in postsig (sig=9) at /usr/src/sys/kern/kern_sig.c:2741
> #15 0xffffffff802d3547 in ast (framep=0xffffffffb0580c70)
>    at /usr/src/sys/kern/subr_trap.c:271
> #16 0xffffffff804787f0 in Xfast_syscall ()
>    at /usr/src/sys/amd64/amd64/exception.S:283
> #17 0x00000003c0c7294c in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb) quit
>
> I have the debug kernel and vmcore file, and can make it available.
>
> The dmesg for the node that panicked is
>
> Copyright (c) 1992-2007 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> 	The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 7.0-CURRENT #6: Fri May 18 10:19:43 PDT 2007
>    kargl_at_node10.cimu.org:/usr/obj/usr/src/sys/HPC
> ACPI APIC Table: <A M I  OEMAPIC >
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Dual Core AMD Opteron(tm) Processor 280 (2391.55-MHz K8-class CPU)
>  Origin = "AuthenticAMD"  Id = 0x20f12  Stepping = 2
>  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>  Features2=0x1<SSE3>
>  AMD Features=0xe2500800<SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!>
>  AMD Features2=0x3<LAHF,CMP>
>  Cores per package: 2
> usable memory = 16902705152 (16119 MB)
> avail memory  = 16387166208 (15628 MB)
> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> cpu0 (BSP): APIC ID:  0
> cpu1 (AP): APIC ID:  1
> cpu2 (AP): APIC ID:  2
> cpu3 (AP): APIC ID:  3
> MADT: Forcing active-low polarity and level trigger for SCI
> ioapic0 <Version 1.1> irqs 0-23 on motherboard
> ioapic1 <Version 1.1> irqs 24-27 on motherboard
> ioapic2 <Version 1.1> irqs 28-31 on motherboard
> acpi0: <A M I OEMXSDT> on motherboard
> acpi0: [ITHREAD]
> acpi_hpet0: <High Precision Event Timer> iomem 0xfec01000-0xfec013ff on acpi0
> Timecounter "HPET" frequency 14318180 Hz quality 2000
> acpi0: Power Button (fixed)
> acpi0: reservation of 0, a0000 (3) failed
> acpi0: reservation of 100000, eff00000 (3) failed
> Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
> cpu0: <ACPI CPU> on acpi0
> acpi_throttle0: <ACPI CPU Throttling> on cpu0
> cpu1: <ACPI CPU> on acpi0
> cpu2: <ACPI CPU> on acpi0
> cpu3: <ACPI CPU> on acpi0
> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> pci0: <ACPI PCI bus> on pcib0
> pcib1: <ACPI PCI-PCI bridge> at device 6.0 on pci0
> pci3: <ACPI PCI bus> on pcib1
> ohci0: <OHCI (generic) USB controller> mem 0xfeafc000-0xfeafcfff irq 19 at device 0.0 on pci3
> ohci0: [GIANT-LOCKED]
> ohci0: [ITHREAD]
> usb0: OHCI version 1.0, legacy support
> usb0: SMM does not respond, resetting
> usb0: <OHCI (generic) USB controller> on ohci0
> usb0: USB revision 1.0
> uhub0: <AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
> device_attach: uhub0 attach returned 6
> usb0: port 0, set config at addr 1 failed
> usb0: root hub problem, error=4
> ohci1: <OHCI (generic) USB controller> mem 0xfeafd000-0xfeafdfff irq 19 at device 0.1 on pci3
> ohci1: [GIANT-LOCKED]
> ohci1: [ITHREAD]
> usb1: OHCI version 1.0, legacy support
> usb1: SMM does not respond, resetting
> usb1: <OHCI (generic) USB controller> on ohci1
> usb1: USB revision 1.0
> uhub1: <AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb1
> uhub1: 3 ports with 3 removable, self powered
> atapci0: <SiI 3114 SATA150 controller> port 0xbc00-0xbc07,0xb400-0xb403,0xb000-0xb007,0xac00-0xac03,0xa800-0xa80f mem 0xfeafec00-0xfeafefff irq 17 at device 5.0 on pci3
> atapci0: [ITHREAD]
> ata2: <ATA channel 0> on atapci0
> ata2: [ITHREAD]
> ata3: <ATA channel 1> on atapci0
> ata3: [ITHREAD]
> ata4: <ATA channel 2> on atapci0
> ata4: [ITHREAD]
> ata5: <ATA channel 3> on atapci0
> ata5: [ITHREAD]
> vgapci0: <VGA-compatible display> port 0xb800-0xb8ff mem 0xfd000000-0xfdffffff,0xfeaff000-0xfeafffff irq 18 at device 6.0 on pci3
> isab0: <PCI-ISA bridge> at device 7.0 on pci0
> isa0: <ISA bus> on isab0
> atapci1: <AMD 8111 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on pci0
> ata0: <ATA channel 0> on atapci1
> ata0: [ITHREAD]
> ata1: <ATA channel 1> on atapci1
> ata1: [ITHREAD]
> amdsmb0: <AMD-8111 SMBus 2.0 Controller> port 0xcc00-0xcc1f irq 19 at device 7.2 on pci0
> smbus0: <System Management Bus> on amdsmb0
> smb0: <SMBus generic I/O> on smbus0
> amdpm0: <AMD 756/766/768/8111 Power Management Controller> port 0x10e0-0x10ff at device 7.3 on pci0
> smbus1: <System Management Bus> on amdpm0
> smb1: <SMBus generic I/O> on smbus1
> pcib2: <ACPI PCI-PCI bridge> at device 10.0 on pci0
> pci2: <ACPI PCI bus> on pcib2
> pci2:9:0: bad VPD cksum, remain 72
> bge0: <Broadcom Gigabit Ethernet Controller, ASIC rev. 0x2003> mem 0xfc8c0000-0xfc8cffff,0xfc8b0000-0xfc8bffff irq 24 at device 9.0 on pci2
> miibus0: <MII bus> on bge0
> brgphy0: <BCM5704 10/100/1000baseTX PHY> PHY 1 on miibus0
> brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
> bge0: Ethernet address: 00:e0:81:34:e1:4c
> bge0: [ITHREAD]
> pci2:9:1: bad VPD cksum, remain 72
> bge1: <Broadcom Gigabit Ethernet Controller, ASIC rev. 0x2003> mem 0xfc8f0000-0xfc8fffff,0xfc8e0000-0xfc8effff irq 25 at device 9.1 on pci2
> miibus1: <MII bus> on bge1
> brgphy1: <BCM5704 10/100/1000baseTX PHY> PHY 1 on miibus1
> brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
> bge1: Ethernet address: 00:e0:81:34:e1:4d
> bge1: [ITHREAD]
> pcib3: <ACPI PCI-PCI bridge> at device 11.0 on pci0
> pci1: <ACPI PCI bus> on pcib3
> acpi_button0: <Power Button> on acpi0
> atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
> atkbd0: <AT Keyboard> irq 1 on atkbdc0
> kbd0 at atkbd0
> atkbd0: [GIANT-LOCKED]
> atkbd0: [ITHREAD]
> sio0: configured irq 4 not in bitmap of probed irqs 0
> sio0: port may not be enabled
> sio0: configured irq 4 not in bitmap of probed irqs 0
> sio0: port may not be enabled
> sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
> sio0: type 16550A
> sio0: [FILTER]
> sio1: configured irq 3 not in bitmap of probed irqs 0
> sio1: port may not be enabled
> sio1: configured irq 3 not in bitmap of probed irqs 0
> sio1: port may not be enabled
> sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
> sio1: type 16550A
> sio1: [FILTER]
> fdc0: <floppy drive controller (FDE)> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
> fdc0: does not respond
> device_attach: fdc0 attach returned 6
> ppc0: <Parallel port> port 0x378-0x37f irq 7 on acpi0
> ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
> ppbus0: <Parallel port bus> on ppc0
> lpt0: <Printer> on ppbus0
> lpt0: Interrupt-driven port
> ppi0: <Parallel I/O> on ppbus0
> ppc0: [GIANT-LOCKED]
> ppc0: [ITHREAD]
> fdc0: <floppy drive controller (FDE)> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
> fdc0: does not respond
> device_attach: fdc0 attach returned 6
> orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcc7ff,0xcc800-0xcdfff,0xce000-0xcf7ff,0xcf800-0xd07ff on isa0
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <8 virtual consoles, flags=0x300>
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
> Timecounters tick every 1.000 msec
> ad4: 239372MB <WDC WD2500YD-01NVB1 10.02E01> at ata2-master SATA150
> SMP: AP CPU #1 Launched!
> SMP: AP CPU #2 Launched!
> SMP: AP CPU #3 Launched!
> hwpmc: TSC/1/0x20<REA> K8/4/0x1ff<INT,USR,SYS,EDG,THR,REA,WRI,INV,QUA>
> Trying to mount root from ufs:/dev/ad4s1a
> WARNING: / was not properly dismounted
>
> -- 
> Steve
> http://troutmask.apl.washington.edu/~kargl/
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
>
Received on Tue May 22 2007 - 10:21:16 UTC