Panic: snapacct_ufs2: bad block

From: Khetan Gajjar <khetan_at_os.org.za>
Date: Mon, 15 Aug 2005 17:09:34 +0200 (SAST)
Hi.

I'm seeing several snapshot-related crashes in -current, cvsup'd on
12 Aug 2005 at 15:15 GMT+0200.  I suspect a ULE scheduler/snapshot
interaction.

/var/crash/info.1 reveals
Dump header from device /dev/ad0s1b
   Architecture: i386
   Architecture Version: 33554432
   Dump Length: 528023552B (503 MB)
   Blocksize: 512
   Dumptime: Mon Aug 15 12:32:00 2005
   Hostname: citadel.os.org.za
   Magic: FreeBSD Kernel Dump
   Version String: FreeBSD 7.0-CURRENT #0: Fri Aug 12 22:44:36 SAST 2005
     khetan_at_citadel.os.org.za:/usr/src/sys/i386/compile/CITADEL5
   Panic String: snapacct_ufs2: bad block
   Dump Parity: 1551260746
   Bounds: 1
   Dump Status: good
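
As a quick sanity check on the header above, the dump length can be converted by hand. This is only a sketch using the two numbers info.1 printed; the variable and function names are mine, and the truncating division is an assumption about how the "(503 MB)" figure was rounded.

```python
# Values copied from the info.1 dump header.
DUMP_LENGTH_BYTES = 528023552
BLOCKSIZE = 512


def to_mib(nbytes):
    """Whole mebibytes, truncated (assumed to match the header's rounding)."""
    return nbytes // (1024 * 1024)


dump_mib = to_mib(DUMP_LENGTH_BYTES)
# The dump length should be a whole number of 512-byte blocks.
dump_blocks, remainder = divmod(DUMP_LENGTH_BYTES, BLOCKSIZE)

print(dump_mib)                 # 503
print(dump_blocks, remainder)   # 1031296 0
```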

Kgdb reveals
[citadel] /var/crash# kgdb -c vmcore.1 /usr/src/sys/i386/compile/CITADEL5/kernel.debug
[GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd".

Unread portion of the kernel message buffer:
(non-printable bytes)

#0  doadump () at pcpu.h:165
165     pcpu.h: No such file or directory.
         in pcpu.h
(kgdb) backtrace
#1  0xc050212c in boot (howto=260) at ../../../kern/kern_shutdown.c:397
#2  0xc0502481 in panic (fmt=0xc06bb00a "snapacct_ufs2: bad block")
     at ../../../kern/kern_shutdown.c:553
#3  0xc05f9d95 in snapacct_ufs2 (vp=0xc2720880, oldblkp=0xc2673dd0,
     lastblkp=0xc2676000, fs=0xc1a75800, lblkno=12, expungetype=2)
     at ../../../ufs/ffs/ffs_snapshot.c:1338
#4  0xc05f9b3b in indiracct_ufs2 (snapvp=0xc2720880, cancelvp=0xc1ca9990,
     level=0, blkno=Unhandled dwarf expression opcode 0x93
) at ../../../ufs/ffs/ffs_snapshot.c:1253
#5  0xc05f9905 in expunge_ufs2 (snapvp=0xc2720880, cancelip=0xc1c58bdc,
     fs=0xc1a75800, acctfunc=0xc05f9c7c <snapacct_ufs2>, expungetype=2)
     at ../../../ufs/ffs/ffs_snapshot.c:1185
#6  0xc05f7eaa in ffs_snapshot (mp=0xc1c05c00, snapfile=0xc1c58ce4
"`\214ÅÁ")
     at ../../../ufs/ffs/ffs_snapshot.c:605
#7  0xc0605de1 in ffs_mount (mp=0xc1c05c00, td=0xc24bb000)
     at ../../../ufs/ffs/ffs_vfsops.c:302
#8  0xc05556fc in vfs_domount (td=0xc24bb000, fstype=0xc1cb01f0 "ufs",
     fspath=0xc1cb0a00 "/", fsflags=16842752, fsdata=0xc2f23710)
     at ../../../kern/vfs_mount.c:739
#9  0xc0554ee9 in vfs_donmount (td=0xc24bb000, fsflags=16842752,
     fsoptions=0xd7041c04) at ../../../kern/vfs_mount.c:503
#10 0xc0557444 in kernel_mount (ma=0xc2311330, flags=16842752)
     at pcpu.h:162
#11 0xc0606041 in ffs_cmount (ma=0xc2311330, data=0x0, flags=16842752,
     td=0xc24bb000) at ../../../ufs/ffs/ffs_vfsops.c:384
#12 0xc05550c6 in mount (td=0xc24bb000, uap=0xd7041d04)
     at ../../../kern/vfs_mount.c:566
#13 0xc066f0db in syscall (frame=
       {tf_fs = 59, tf_es = 59, tf_ds = 59, tf_edi = 134523985, tf_esi =
-1077941244, tf_ebp = -1077943848, tf_isp = -687596188, tf_ebx =
-1077943792, tf_edx = -1, tf_ecx = -1077940433, tf_eax = 21, tf_trapno =
12, tf_err = 2, tf_eip = 671848243, tf_cs = 51, tf_eflags = 582, tf_esp =
-1077944004, tf_ss = 59})
     at ../../../i386/i386/trap.c:986
#14 0xc065bb0f in Xint0x80_syscall ()
     at ../../../i386/i386/exception.s:200
#15 0x0000003b in ?? ()
#16 0x0000003b in ?? ()
#17 0x0000003b in ?? ()
#18 0x0804ac51 in ?? ()
#19 0xbfbfec04 in ?? ()
#20 0xbfbfe1d8 in ?? ()
#21 0xd7041d64 in ?? ()
#22 0xbfbfe210 in ?? ()
#23 0xffffffff in ?? ()
#24 0xbfbfef2f in ?? ()
#25 0x00000015 in ?? ()
#26 0x0000000c in ?? ()
#27 0x00000002 in ?? ()
#28 0x280b9733 in ?? ()
#29 0x00000033 in ?? ()
#30 0x00000246 in ?? ()
#31 0xbfbfe13c in ?? ()
#32 0x0000003b in ?? ()
#33 0x00000000 in ?? ()
#34 0x00000000 in ?? ()
#35 0x00000000 in ?? ()
#36 0x00000000 in ?? ()
#37 0x12471000 in ?? ()
#38 0xc24bb154 in ?? ()
#39 0xc19b27d0 in ?? ()
#40 0xd7041504 in ?? ()
#41 0xd70414e8 in ?? ()
#42 0xc24bb000 in ?? ()
#43 0xc0514827 in sched_switch (td=0xbfbfe210, newtd=0xbfbfec04,
flags=Cannot access memory at address 0xbfbfe1e8
)
     at ../../../kern/sched_ule.c:1387
Previous frame inner to this frame (corrupt stack?)

This points to a ULE scheduler issue, right?
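
One way to cross-check the tail of the trace is to compare the "?? ()" frames with the trap frame printed in frame #13. The sketch below does that and also decodes the fsflags value seen in frames #8 to #11. The numbers are copied from the kgdb output; the helper names are mine, and the two MNT_* constants are written from memory of sys/mount.h, so treat them as an assumption to verify against the source tree.

```python
# Trap frame fields as printed in frame #13 (signed 32-bit decimals).
TRAPFRAME = [
    ("tf_fs", 59), ("tf_es", 59), ("tf_ds", 59),
    ("tf_edi", 134523985), ("tf_esi", -1077941244),
    ("tf_ebp", -1077943848), ("tf_isp", -687596188),
    ("tf_ebx", -1077943792), ("tf_edx", -1),
    ("tf_ecx", -1077940433), ("tf_eax", 21),
    ("tf_trapno", 12), ("tf_err", 2),
    ("tf_eip", 671848243), ("tf_cs", 51),
    ("tf_eflags", 582), ("tf_esp", -1077944004), ("tf_ss", 59),
]

# Addresses kgdb printed for frames #15 to #32, in order.
UNKNOWN_FRAMES = [
    0x0000003b, 0x0000003b, 0x0000003b, 0x0804ac51, 0xbfbfec04,
    0xbfbfe1d8, 0xd7041d64, 0xbfbfe210, 0xffffffff, 0xbfbfef2f,
    0x00000015, 0x0000000c, 0x00000002, 0x280b9733, 0x00000033,
    0x00000246, 0xbfbfe13c, 0x0000003b,
]

# Assumed values from sys/mount.h (not taken from the trace itself).
MNT_UPDATE = 0x00010000
MNT_SNAPSHOT = 0x01000000
FSFLAGS = 16842752  # fsflags / flags argument in frames #8 to #11


def to_u32(value):
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF


decoded = [to_u32(v) for _, v in TRAPFRAME]
frames_match = decoded == UNKNOWN_FRAMES

flag_names = [
    name
    for name, bit in (("MNT_UPDATE", MNT_UPDATE), ("MNT_SNAPSHOT", MNT_SNAPSHOT))
    if FSFLAGS & bit
]
leftover = FSFLAGS & ~(MNT_UPDATE | MNT_SNAPSHOT)

for (name, _), value in zip(TRAPFRAME, decoded):
    print(f"{name:10s} 0x{value:08x}")
print("frames #15-#32 equal the trap frame:", frames_match)
print("fsflags:", flag_names, "leftover bits:", hex(leftover))
```

If the two lists match, frames #15 to #32 are the saved user registers rather than real callers, so anything kgdb prints beyond them, including the sched_switch frame, may be stale stack contents. That is a reading of the output, not a diagnosis.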

My dmesg shows
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
         The Regents of the University of California. All rights reserved.
FreeBSD 7.0-CURRENT #0: Fri Aug 12 22:44:36 SAST 2005
     khetan_at_citadel.os.org.za:/usr/src/sys/i386/compile/CITADEL5
WARNING: debug.mpsafenet forced to 0 as ipsec requires Giant
WARNING: MPSAFE network stack disabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Celeron(R) CPU 2.00GHz (1999.95-MHz 686-class CPU)
   Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9

   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
   Features2=0x4400<CNTX-ID,<b14>>
real memory  = 528416768 (503 MB)
avail memory = 507617280 (484 MB)
ACPI APIC Table: <P4M266 AWRDACPI>
ioapic0 <Version 0.3> irqs 0-23 on motherboard
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <P4M266 AWRDACPI> on motherboard
acpi0: Power Button (fixed)
pci_link0: <ACPI PCI Link LNKA> on acpi0
pci_link1: <ACPI PCI Link LNKB> on acpi0
pci_link2: <ACPI PCI Link LNKC> irq 11 on acpi0
pci_link3: <ACPI PCI Link LNKD> on acpi0
pci_link4: <ACPI PCI Link ALKA> irq 0 on acpi0
pci_link5: <ACPI PCI Link ALKB> irq 0 on acpi0
pci_link6: <ACPI PCI Link ALKC> irq 0 on acpi0
pci_link7: <ACPI PCI Link ALKD> irq 0 on acpi0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <VIA 8703 (P4M266x/P4N266) host to PCI bridge> mem 0xeb000000-0xeb7fffff at device 0.0 on pci0
pcib1: <PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci1: <display, VGA> at device 0.0 (no driver attached)
fxp0: <Intel 82550 Pro/100 Ethernet> port 0xd000-0xd03f mem 0xeb820000-0xeb820fff,0xeb800000-0xeb81ffff irq 18 at device 8.0 on pci0
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:02:b3:ed:ec:a2
fxp0: [GIANT-LOCKED]
isab0: <PCI-ISA bridge> at device 17.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <VIA 8235 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xe000-0xe00f at device 17.1 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
acpi_tz0: <Thermal Zone> on acpi0
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
ppc0: <Standard parallel printer port> port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
pmtimer0 on isa0
orm0: <ISA Option ROM> at iomem 0xcc000-0xcd7ff pnpid ORM0000 on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 1999954984 Hz quality 800
Timecounters tick every 1.000 msec
IPsec: Initialized Security Association Processing.
ipfw2 (+ipv6) initialized, divert loadable, rule-based forwarding disabled, default to deny, logging unlimited
ad0: 39266MB <HDS722540VLAT20 V31OA6EA> at ata0-master UDMA100
ad2: 39266MB <HDS722540VLAT20 V31OA6EA> at ata1-master UDMA100
Trying to mount root from ufs:/dev/ad0s1a
fxp0: Microcode loaded, int_delay: 1000 usec  bundle_max: 6
fxp0: Microcode loaded, int_delay: 1000 usec  bundle_max: 6
fxp0: Microcode loaded, int_delay: 1000 usec  bundle_max: 6
fxp0: Microcode loaded, int_delay: 1000 usec  bundle_max: 6
fxp0: Microcode loaded, int_delay: 1000 usec  bundle_max: 6
fxp0: Microcode loaded, int_delay: 1000 usec  bundle_max: 6
Accounting enabled

I'd appreciate any pointers! Thanks.

PS

The problem is that the machine is hosted in a remote data centre and
needs manual intervention to re-fsck it every time this crash occurs.
For now, I've disabled snapshots and forced
fsck_y_enable="YES"
background_fsck="NO"
in /etc/rc.conf in the vain hope that if the machine barfs, it'll pick
itself up again. That is logical, yes?
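
For anyone scripting a check of those two settings on a remote box, here is a rough sketch. It parses rc.conf-style text and reports whether a foreground "fsck -y" is forced. The parser and names are hypothetical, it handles only simple NAME="value" lines, and the fallback defaults (fsck_y_enable "NO", background_fsck "YES") are my assumption about /etc/defaults/rc.conf.

```python
import shlex

# The two lines quoted above; on a real system this text would come
# from reading /etc/rc.conf.
RC_CONF_SNIPPET = '''
fsck_y_enable="YES"
background_fsck="NO"
'''


def parse_rc_conf(text):
    """Parse simple NAME="value" lines; blank lines and comments are skipped."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # shlex strips the quotes and any trailing "# comment".
        parts = shlex.split(value, comments=True)
        settings[name.strip()] = parts[0] if parts else ""
    return settings


settings = parse_rc_conf(RC_CONF_SNIPPET)

# Fallback values are assumed defaults, used only when a knob is absent.
foreground_fsck_forced = (
    settings.get("fsck_y_enable", "NO").upper() == "YES"
    and settings.get("background_fsck", "YES").upper() == "NO"
)

print(settings)
print(foreground_fsck_forced)
```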

Khetan Gajjar
--
Services           | +27 11 575 3832
Internet Solutions | http://www.is.co.za/
Received on Mon Aug 15 2005 - 13:09:48 UTC
