Re: Panic String: ffs_alloccg: map corrupted [/dev/gpt/tmp]

From: Benjamin Kaduk <kaduk_at_MIT.EDU>
Date: Wed, 11 Jun 2014 14:52:28 -0400 (EDT)
It is rather difficult to determine what sort of response you are
expecting to this message, as it covers several different (but maybe
related) topics and includes exposition and supposition without any
clear questions.

On Wed, 11 Jun 2014, O. Hartmann wrote:

> Running FreeBSD
>
> Version String: FreeBSD 11.0-CURRENT #3 r267294: Mon Jun  9 22:07:15 CEST 2014 amd64
>
> crashes without panic message and /var/crash/info.0 contains this message:
>
> Dump header from device /dev/gpt/swap
>  Architecture: amd64
>  Architecture Version: 2
>  Dump Length: 968962048B (924 MB)
>  Blocksize: 512
>  Dumptime: Wed Jun 11 19:19:19 2014
>  Hostname: thor.sb211.zbv
>  Magic: FreeBSD Kernel Dump
>  Version String: FreeBSD 11.0-CURRENT #3 r267294: Mon Jun  9 22:07:15 CEST 2014
>    root_at_thor.sb211.zbv:/usr/obj/usr/src/sys/THOR
>  Panic String: ffs_alloccg: map corrupted
>  Dump Parity: 3034136388
>  Bounds: 0
>  Dump Status: good
>
> I'm very confused about the panic string, since it seems to tell me something is bad with
> FFS/UFS.

ffs is encountering "bad" data while searching through the free block map. 
I am not an ffs/ufs expert, but I think this could be the result of
corrupt data on-disk [from a previous crash?] that does not get cleaned up 
by fsck.  If that is the case, re-running newfs should clear things up. 
Since this is /tmp which is, as you note, usually just ephemeral files, 
that is probably one of the first things I would try.
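
As a rough sketch of what re-running newfs might look like from
single-user mode: the device and mount point are taken from your report,
while the RUN switch and the helper names are made up for illustration.
By default the script only prints what it would do.

```sh
#!/bin/sh
# Sketch only: recreate a scratch UFS filesystem on the /tmp partition.
# DEV and MNT come from the panic report; RUN is a hypothetical dry-run
# switch, so commands are only printed unless RUN=1 is set.
DEV="${DEV:-/dev/gpt/tmp}"
MNT="${MNT:-/tmp}"

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

recreate_tmp() {
    run umount "$MNT"        # the filesystem must not be mounted
    run newfs -U "$DEV"      # destroys everything on $DEV; -U enables soft updates
    run mount "$DEV" "$MNT"
    run chmod 1777 "$MNT"    # restore the usual sticky, world-writable mode
}

recreate_tmp
```

Setting RUN=1 makes it execute the commands for real, which wipes
whatever is on the partition, so double-check DEV first.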


> More disturbing is the fact that the boot process into multi user stops at a complaint
> about unclean /dev/gpt/tmp filesystem (mounted on /tmp): The OS stops at the Password: prompt
> for single user-mode maintenance.

If error(s) are encountered during the mounting of filesystems, the OS 
always drops to single-user mode.  There is no special-casing for /tmp or 
anything else.  See the calls to stop_boot() from
/etc/rc.d/mountcritlocal and related scripts.

> I can not understand why the system is stopping complaining about a broken /tmp
> filesystem. I consider especially /tmp infill corrupt after a fault and I'd like to ask
> whether there is a way to overrun this corruption and force a repair and mount, even if
> the data contained in /tmp is after forced cleaning corrupt.
>
> When using tmpfs backed /tmp there shouldn't be any stop/fault of that kind so it would
> be canonical to have it also for a hard-drive backed /tmp, or am I wrong?

I don't think you're obviously correct.  You may not be wrong, but this is 
not how the system is currently expected to behave; there would need to be 
some discussion if it were to change.

> It is not the first time that I receive this kind of crash under heavy load (box is an
> 8 GB system with these CPU specs:
>
> FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final
> 208032) 20140512 CPU: Intel(R) Core(TM)2 Duo CPU     E8400  _at_ 3.00GHz (2999.72-MHz
> K8-class CPU) Origin="GenuineIntel"  Id=0x10676  Family=0x6  Model=0x17  Stepping=6
>  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>  Features2=0x8e3fd<SSE3,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1>
>  AMD Features=0x20100800<SYSCALL,NX,LM>
>  AMD Features2=0x1<LAHF>
>  TSC: P-state invariant, performance statistics
> real memory  = 8589934592 (8192 MB)
> avail memory = 8278880256 (7895 MB)
> Event timer "LAPIC" quality 400
> ACPI APIC Table: <A_M_I_ OEMAPIC >
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> FreeBSD/SMP: 1 package(s) x 2 core(s)
> cpu0 (BSP): APIC ID:  0
> cpu1 (AP): APIC ID:  1
> [...]
>
> The not-so-funny part is that I have those crashes under heavy load very frequently on ALL
> C2D systems (one E8400 as shown, another has a Q4400 CPU, but also 8 GB RAM, same
> motherboard). In all cases of a sudden crash, /tmp gets corrupted and the system refuses
> to boot into multiuser mode complaining about the broken /tmp filesystem which can not be
> repaired automatically.
>
> Apart from this specific question about an unclean /tmp, this kind of crash under heavy
> load on a specific hardware architecture with most recent CURRENT is puzzling (and
> occurred within the past 8 weeks several times with the same stupid blocking at the
> broken /tmp partition). I also checked the hardware with tools like memtest86 to ensure
> there is no faulty memory, but I cannot exclude some kind of overheating of the CPU since I
> realized with CLANG and -O3 (which is supposed to optimise for vector units if available,
> if I'm right) this increases the average CPU temperature by ~ 3 - 5 degree Celsius. This
> is more obvious on a Dell Latitude E6510 with a first-generation Sandy Bridge mobile CPU
> and FreeBSD 9.2/9.3: compiling the OS with gcc 4.2 (base compiler in that system), the
> temperature is 2 - 4 degrees lower than using CLANG 3.4.1 with -O3 enabled (reading the
> ACPI reported temperature via "sysctl -a|grep tempe"). This is funny, isn't it?

I don't feel like there is anything I can say in reply to this bit.

-Ben
Received on Wed Jun 11 2014 - 16:52:40 UTC