Re: kernel: failed: cg 5, cgp: 0xd11ecd0d != bp: 0x63d3ff1d

From: O. Hartmann <ohartmann_at_walstatt.org>
Date: Thu, 22 Feb 2018 08:37:07 +0100
On Tue, 20 Feb 2018 12:39:53 +0100
Gary Jennejohn <gljennjohn_at_gmail.com> wrote:

> On Mon, 19 Feb 2018 14:18:15 -0800
> "Chris H" <bsd-lists_at_BSDforge.com> wrote:
> 
> > I'm seeing a number of messages like the following:
> > kernel: failed: cg 5, cgp: 0xd11ecd0d != bp: 0x63d3ff1d
> > 
> > and was wondering if it's anything to be concerned with, or whether
> > fsck(8) is fixing them.
> > This began to happen when the power went out on a new install:
> > FreeBSD dns0 12.0-CURRENT FreeBSD 12.0-CURRENT #0: Wed Dec 13 06:07:59 PST
> > 2017 root_at_dns0:/usr/obj/usr/src/amd64.amd64/sys/DNS0 amd64
> > which hadn't yet been hooked up to the UPS.
> > I performed an fsck in single-user mode upon power-up, which ended with the
> > mount points being marked CLEAN. I was asked if I wanted to use the JOURNAL.
> > I answered Y.
> > FWIW, the systems are UFS2 (ffs), have gpart labels, and were newfs'd thusly:
> > newfs -U -j
> > 
> > Thank you for all your time, and consideration.
> >   
> 
> fsck fixes these errors only when the user does NOT use the journal.
> You should re-do the fsck.
> 

When these mysterious errors first occurred on several boxes running CURRENT
(in December 2017, if I remember correctly), I also witnessed mysterious and
frequent crashes on several SSD-driven machines where the error described
above occurred.

While the error somehow vanished in the meantime as CURRENT progressed, the
crashes continued. On two boxes, I reformatted the system SSD from scratch
(UFS2, soft updates + journaling) and restored the OS with dump/restore. On
those boxes, the mysterious crashes have not recurred since!

One box is left so far: my workstation. This box continues to crash, and it
started crashing again today while compiling world/kernel.

The fun part is: even after a clean shutdown with no detectable filesystem
inconsistencies, and after a reboot that again reports no inconsistencies on
the console, in messages, or in the logs, the box crashes spontaneously. Today
I could trigger the reboot by starting "make -j4 buildworld buildkernel":
after printing the initial compiler and build framework statements, the box
went to Nirvana. A well-known phenomenon by now.

I then checked the consistency of the filesystem. Here is the result for
the /usr/obj tree, which is a dedicated GPT partition
(label: /dev/gpt/usr.obj):


[...]
 root_at_box1:~ # fsck -fy /dev/gpt/usr.obj
** /dev/gpt/usr.obj
** Last Mounted on /usr/obj
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
UNALLOCATED  I=515  OWNER=root MODE=0
SIZE=0 MTIME=Feb 22 07:25 2018 
NAME=/usr/src/amd64.amd64/sys/BOX1/config.c.new

UNEXPECTED SOFT UPDATE INCONSISTENCY

REMOVE? yes

DIRECTORY CORRUPTED  I=169691  OWNER=root MODE=40775
SIZE=1536 MTIME=Feb 22 05:16 2018 
DIR=/usr/src/amd64.amd64/sys/BOX1/modules/usr/src/sys/modules/nfsd

UNEXPECTED SOFT UPDATE INCONSISTENCY

SALVAGE? yes

** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes

SUMMARY INFORMATION BAD
SALVAGE? yes

BLK(S) MISSING IN BIT MAPS
SALVAGE? yes

126922 files, 848197 used, 1178482 free (89210 frags, 136159 blocks, 4.4%
fragmentation)

***** FILE SYSTEM MARKED DIRTY *****

***** FILE SYSTEM WAS MODIFIED *****

***** PLEASE RERUN FSCK *****

[...]
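For reading the summary line above: the "used" and "free" figures are counted in fragments, and "free" appears to be the loose free fragments plus the whole free blocks multiplied by the number of fragments per block. The sketch below checks this against the numbers in the log. It assumes 8 fragments per block (the usual newfs default, fs_frag = 8, which fsck does not print), and it uses used + free as a stand-in for the data-area size behind the fragmentation percentage. The helper names are hypothetical.

```python
# Figures copied from the fsck summary line in the log above.
USED_FRAGS = 848197
FREE_FRAGS_REPORTED = 1178482
LOOSE_FREE_FRAGS = 89210    # "frags": free fragments outside whole free blocks
WHOLE_FREE_BLOCKS = 136159  # "blocks": entirely free blocks

# Assumption: 8 fragments per block (fs_frag = 8); fsck does not print this.
FRAGS_PER_BLOCK = 8


def total_free_frags(loose: int, blocks: int, per_block: int = FRAGS_PER_BLOCK) -> int:
    """Free space in fragment units: loose fragments plus whole free blocks."""
    return loose + per_block * blocks


def fragmentation_pct(loose: int, data_frags: int) -> float:
    """Loose free fragments as a percentage of the data area (in fragments)."""
    return loose * 100.0 / data_frags


if __name__ == "__main__":
    print(total_free_frags(LOOSE_FREE_FRAGS, WHOLE_FREE_BLOCKS))  # 1178482
    # used + free stands in for the data-area size, which fsck does not print.
    print(round(fragmentation_pct(LOOSE_FREE_FRAGS,
                                  USED_FRAGS + FREE_FRAGS_REPORTED), 1))  # 4.4
```

Both results match the log (1178482 free, 4.4% fragmentation), which is at least consistent with 8 fragments per block on this partition.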

When doing an installworld, I pre-emptively run "fsck -yf" twice in
single-user mode before mounting the partitions. In most cases the filesystems
are reported clean, but sometimes, especially on those under high I/O
(/usr/src and above all /usr/obj on this build machine), corruption is
reported.
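The pre-emptive check described above can be sketched as a small Python helper. It is an illustration, not the procedure actually used here: it assumes fsck is on PATH, runs "fsck -fy" on one device at a time, and repeats the pass only while fsck prints "PLEASE RERUN FSCK" (as in the log above), up to two passes. The function names are hypothetical, and only /dev/gpt/usr.obj is a label known from this message.

```python
import subprocess
from typing import Callable, List

# Banner fsck printed in the log above when another pass is needed.
RERUN_MARKER = "PLEASE RERUN FSCK"


def run_fsck(device: str) -> str:
    """Run 'fsck -fy DEVICE' (flags as used in this thread); return its output."""
    # check=False: fsck exits non-zero after repairing, which is expected here.
    result = subprocess.run(["fsck", "-fy", device],
                            capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


def check_until_clean(device: str, max_passes: int = 2,
                      runner: Callable[[str], str] = run_fsck) -> List[str]:
    """Repeat fsck on DEVICE while it asks for a rerun, at most max_passes times.

    Returns the output of every pass that was run.
    """
    outputs: List[str] = []
    for _ in range(max_passes):
        out = runner(device)
        outputs.append(out)
        if RERUN_MARKER not in out:
            break  # this pass did not request another one
    return outputs
```

On a real system this would be called as root in single-user mode with the filesystem unmounted, e.g. check_until_clean("/dev/gpt/usr.obj"). The runner parameter lets the loop be exercised without touching a real disk.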

As I reported, the very same behaviour occurred on three boxes simultaneously,
and I got rid of it by completely reformatting the SSDs (so far I have never
had issues with HDD-based boxes!).

I hope I can refurbish the remaining box this weekend. If desired, I can then
report whether it returns to a healthy state like the others or whether my
observation was simply a coincidence of issues...

Thanks for the patience,

Oliver
Received on Thu Feb 22 2018 - 06:37:38 UTC