[The automatic fsck -B done by a normal multi-user boot
can hit the problem as well.]

On 2017-Jul-8, at 9:45 AM, Mark Millard <markmi at dsl-only.net> wrote:

> [I add notes about a problem that happens after the
> "fsck -B". Also forgot to mention: production style
> kernel world builds were in use. And I tried a
> powerpc64 build and it works the same.]
>
> On 2017-Jul-7, at 11:09 PM, Mark Millard <markmi at dsl-only.net> wrote:
>
>> [This note has more information than one sent with extra text
>> in the subject but with a partially different "to" list.]
>>
>> Peter Jeremy peter at rulingia.com wrote on
>> Sat Jul 8 02:00:47 UTC 2017 :
>>
>>> When did you first notice this (what SVN revision)?
>>> Do you know what the last good SVN revision was?
>>> Is this a new or old filesystem?
>>> Is the filesystem mounted/active or not when you dump it?
>>> What are the relevant parameters for the filesystem on ada0s3a?
>>> Are you running softupdates, journalling etc?
>>> Which dump(8) phase is reporting the errors?
>>> What are the exact dump and fsck commands you ran?
>>
>> I can add a little information with some contrast
>> and only "fsck -B" in use (with an unclean file
>> system from a prior crash), no dump use. Still:
>> a snapshot is involved in the below.
>>
>> Unfortunately two problems with major consequences
>> for my involved context limit the svn range that I
>> can cover for the activity, the problem version
>> ranges being:
>>
>> -r319722 through -r320651 (fixed by -r320652)
>> (actually this is why I had used "boot -s"
>> in what I report later: I could get to a
>> shell prompt that way instead of crashing
>> before any login prompt; the crashes left
>> the file system in need of repair)
>>
>> -r320509 through -r320561 (fixed by -r320570)
>>
>> So I was using -r320570 to avoid one of the
>> two problems.
>>
>>
>>
>> Context: 32-bit powerpc FreeBSD used on PowerMac G5
>> so-called "Quad-core". (So big-endian as well.)
>> Softupdates, no journalling.
>> Long-in-use file
>> system having lots of FreeBSD version updates
>> and port rebuilds over time.
>>
>> The following is from now, not from the time of the
>> example messages:
>>
>> # dumpfs / | more
>> magic 19540119 (UFS2) time Fri Jul 7 22:53:34 2017
>> superblock location 65536 id [ <OMITTED> ]
>> ncg 158 size 25165823 blocks 24372006
>> bsize 32768 shift 15 mask 0xffff8000
>> fsize 4096 shift 12 mask 0xfffff000
>> frag 8 shift 3 fsbtodb 3
>> minfree 8% optim time symlinklen 120
>> maxbsize 32768 maxbpg 4096 maxcontig 4 contigsumsize 4
>> nbfree 2130375 ndir 65518 nifree 11769796 nffree 425065
>> bpg 20032 fpg 160256 ipg 80128 unrefs 0
>> nindir 4096 inopb 128 maxfilesize 2252349704110079
>> sbsize 4096 cgsize 32768 csaddr 5048 cssize 4096
>> sblkno 24 cblkno 32 iblkno 40 dblkno 5048
>> cgrotor 127 fmod 0 ronly 0 clean 0
>> metaspace 6408 avgfpdir 64 avgfilesize 16384
>> flags soft-updates trim
>> fsmnt /
>> volname FBSDG4Srootfs swuid 0 providersize 25165823
>> . . .
>>
>>
>>
>> What I had done that produced the messages was:
>>
>> <Prior failed multi-user boot from system problem
>> leaves root (only) file system not marked clean
>> so fsck -B will actually do something below>
>>
>> boot -s (so: single user mode)
>> # The next 3 lines are the content of a generic, manually-run script.
>> mount -u /
>> mount -a -t ufs (but there is no other file system)
>> swapon -a (there is a swap partition)
>> #
>> fsck -B
>>
>> That "fsck -B" caused the same kinds of lines
>> reported by Michael Butler, happening as fsck
>> makes a snapshot for the background processing
>> to use. (I have camera pictures and could type
>> in some of the lines if needed.)
>>
>> After those lines was text like (typed in from
>> an example camera picture):
>>
>> ** //.snap/fsck_snapshot
>> ** Last Mount on /
>> ** Root file system
>> ** Phase 1 - Check Blocks and Sizes
>> ** Phase 2 - Check Pathnames
>> ** Phase 3 - Check Connectivity
>> ** Phase 4 - Check Reference Counts
>> ** Phase 5 - Check Cyl groups
>> Reclaimed: 0 directories, 1 files, 22680 fragments
>> 780914 files, 4797127 used, 19552199 free (443479 frags, 3288590 blocks, 1.8% fragmentation)
>>
>> ***** FILE SYSTEM MARKED CLEAN *****
>
> [I forgot to mention that the context was a
> production style kernel and world build,
> no invariants or other such.]
>
> Since I'm running a patched -r320570 for the
> issue:
>
> -r319722 through -r320651 (fixed by -r320652)
>
> I went back and forced a power-off without
> shutdown and did the sequence:
>
> boot -s (so: single user mode)
> # The next 3 lines are the content of a generic, manually-run script.
> mount -u /
> mount -a -t ufs (but there is no other file system)
> swapon -a (there is a swap partition)
> #
> fsck -B
>
> but always waited briefly after the fsck -B finished.
>
> Like before the following happens as it tries to trim:
> (typed in from camera picture)
>
> panic: ffs_blkfree_cq: freeing free block
> cpuid = 2 (varies, of course)
> time = (varies)
> KDB: stack backtrace
> (stack addresses can vary: just an example here)
> 0xd23b17e0: at kdb_backtrace+0x5c
> 0xd23b1850: at vpanic+0x1e8
> 0xd23b18c0: at panic+0x54
> 0xd23b1910: at ffs_blkfree_cq+0x278
> 0xd23b1980: at ffs_blkfree_trim_task+0x60
> 0xd23b19b0: at taskqueue_run_locked+0x10
> 0xd23b1a10: at taskqueue_thread_loop+0x174
> 0xd23b1a50: at fork_exit+0xf4
> 0xd23b1a80: at fork_trampoline+0xc
> KDB: enter: panic
> [ thread pid 0 tid 1000082 ]
> Stopped at kdb_enter_0x70: addi r0,r0,0x0
>
>
> I've tried this on a powerpc64 and it works
> the same, complete with the "freeing free
> block" issue.
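
The figures in the fsck summary line quoted above were typed in from
a camera picture, so a quick arithmetic cross-check may be useful.
The following is only a sketch under two assumptions that are not
stated in the thread: that the "free" total is the loose free
fragments plus the whole free blocks times "frag" (8 in the dumpfs
output), and that the percentage is free fragments over used plus
free. The function names free_total and frag_pct are made up for
the illustration.

```shell
#!/bin/sh
# Cross-check of the fsck summary line quoted above.
# Numbers are copied from the quoted fsck and dumpfs output.
frag=8                  # "frag 8" from dumpfs: fragments per block
used=4797127            # "used" from the fsck summary
free_reported=19552199  # "free" from the fsck summary
free_frags=443479       # "frags" from the fsck summary
free_blocks=3288590     # "blocks" exactly as typed in the post

# Assumption: free = loose free fragments + free blocks * frag.
free_total() {
    echo $(( $1 + $2 * $3 ))
}

# Assumption: percentage = free fragments * 100 / total, one decimal.
frag_pct() {
    awk -v f="$1" -v t="$2" 'BEGIN { printf "%.1f\n", f * 100 / t }'
}

free_total "$free_frags" "$free_blocks" "$frag"
frag_pct "$free_frags" $(( used + free_reported ))
```

Under those assumptions the percentage agrees with the quoted 1.8%,
but the block count as typed does not reproduce the quoted free
total, while 2388590 would. A transposed digit pair in the typed-in
block count is one possible explanation; the thread itself does not
confirm it.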

I tried a normal boot to multi-user with the file
system not clean, so the boot did an automatic
fsck -B. I got the messages and the later
"freeing free block" crash.

It appears that having mksnap_ffs (and code equivalents
in other programs) broken in turn breaks fsck -B fairly
majorly. (Michael Butler did the mksnap_ffs test at
Rodney W. Grimes' request.)

I've been using the following to clean things up when
I'm done with an experimental sequence that leaves
things needing a fsck:

boot -s (a single user boot)
fsck -F

So far it has resulted in a clean file system. With
that status fsck -B then has no such problem:
apparently it then does not create a snapshot by
default. So then a multi-user boot works okay for
its automatic fsck use.

===
Mark Millard
markmi at dsl-only.net

Received on Sat Jul 08 2017 - 15:45:44 UTC
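
Since the fsck -F cleanup above matters only because fsck -B then
finds the file system already marked clean, it can help to look at
the "clean" field before trying a multi-user boot. The dumpfs output
quoted earlier shows it as "clean 0". What follows is a minimal
sketch, assuming the field appears as the word "clean" followed by
its value on a superblock line, as in that quoted output. The helper
names clean_flag and is_marked_clean are hypothetical, not part of
any FreeBSD tool.

```shell
#!/bin/sh
# Print the value that follows the first "clean" field in
# dumpfs-style text read from standard input.
clean_flag() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "clean") { print $(i + 1); exit }
    }'
}

# Hypothetical helper: succeed only when the text says "clean 1".
is_marked_clean() {
    [ "$(clean_flag)" = "1" ]
}

# Intended use on the affected system (not run here):
#   dumpfs / | is_marked_clean && echo "marked clean"
```

This only reads the flag; it does not say whether the snapshot
problem itself is present in the running kernel.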