Re: vinum error: statfs related?

From: Robert Watson <rwatson_at_FreeBSD.ORG>
Date: Mon, 17 Nov 2003 17:11:40 -0500 (EST)
On Mon, 17 Nov 2003, Eric Anholt wrote:

> I'm getting the same (no drives/subdisks/plexes/volumes found) trying to
> upgrade from a Nov 11 kernel/userland to a Nov 16th kernel.  I tried
> seeing if using a Nov 16th vinum binary would load them, but after doing
> a stop/start, the system panicked, and it seems my swap is too small to
> dump on.  The kernel was built using config MYKERNEL; cd
> ../compile/MYKERNEL; make depend all install instead of buildkernel.  DDB
> is enabled but not INVARIANTS/WITNESS; I'm not sure what else from my
> config might be applicable.
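
(For reference, the manual build sequence described above is roughly the
following; the i386 path and the MYKERNEL name are placeholders, not taken
from the original message.)

  # sketch of a 5.x-era manual kernel build, bypassing buildkernel
  cd /usr/src/sys/i386/conf
  config MYKERNEL
  cd ../compile/MYKERNEL
  make depend && make all && make install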

I'm able to trigger this warning simply by starting and stopping Vinum
without a Vinum configuration:

ttyp0:
  crash2# vinum start
  ** no drives found: No such file or directory
  crash2# vinum stop
  vinum unloaded

console:
  vinum: loaded
  vinum: no drives found
  vinum: exiting with malloc table inconsistency at 0xc2053c00 from
  vinumio.c:755
  vinum: unloaded

I spent some time experimenting with Vinum today.  After fixing a bug in
the vinum userland tool so that it no longer tries to create device nodes
and directories in devfs, it seemed to come up OK (fix committed).  I
documented in the vinum.8 man page the bug that vinum won't work with
storage devices whose sector size is anything other than DEV_BSIZE (512),
since I don't have time to fix it today; a quick way to check a device's
sector size is sketched after the console output below.  I created a
malloc md-backed vinum array with seeming ease, but was unable to newfs
the result:

ttyp0:
  crash2# mdconfig -a -t malloc -s 1m
  md0
  crash2# mdconfig -a -t malloc -s 1m
  md1
  crash2# mdconfig -a -t malloc -s 1m
  md2
  crash2# vinum
  vinum -> concat /dev/md0 /dev/md1 /dev/md2
  vinum -> quit
  crash2# newfs /dev/vinum/vinum0
  /dev/vinum/vinum0: 2.6MB (5348 sectors) block size 16384, fragment size 2048
          using 4 cylinder groups of 0.66MB, 42 blks, 128 inodes.
  super-block backups (for fsck -b #) at:
   160, 1504, 2848, 4192
  cg 0: bad magic number

console:
  vinum: loaded
  vinum: drive vinumdrive0 is up
  vinum: drive vinumdrive1 is up
  vinum: drive vinumdrive2 is up
  vinum: vinum0.p0.s0 is up
  vinum: vinum0.p0.s1 is up
  vinum: vinum0.p0.s2 is up
  vinum: vinum0.p0 is up
  vinum: vinum0 is up
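
As an aside on the sector-size restriction noted above: a quick sanity
check before handing a drive to vinum is to confirm that it reports
512-byte sectors.  A rough sketch, assuming diskinfo's default output of
"name sectorsize mediasize(bytes) mediasize(sectors)" and a placeholder
device name:

  secsize=`diskinfo /dev/da0 | awk '{print $2}'`   # /dev/da0 is a placeholder
  [ "$secsize" -eq 512 ] || echo "sector size $secsize != DEV_BSIZE; vinum won't work"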

So clearly UFS is unhappy with something about the newly created array.  I
tried reading and writing data to and from the array, with pretty mixed
results:

ttyp0:
  crash2# diskinfo /dev/vinum/vinum0
  /dev/vinum/vinum0       512     2738688 5349
  crash2# dd if=/dev/random of=/data.file bs=512 count=5349
  5349+0 records in
  5349+0 records out
  2738688 bytes transferred in 2.520634 secs (1086508 bytes/sec)
  crash2# dd if=/data.file of=/dev/vinum/vinum0 bs=512 count=5349
  5349+0 records in
  5349+0 records out
  2738688 bytes transferred in 2.464483 secs (1111263 bytes/sec)
  crash2# dd if=/dev/vinum/vinum0 of=/data.file2 bs=512 count=5349
  5349+0 records in
  5349+0 records out
  2738688 bytes transferred in 2.467386 secs (1109955 bytes/sec)
  crash2# ls -l /data.f*
  -rw-r--r--  1 root  wheel  2738688 Nov 17 17:02 /data.file
  -rw-r--r--  1 root  wheel  2738688 Nov 17 17:03 /data.file2
  crash2# md5 /data.file*
  MD5 (/data.file) = ce76d17b337f70c1d4d53b48cf08f906
  MD5 (/data.file2) = b1d08e0fe52ecff364a894edf43caef2

The reason for the somewhat long copy times is that / for this box is
mounted over NFS.  To be sure, I ran the test a second time:

  MD5 (/data.file.3) = d0c9d71cfacedc70358be028f0c346dd
  MD5 (/data.file.4) = 0ea319da8e68550c2ebf91e6b1618976

It sounds like there's a serious problem with Vinum right now.  I took a
look through the vinum data structures, and I couldn't see any obvious
problems that could have stemmed from the statfs() change: specifically, I
didn't see any data structures that would have changed size as a result of
the change.  So I'm guessing it was some other similarly timed change, but
I'm not sure what.

It's interesting to observe that I didn't get the malloc failure when I
unloaded Vinum after the above tests: it appears to occur as a result of a
configuration difficulty (such as a failure to find one), and so may
actually be a red herring for the underlying problem.  Or at least, an
independent bug/feature.

I'm heading home for the day; when I get home, I'll try changing the
testing procedure around to identify exactly what is getting corrupted in
my dd tests.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Network Associates Laboratories