Any success stories for HAST + ZFS?

From: Freddie Cash <fjwcash_at_gmail.com>
Date: Thu, 24 Mar 2011 13:36:32 -0700
[Not sure which list is most appropriate since it's using HAST + ZFS
on -RELEASE, -STABLE, and -CURRENT.  Feel free to trim the CC: on
replies.]

I'm having a hell of a time making this work on real hardware, and am
not ruling out hardware issues as yet, but wanted to get some
reassurance that someone out there is using this combination (FreeBSD
+ HAST + ZFS) successfully, without kernel panics, without core dumps,
without deadlocks, without issues, etc.  I need to know I'm not
chasing a dead rabbit.

In tests using VirtualBox and FreeBSD 8-STABLE from when HAST was
first MFC'd, everything worked wonderfully.   HAST-based pool would
come up, data would sync to the slave node, fail-over worked nicely,
bringing the other box back online as the slave worked, data synced
back, etc.  It was a thing of beauty.

Now, on real hardware, I cannot get the system to stay online for more
than an hour.  :(  hastd causes kernel panics with "bufwrite: buffer
not busy" errors.  ZFS pools get corrupted.  System deadlocks (no log
messages, no onscreen errors, not even NumLock key works) at random
points.

The hardware is fairly standard fare:
  - SuperMicro H8DGi-F motherboard
  - AMD Opteron 6100-series CPU (8 cores @ 2.0 GHz)
  - 8 GB DDR3 SDRAM
  - 64 GB Kingston V-Series SSD for the OS install (using ahci(4) and
    the motherboard SATA controller)
  - 3x SuperMicro AOC-USAS2-8Li SATA controllers with IT firmware
  - 6x 1.5 TB Seagate 7200.11 drives (1x raidz2 vdev)
  - 12x 1.0 TB Seagate 7200.12 drives (2x raidz2 vdev)
  - 6x 0.5 TB WD RE3 drives (1x raidz2 vdev)

The motherboard BIOS is up-to-date.  I do not see any way to update
the firmware on the SATA controllers.  According to the onboard
IPMI-based sensors, CPU, motherboard, and RAM temps and voltages are
in the nominal range.

I've tried with FreeBSD 8.2-RELEASE, 8-STABLE, 8-STABLE w/ZFSv28
patches, and 9-CURRENT (after the ZFSv28 commit).  Things work well
until I start hastd.  Then either the system locks up, or hastd causes
a kernel panic, or hastd dumps core.

Each hard drive is glabel'd as "disk-a1" through "disk-d6".

hast.conf has 24 resources listed, one for each glabel'd device.
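
As a rough illustration of that layout, the following sketch generates one
hast.conf resource stanza per label.  The host names (nodeA, nodeB), the
10.0.0.x addresses, and the /dev/label/ provider paths are assumptions made
for the example; they are not taken from the actual configuration.

```python
# Sketch: emit a hast.conf with one resource per glabel'd disk.
# Host names, peer addresses, and the /dev/label/ paths are hypothetical.

def disk_labels(groups="abcd", per_group=6):
    """Return the labels disk-a1 .. disk-d6 (4 groups of 6 = 24 labels)."""
    return [f"disk-{g}{n}" for g in groups for n in range(1, per_group + 1)]


def hast_resource(label, nodes):
    """Build one 'resource' stanza; nodes is a list of (host, peer_address)."""
    lines = [f"resource {label} {{"]
    for host, peer_addr in nodes:
        lines += [
            f"\ton {host} {{",
            f"\t\tlocal /dev/label/{label}",   # assumed provider path
            f"\t\tremote {peer_addr}",         # address of the other node
            "\t}",
        ]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical two-node setup: each node lists the address of its peer.
NODES = [("nodeA", "10.0.0.2"), ("nodeB", "10.0.0.1")]


def hast_conf(nodes=NODES):
    """Join the stanzas for all 24 labels into one config file body."""
    return "\n\n".join(hast_resource(l, nodes) for l in disk_labels()) + "\n"


if __name__ == "__main__":
    print(hast_conf())
```

Generating the file this way keeps the 24 stanzas identical apart from the
label, which makes typos in a single resource less likely.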

The pool is created using the /dev/hast/* devices with disk-a1 through
disk-a6 being one raidz2 vdev, and so on through disk-b*, disk-c*, and
disk-d*, for a total of 4 raidz2 vdevs of 6 drives each.  A fairly
standard setup, I would think.
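
To make the pool layout concrete, this sketch builds the corresponding
"zpool create" command line: four raidz2 vdevs of six /dev/hast/* devices
each.  The pool name "tank" is a placeholder, and the script only prints
the command rather than running it.

```python
# Sketch: build the zpool create command for 4 raidz2 vdevs of 6 HAST devices.
# The pool name "tank" is hypothetical; nothing is executed here.

def raidz2_vdevs(groups="abcd", per_group=6):
    """Return one device list per vdev: disk-a1..a6, disk-b1..b6, and so on."""
    return [
        [f"/dev/hast/disk-{g}{n}" for n in range(1, per_group + 1)]
        for g in groups
    ]


def zpool_create_argv(pool, vdevs):
    """Return the argv for 'zpool create <pool> raidz2 ... raidz2 ...'."""
    argv = ["zpool", "create", pool]
    for devices in vdevs:
        argv += ["raidz2", *devices]
    return argv


if __name__ == "__main__":
    print(" ".join(zpool_create_argv("tank", raidz2_vdevs())))
```

The command would be run on the node where the HAST resources are in the
primary role, since the /dev/hast/* devices exist only there.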

Even using a GENERIC kernel, I can't keep things stable and running.

So, please, someone, somewhere, share a success story, where you're
using FreeBSD, ZFS, and HAST.  Let me know that it does work.  I'm
starting to lose faith in my abilities here.  :(

Or point out where I'm doing things wrong so I can correct the issues.

Thanks.
-- 
Freddie Cash
fjwcash_at_gmail.com
Received on Thu Mar 24 2011 - 19:36:34 UTC
