Re: HEADS UP: ZFSv28 is in!

From: Fabian Keil <freebsd-listen_at_fabiankeil.de> Date: Thu, 3 Mar 2011 20:23:39 +0100

Alexander Leidinger <Alexander_at_Leidinger.net> wrote:

> Quoting Fabian Keil <freebsd-listen_at_fabiankeil.de> (from Thu, 3 Mar  
> 2011 13:01:30 +0100):
> 
> > Alexander Leidinger <Alexander_at_Leidinger.net> wrote:
> >
> >> On Mon, 28 Feb 2011 19:21:29 +0100 Fabian Keil
> >> <freebsd-listen_at_fabiankeil.de> wrote:
> >>
> >> > Pawel Jakub Dawidek <pjd_at_FreeBSD.org> wrote:
> >> >
> >> > > I just committed ZFSv28 to HEAD.
> >> >
> >> > I updated the system without removing the tuning for ZFSv15
> >> > first, and somehow this completely messed up the performance.
> >> > Booting the system took more than ten minutes and even once
> >> > it was up it was next to unresponsive.
> >> >
> >> > I'm not sure which sysctl was to blame, but after removing
> >> > all but vfs.zfs.arc_max="800M" and rebooting, the problem
> >> > was gone.
> >>
> >> When you add the tuning back, does it take minutes again to boot? If
> >> not, I assume it was cleaning up some leftovers the old version was not
> >> able to clean up.
> >
> > I haven't tried that yet, but as I didn't upgrade the system's
> > storage pool I don't think ZFS is supposed to do any such clean-ups.
> 
> AFAIK the new code knows how to remove some superfluous parts in your  
> pool (no matter at which version the pool is), which the old code just  
> skipped over. Such leftovers may not be in all pools, they show up  
> just in some use cases. For this reason I asked to verify by adding  
> the tuning back to this system (if possible).

I don't have an exact list of the sysctls I used earlier,
but after uncommenting all the ZFS sysctls in loader.conf
(some of which must have been commented out for quite some
time) the problem was back.

I interrupted the boot process after 14 minutes, at which point
the ezjail rc script had been running for several minutes
but was still busy with the first jail. Usually this takes only
a few seconds.

The sysctls used were:

#vfs.zfs.txg.timeout=5

Seems to be the default now.

# vfs.zfs.zil_disable=1

No longer supported.

# vfs.zfs.prefetch_disable=0

The default seems to be 1.

# vfs.zfs.write_limit_override=15

Clearly the value makes no sense, so this may not have
been active at the time of the update. I had a back-ported
patch to add the sysctl, so at least in theory it should
have caused problems with v15, too, unless there was
a sanity check to ignore obviously incorrect values.

The auto-tuned write-limit values are:
vfs.zfs.write_limit_max: 258863616
vfs.zfs.write_limit_min: 33554432
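
To illustrate why 15 looks wrong, here is a minimal sketch of the kind of
sanity check mentioned above. It is not taken from the ZFS code. The helper
name override_is_plausible is hypothetical, and it assumes that the override
is given in bytes like the min/max values and that 0 means "no override".

```python
# Values reported by sysctl in this message.
WRITE_LIMIT_MIN = 33554432    # vfs.zfs.write_limit_min
WRITE_LIMIT_MAX = 258863616   # vfs.zfs.write_limit_max


def override_is_plausible(override, lo=WRITE_LIMIT_MIN, hi=WRITE_LIMIT_MAX):
    """Return True if the override lies within the auto-tuned range.

    Assumptions (not confirmed by the ZFS sources): 0 disables the
    override, and the override uses the same unit as lo and hi.
    """
    if override == 0:
        return True
    return lo <= override <= hi


# 15 is the value from the old loader.conf; 134217728 is the value
# that was set for vfs.zfs.txg.write_limit_override.
for value in (15, 134217728):
    print(value, override_is_plausible(value))
```

Under these assumptions, 15 falls far below the auto-tuned minimum, while
134217728 lies inside the range.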

# vfs.zfs.vdev.max_pending=15

The auto-tuned value is 10.

vfs.zfs.arc_max="800M"
# vfs.zfs.arc_min="500M"
# vfs.zfs.vdev.cache.size="5M"

The auto-tuned value for vfs.zfs.vdev.cache.size is 10485760,
which seems close enough.

# vfs.zfs.txg.synctime=1

This sysctl doesn't seem to exist (anymore).

#vfs.zfs.cache_flush_disable=1

The default is 0.

#vfs.zfs.txg.write_limit_override=134217728

Doesn't seem to exist (anymore) either.

#vfs.zfs.vdev.max_pending=2
#vfs.zfs.vdev.min_pending=1

The auto-tuned values are

vfs.zfs.vdev.min_pending: 4
vfs.zfs.vdev.max_pending: 10
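
For anyone who wants to do the same comparison less manually, here is a rough
sketch that parses loader.conf-style lines (commented out or not) and reports
where the old tuning differs from the auto-tuned values. The function names
and the suffix handling are my own assumptions for illustration. The numbers
are the ones quoted in this message, not read from a live system.

```python
# Size suffixes as used in values like "800M"; treated as powers of 1024
# here, which is an assumption of this sketch.
SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_value(text):
    """Convert a loader.conf value such as '"5M"' or '10' to an int."""
    text = text.strip().strip('"')
    if text and text[-1].upper() in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1].upper()]
    return int(text)


def parse_tunables(lines):
    """Parse name=value lines, ignoring a leading '#' and whitespace."""
    result = {}
    for line in lines:
        line = line.strip().lstrip("#").strip()
        if "=" not in line:
            continue
        name, _, value = line.partition("=")
        result[name.strip()] = parse_value(value)
    return result


def differences(old, current):
    """Map each tunable present in both dicts to (old, current) if unequal."""
    return {
        name: (value, current[name])
        for name, value in old.items()
        if name in current and current[name] != value
    }


# Old tuning, copied from the loader.conf fragments in this message.
OLD_LINES = """
# vfs.zfs.prefetch_disable=0
# vfs.zfs.vdev.cache.size="5M"
#vfs.zfs.cache_flush_disable=1
#vfs.zfs.vdev.max_pending=2
#vfs.zfs.vdev.min_pending=1
""".splitlines()

# Defaults and auto-tuned values as reported in this message.
AUTO_TUNED = {
    "vfs.zfs.prefetch_disable": 1,
    "vfs.zfs.vdev.cache.size": 10485760,
    "vfs.zfs.cache_flush_disable": 0,
    "vfs.zfs.vdev.min_pending": 4,
    "vfs.zfs.vdev.max_pending": 10,
}

for name, (old, new) in sorted(differences(parse_tunables(OLD_LINES), AUTO_TUNED).items()):
    print(f"{name}: old tuning {old}, auto-tuned {new}")
```

This only shows which settings deviate from the current values. It does not
tell which of them is to blame for the slowdown.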

> If it is not a production-like system which does not accept downtime,  
> this verification consumes less resources than sending out a developer  
> hunting for a problem which may not even exist.

It wasn't my intention to send anyone hunting.

Fabian