On Sun, 9 May 2004, Brian Fundakowski Feldman wrote:

> Brian Fundakowski Feldman <green_at_FreeBSD.org> wrote:
> > I have a 512MB system and had to adjust kern.maxvnodes (desiredvnodes) down
> > to something reasonable after discovering that it was the sole cause of too
> > much paging for my workstation.  The target number of vnodes was set to
> > 33000, which would not be so bad if it did not also cause so many more
> > UFS, VM and VFS objects, and the VM objects' associated inactive cache
> > pages, lying around.  I ended up saving a good 100MB of memory just
> > adjusting kern.maxvnodes back down to something reasonable.  Here are the
> > current allocations (and some of the peak values):

The default for desiredvnodes is almost perfect for my main application of
running makeworld and otherwise working with the entire src tree.  Actually,
it's too low with 512MB and almost perfect with 1024MB.  The latter gives
desiredvnodes = 70240, and there are 47742 vnodes in my src tree (a few
hundred extras).  512MB is also not quite enough for caching the whole src
tree (mine has 476358 1K-blocks according to du).

In one application involving 2 src trees (slightly reduced to get them both
cached in 1024MB, of which only about 800MB is available for VMIO pages), I
needed to increase kern.maxvnodes to 90000+ to avoid disk accesses for
inodes.  Caching them in vnodes didn't work because the default number of
vnodes wasn't enough, and caching them in VMIO pages didn't work for some
reason (either because I was testing a filesystem that was missing VMIO for
metadata, or because the replacement policy didn't work -- when inodes are
cached in vnodes and not written to due to mounting with noatime, they get
discarded from VMIO, and then when their vnode gets recycled they aren't
cached anywhere).

Since 512MB isn't enough to cache everything for makeworld, the default of
33000+ vnodes won't help much, and a better target might be to cache
everything in /sys.  15000 vnodes and a couple of hundred MB is enough for
that unless you build too many modules or kernels.

> > ITEM            SIZE     LIMIT     USED     FREE   REQUESTS
> > FFS2 dinode:     256,        0,  12340,      95,    1298936
> > FFS1 dinode:     128,        0,    315,    3901,    2570969
> > FFS inode:       140,        0,  12655,   14589,    3869905
> > L VFS Cache:     291,        0,      5,     892,      51835
> > S VFS Cache:      68,        0,  13043,   23301,    4076311
> > VNODE:           260,        0,  32339,      16,      32339
> > VM OBJECT:       132,        0,  10834,   24806,    2681863

I don't use ffs2 (nice to see ffs* spelled right), so I have slightly
smaller overheads.

> > (The number of VM pages allocated specifically to vnodes is not something
> > easy to determine other than the fact that I saved so much memory even
> > without the objects themselves, after uma_zfree(), having been reclaimed.)

The number of VMIO pages is also hard to determine.  systat's "inact" count
gives an approximate value for the amount of VMIO memory, but various stats
utilities' "buf" count gives a useless value.  VMIO pages are easier to
flush (unmount works for them).

> > We really need to look into making the desiredvnodes default target more
> > sane before 5.x is -STABLE or people are going to be very surprised
> > switching from 4.x and seeing paging increase substantially.  One more
> > [...]

5.x has bloat everywhere?  Is desiredvnodes the worst part of it?  I haven't
noticed its bloat especially.
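For anyone who wants to experiment with the limit: kern.maxvnodes can be
changed at run time with the usual sysctl(8) one-liner, or programmatically.
The following is only a minimal sketch using sysctlbyname(3); it assumes
nothing beyond the stock kern.maxvnodes sysctl and is an illustration, not
code from the tree:

%%%
/* Print kern.maxvnodes and, if a new value is given, try to set it. */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <err.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
	int newval, oldval;
	size_t oldlen;

	oldlen = sizeof(oldval);
	if (sysctlbyname("kern.maxvnodes", &oldval, &oldlen, NULL, 0) != 0)
		err(1, "reading kern.maxvnodes");
	printf("kern.maxvnodes: %d\n", oldval);
	if (argc > 1) {
		newval = atoi(argv[1]);
		if (sysctlbyname("kern.maxvnodes", NULL, NULL, &newval,
		    sizeof(newval)) != 0)
			err(1, "setting kern.maxvnodes");
		printf("kern.maxvnodes: %d -> %d\n", oldval, newval);
	}
	return (0);
}
%%%

Setting it requires root, and lowering it doesn't free anything instantly;
the excess vnodes still have to be recycled.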
Not long ago (in early 4.x?), the number of vnodes was unbounded and there
were bugs like the ufs inode allocation doubling due to the required amount
growing for bogus reasons to just larger than a power of 2 (so that the
power-of-2 allocation almost doubled it).

> > but why are they not already like that?  One last good example I personally
> > see of wastage-by-virtue-of-zfree-function is the page tables on i386:
> > PV ENTRY:         28,   938280,  59170,  120590,  199482221
> > Once again, why do those actually need to be non-reclaimable?

I haven't noticed much wastage for PV ENTRY.  Right now, I have only the
following large memory consumers in uma, but the system hasn't been up long
and the measurement is distorted by recently reading the src tree:

%%%
ITEM            SIZE     LIMIT     USED     FREE   REQUESTS
FFS1 dinode:     128,        0,  52202,      33,     563854
FFS inode:       140,        0,  52202,      46,     563854
S VFS Cache:      68,        0,  52494,      75,     573456
VNODE:           260,        0,  52209,      21,      52209
2048:           2048,        0,    123,    2843,      20845
PV ENTRY:         28,  1494920,   4438,    2282,     703502
VM OBJECT:       132,        0,  52355,      85,     495338
%%%

PV ENTRY's are small, so the 2048's waste a lot more.  It's hard to see what
they are for; vmstat -z never showed as much as vmstat -m, and vmstat -m is
not as good as it used to be.

> It really doesn't seem appropriate to _ever_ scale maxvnodes (desiredvnodes)
> up that high just because I have 512MB of RAM.

Like most things, the best value depends on the workload.  Since the number
of vnodes that can be handled scales with the amount of memory, it seems
reasonable for the default to scale with the amount of memory.  -current
needs a larger scale factor than RELENG_4 if anything, since it has more
files.  Combined with more costs per file, it could easily need twice as
much real memory as RELENG_4 for equivalent disk caching.

Bruce
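PS: "scale with the amount of memory" means something of the following
shape.  This is only a sketch of the idea -- the actual initialization lives
in vfs_subr.c, and its names, constants and clamps differ:

%%%
/*
 * Sketch only: a vnode target that grows with physical memory, roughly
 * one vnode per 4 pages plus one per process, with a floor for small
 * machines.  Not the code in vfs_subr.c; the constants are illustrative.
 */
static int
scaled_desiredvnodes(u_long physpages, int maxprocs)
{
	int target;

	target = maxprocs + (int)(physpages / 4);
	if (target < 1000)
		target = 1000;
	return (target);
}
%%%

With 1024MB (about 256k 4K pages), the physpages / 4 term alone is about
65000, the same ballpark as the 70240 mentioned above; the point is just
that the target follows the machine rather than being a fixed constant.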