Re: Memory accounting discrepancy in top (possibly ZFS related)

From: Giorgos Keramidas <keramida_at_freebsd.org>
Date: Wed, 12 Sep 2007 01:29:07 +0300

On 2007-09-10 22:40, Peter Schuller <peter.schuller_at_infidyne.com> wrote:
> I have a machine (Dell 2950) where I originally thought there was a
> memory visibility problem. It has 4 GB in total which seemed to be
> detected on boot based on dmesg, yet top showed a total of slightly
> above 2 GB. I posted to -questions about this in the belief that this
> was the case from boot onwards.

This is a bug in top.  Please file a PR and optionally assign it to me.
I'll try to fix it with the patch attached at the end of this message.

> However, I discovered that the memory figures in top add up right after
> boot, but the total then goes down. Observe the following progression of
> top's memory figures and their totals over the course of about a day, on
> amd64 with CURRENT from 2007-09-09:
>
>  10M Active, 1388K Inact,  84M Wired, 268K Cache,  800K Buf, 3822M Free Total: 3918
> 272M Active,   31M Inact, 650M Wired,  42M Cache,  992K Buf, 1391M Free Total: 2386.9
> 274M Active,   32M Inact, 655M Wired,  52M Cache,  992K Buf, 1372M Free Total: 2385.9
> 282M Active,   31M Inact, 646M Wired,  53M Cache,  992K Buf, 1372M Free Total: 2384.9
> 332M Active,  146M Inact, 719M Wired,  52M Cache,  214M Buf, 1071M Free Total: 2534

I slightly edited the figures below to align them better and added a
few observations based on the columns up to 'Free' above.

  * Converted all units to KB (using 1 MB = 1024 KB, as the current
    format_k() implementation in utils.c does).
  * Added an 'in-use' column; it is the sum of the previous columns.
  * Added a 'use^2' column; it is log2(in-use * 1024), i.e. log2 of the
    in-use size in bytes.
  * Kept the 'free' value that top(1) reports.
  * Added a 'total' column with sum(in-use, free).

 active   inact   wired   cache     buf |  in-use |  use^2 |    free |   total
  10240    1388   86016     268     800 |   98712 | 26.590 | 3913728 | 4012440
 278528   31744  665600   43008     992 | 1019872 | 29.959 | 1424384 | 2444256
 280576   32768  670720   53248     992 | 1038304 | 29.985 | 1404928 | 2443232
 288768   31744  661504   54272     992 | 1037280 | 29.984 | 1404928 | 2442208
 339968  149504  736256   53248  219136 | 1498112 | 30.514 | 1096704 | 2594816
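
The conversion behind the table can be sketched in sh roughly as follows.
This is only an illustration of the arithmetic, not code taken from
top(1): the helper names to_kb, in_use_kb and log2_bytes are made up for
this example, and awk is assumed to be available for the logarithm.

```shell
#!/bin/sh
# Convert a top(1)-style size such as "10M" or "1388K" to KB,
# using 1 MB = 1024 KB.
to_kb() {
    case "$1" in
        *K) echo "${1%K}" ;;
        *M) echo $(( ${1%M} * 1024 )) ;;
        *G) echo $(( ${1%G} * 1024 * 1024 )) ;;
        *)  echo "unknown unit: $1" >&2; return 1 ;;
    esac
}

# Sum the in-use fields (active, inact, wired, cache, buf), each given
# in top units, and print the result in KB.
in_use_kb() {
    sum=0
    for field in "$@"; do
        kb=$(to_kb "$field") || return 1
        sum=$(( sum + kb ))
    done
    echo "$sum"
}

# Print log2 of a KB value expressed in bytes, to three decimals.
log2_bytes() {
    awk -v kb="$1" 'BEGIN { printf "%.3f\n", log(kb * 1024) / log(2) }'
}
```

For example, `in_use_kb 10M 1388K 84M 268K 800K` prints 98712, which
matches the in-use value in the first row of the table.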

The values of in-use and free add up to the 'total' you listed above,
but the totals themselves look wrong: they fall from about 3.9 GB to
about 2.4 GB.  If this is repeatable, can you also run something like
this in a second terminal:

    while true ; do
        date
        for sysctlname in \
            vm.stats.vm.v_active_count \
            vm.stats.vm.v_inactive_count \
            vm.stats.vm.v_wire_count \
            vm.stats.vm.v_cache_count \
            vm.stats.vm.v_free_count
        do
            sysctl "$sysctlname"
        done
        sleep 3
    done

In another terminal, run:

    while true ; do
        date -u
        top | cat
        sleep 3
    done

Then I will try to see if I can make sense of what triggers the weird
numbers you are seeing.
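
To compare the sysctl output with what top prints, the page counts can
be converted to KB.  A minimal sketch, assuming the usual `name: value'
output format of sysctl(8) and taking the page size in bytes as an
argument; sum_pages_kb is a hypothetical helper name, not an existing
tool.

```shell
#!/bin/sh
# Sum "name: value" page counts read from stdin and print the total in KB.
# $1 is the page size in bytes (e.g. the output of `sysctl -n hw.pagesize`).
sum_pages_kb() {
    awk -v pagesize="$1" '
        /_count: / { pages += $2 }
        END { printf "%.0f\n", pages * pagesize / 1024 }
    '
}

# Possible use on the affected machine:
#   sysctl vm.stats.vm.v_active_count vm.stats.vm.v_free_count |
#       sum_pages_kb "$(sysctl -n hw.pagesize)"
```

The result can then be set against the 'total' column computed from the
top output captured at the same time.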

The tricky part is that top(1) stores these results in an `int', which
is probably a mistake on amd64 or on systems with *lots* of memory.  An
`int' is 32 bits wide on amd64, so it may be too small to hold the byte
size that corresponds to a large page count.
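
To illustrate the concern, here is a rough sh sketch of what a byte
count looks like once it is squeezed into a 32-bit signed integer.  It
does not reproduce any particular line of top(1); wrap_int32 is a
made-up helper, the page count is hypothetical, and the sketch assumes
the shell does its arithmetic in 64 bits.  Signed overflow is formally
undefined in C; two's-complement wraparound, as modelled here, is simply
the result most commonly seen in practice.

```shell
#!/bin/sh
# Print the value a 32-bit signed int would hold after two's-complement
# wraparound of $1.  Assumes 64-bit shell arithmetic.
wrap_int32() {
    echo $(( ( ($1 + 2147483648) % 4294967296 + 4294967296 ) % 4294967296 - 2147483648 ))
}

# Hypothetical example: 4 GB worth of 4096-byte pages.
pages=$(( 1024 * 1024 ))
exact=$(( pages * 4096 ))

echo "exact bytes: $exact"
echo "as int32:    $(wrap_int32 "$exact")"
```

With these numbers the exact size is 4294967296 bytes, while the
wrapped value is 0; anything from 2 GB upwards no longer fits.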

I tried something like:

    http://people.freebsd.org/~keramida/diff/top-uint64.diff

but there are _many_ places where sizeof(int) limits what top can
represent, so this is not really a good fix (yet).

- Giorgos
Received on Tue Sep 11 2007 - 20:29:39 UTC
