On Tue, 8 Jan 2008, Vadim Goncharov wrote:

>> To make life slightly more complicated, small malloc allocations are
>> actually implemented using UMA -- there are a small number of small
>> object size zones reserved for this purpose, and malloc just rounds up
>> to the next such bucket size and allocates from that bucket. For
>> larger sizes, malloc goes through UMA, but pretty much directly to VM,
>> which makes pages available directly. So when you look at "vmstat -z"
>> output, be aware that some of the information presented there (zones
>> named things like "128", "256", etc) are actually the pools from which
>> malloc allocations come, so there's double-counting.
>
> Yes, I knew that, but didn't know what exactly the column names mean.
> Requests/Failures, I guess, are pure statistics, and Size is the size
> of one element, but why is USED + FREE != LIMIT (on those zones where
> LIMIT is non-zero)?

Possibly we should rename the "FREE" column to "CACHE" -- the free count
is the number of items in the UMA cache. These may be hung in buckets
off the per-CPU cache, or be spare buckets in the zone. Either way, the
memory has to be reclaimed before it can be used for other purposes,
and generally, for complex objects, allocating from the cache is much
faster than going back to VM for more memory. LIMIT is an administrative
limit that may be configured on the zone, and is configured for some but
not all zones, which is why USED + FREE generally doesn't add up to
LIMIT. A sketch of how such a limit is set follows just below, and a
short malloc(9) sketch near the end of this message ties the size zones
mentioned above back to concrete code.
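To make the USED/FREE/LIMIT columns concrete, here is a minimal sketch
of a kernel component creating its own zone and putting an
administrative limit on it. The "frob" name and structure are
hypothetical, invented for illustration; the calls are the standard
uma_zcreate(9), uma_zone_set_max(9), uma_zalloc(9) and uma_zfree(9)
interfaces:

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/malloc.h>
    #include <vm/uma.h>

    /* Hypothetical object type, for illustration only. */
    struct frob {
            int     f_state;
    };

    static uma_zone_t frob_zone;

    static void
    frob_init(void)
    {
            /* Create a zone; this shows up as a row in "vmstat -z". */
            frob_zone = uma_zcreate("frob", sizeof(struct frob),
                NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);

            /*
             * Administrative limit: at most 1024 items may exist in
             * the zone.  This is the LIMIT column; zones created
             * without such a call show a limit of 0 (unlimited).
             */
            uma_zone_set_max(frob_zone, 1024);
    }

    static struct frob *
    frob_alloc(void)
    {
            /* An allocated item counts toward USED ... */
            return (uma_zalloc(frob_zone, M_WAITOK | M_ZERO));
    }

    static void
    frob_free(struct frob *fp)
    {
            /*
             * ... and a freed item typically lands in a per-CPU
             * bucket, where it is counted as FREE (really: cached)
             * until the bucket is reclaimed.
             */
            uma_zfree(frob_zone, fp);
    }

Zones created this way are exactly the rows that "vmstat -z" prints;
the malloc size zones ("128", "256", etc) are just zones of this kind
created at boot.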
I'll let someone with a bit more VM experience follow up with more
information about how the various maps and submaps relate to each other.

>> The concept of kernel memory, as seen above, is a bit convoluted.
>> Simple memory allocated by the kernel for its internal data
>> structures, such as vnodes, sockets, mbufs, etc, is almost always not
>> something that can be paged, as it may be accessed from contexts where
>> blocking on I/O is not permitted (for example, in interrupt threads or
>> with critical mutexes held). However, other memory in the kernel map
>> may well be pageable, such as kernel thread stacks for sleeping user
>> threads
>
> We can assume for simplicity that their memory is not-so-kernel but
> part of the process address space :)

If it is mapped into the kernel address space, then it still counts
towards the limit on the map. There are really two critical resources:
memory itself, and address space to map it into. Over time, the balance
between address space and memory changes -- for a long time, 32 bits was
the 640k of the UNIX world, so there was always plenty of address space
and not enough memory to fill it. More recently, physical memory started
to overtake address space, and now, with the advent of widely available
64-bit systems, it's swinging in the other direction. The trick is
always in how to tune things, as tuning parameters designed for "memory
is bounded and address space is infinite" often work less well when
that's not the case. In the early 5.x series, for example, we had a lot
of kernel panics because kernel constants were scaling with physical
memory rather than address space, so the kernel would run out of address
space.

>> (which can be swapped out under heavy memory load), pipe buffers, and
>> general cached data for the buffer cache / file system, which will be
>> paged out or discarded when memory pressure goes up.
>
> Umm. I think there is no point in swapping disk cache which can be
> discarded, so the main part of kernel memory which is swappable is
> anonymous pipe(2) buffers?

Yes, that's what I meant. There are some other types of pageable kernel
memory, such as memory used for swap-backed md devices.
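Finally, the malloc(9) sketch promised at the top of this message. The
M_EXAMPLE type and the 100-byte request are hypothetical, but
MALLOC_DEFINE(9), malloc(9) and free(9) are the standard kernel
interfaces:

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/malloc.h>

    /* Hypothetical malloc type; shows up in "vmstat -m" accounting. */
    static MALLOC_DEFINE(M_EXAMPLE, "example", "example allocations");

    static void
    example(void)
    {
            /*
             * A 100-byte request is rounded up and satisfied from the
             * "128" size zone, so the memory is accounted both under
             * this malloc type in "vmstat -m" and in the "128" row of
             * "vmstat -z" -- the double-counting described above.
             */
            void *p = malloc(100, M_EXAMPLE, M_WAITOK);

            free(p, M_EXAMPLE);
    }

For requests larger than the largest size zone, malloc(9) instead goes
more or less directly to VM for whole pages, as described above.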
Robert N M Watson
Computer Laboratory
University of Cambridge