Re: Broken memory management on system with no swap

From: Matthew Dillon <dillon_at_apollo.backplane.com> Date: Sun, 20 Apr 2003 12:24:59 -0700 (PDT)

:Thanks for your analysis.  I thought there might be a GBDE-related
:factor when Lucky mentioned that copying a file triggers the bug,
:since cp(1) turns off mmap() mode for files > 8 MB to avoid this
:sort of thing.  But nevertheless, I can see how the situation you
:describe can occur, where the system realizes too late that all of
:the reclaimable pages are tied up in the active queue.

    Yes, I can see that happening too.  The inactive queue is scanned before
    the active queue (again, the ordering is important for normal operation
    and we wouldn't want to change it).  But this also creates a situation
    where moving pages from the active queue to the inactive queue and then
    laundering or reclaiming them from the inactive queue requires two
    passes before the system recognizes the newly available memory.

    If some operation, say a copy, causes nearly all available pages to
    be moved to the active queue, whether clean or dirty, it would require
    two passes before any of those pages could be reused.  In this case
    we know that the use of write() will not create an excessive number of
    dirty pages in the active queue due to (A) the limited size of the
    buffer cache and (B) the write-behind clustering that occurs when
    writing a file sequentially.  

    So it *must* simply be the fact that all the pages are made active
    very quickly and the pageout code simply requires two passes to
    get through the queues before it can reuse any of those pages.
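
    The two-pass delay can be illustrated with a toy model.  This is only
    a sketch, not the kernel's pageout code: the page states, the
    pageout_pass_sim() function and the flat array standing in for the
    queues are all hypothetical, and act_count and dirty pages are ignored.
    Each pass scans the inactive queue first and the active queue second,
    so pages that start out active cannot be reclaimed until the next pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page states for a toy model (not the kernel's PQ_* values). */
enum sim_queue { SIM_FREE, SIM_INACTIVE, SIM_ACTIVE };

/*
 * One simulated pageout pass.  The inactive queue is scanned first and
 * the active queue second, mirroring the ordering described above.
 * Returns the number of pages reclaimed during this pass.
 */
static size_t
pageout_pass_sim(enum sim_queue *pages, size_t npages)
{
    size_t i, freed = 0;

    assert(pages != NULL || npages == 0);

    /* Step 1: reclaim whatever is already on the inactive queue. */
    for (i = 0; i < npages; i++) {
        if (pages[i] == SIM_INACTIVE) {
            pages[i] = SIM_FREE;
            freed++;
        }
    }

    /* Step 2: move active pages to inactive; too late for this pass. */
    for (i = 0; i < npages; i++) {
        if (pages[i] == SIM_ACTIVE)
            pages[i] = SIM_INACTIVE;
    }

    return freed;
}
```

    With every page starting out active, the first call reclaims nothing
    and only the second call frees pages.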

    The 'pass != 0' test should be able to handle both cases assuming
    that the page's act_count does not get in the way.  Whether or not
    act_count gets in the way of us being able to reclaim a page in two
    passes can be tested by setting vm.pageout_algorithm to 1 (which will
    cause act_count to be ignored).  If the problem still occurs with
    the pass != 0 test and vm.pageout_algorithm set to 0 (the default),
    but does not occur with vm.pageout_algorithm set to 1, then we know
    the problem is due to pages not being moved out of the active queue
    quickly enough (1) for this situation.

    Note (1): normally act_count protects against thrashing.  It is the
    active queue's act_count algorithm which gives FreeBSD such a nice
    smooth degradation curve when memory loads become extreme, by preventing
    a frequently accessed page from being freed too early, so we don't
    want to just turn it off.  Maybe we need a test for 'too many active
    pages', i.e. when > 80% of available pages are in the active queue,
    to temporarily disable the act_count test.
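
    A rough sketch of what such a test might look like, as a standalone
    model rather than a patch.  Everything here is an assumption made for
    illustration: struct sim_page, the SIM_* constants and both function
    names are hypothetical, and the aging step is far simpler than what
    the real active queue scan does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants for this toy model, not taken from the kernel. */
#define SIM_ACT_DECLINE 1   /* how much act_count ages per scan */
#define SIM_ACTIVE_PCT  80  /* the "too many active pages" threshold */

struct sim_page {
    int act_count;
};

/* True when more than 80% of the available pages are in the active queue. */
static bool
sim_too_many_active(size_t active, size_t available)
{
    return available != 0 && active * 100 > available * SIM_ACTIVE_PCT;
}

/*
 * Decide whether an unreferenced active page should move to the
 * inactive queue.  With ignore_act_count set, the page is deactivated
 * at once; otherwise act_count is aged and the page only moves when
 * it reaches zero.
 */
static bool
sim_should_deactivate(struct sim_page *p, bool ignore_act_count)
{
    assert(p != NULL && p->act_count >= 0);

    if (ignore_act_count) {
        p->act_count = 0;
        return true;
    }
    if (p->act_count > 0)
        p->act_count -= SIM_ACT_DECLINE;
    return p->act_count == 0;
}
```

    A scan loop in this model would call
    sim_should_deactivate(&page, sim_too_many_active(active, available)),
    so act_count keeps its anti-thrashing role until the active queue
    crosses the threshold.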

						-Matt

:>     I suggest changing this:
:> 
:>         if ((vm_swap_size < 64 && vm_page_count_min()) ||
:>             (swap_pager_full && vm_paging_target() > 0)) {
:> 
:>     To this:
:> 
:>         if (pass != 0 &&
:>             ((vm_swap_size < 64 && vm_page_count_min()) ||
:>              (swap_pager_full && vm_paging_target() > 0))) {
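
    To see what the added 'pass != 0' guard changes, the condition can be
    pulled out into a plain function.  This is a sketch only: the function
    name is hypothetical, and the kernel globals and helper calls from the
    quoted code are replaced by ordinary parameters so the logic can be
    exercised in userland.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userland stand-in for the proposed condition.  Parameters replace
 * the kernel state used in the quoted code:
 *   swap_size       ~ vm_swap_size
 *   page_count_min  ~ result of vm_page_count_min()
 *   swap_full       ~ swap_pager_full
 *   paging_target   ~ result of vm_paging_target()
 */
static bool
sim_low_memory_condition(int pass, long swap_size, bool page_count_min,
    bool swap_full, long paging_target)
{
    assert(pass >= 0);

    /* The new guard: never fire on the first pass (pass == 0). */
    return pass != 0 &&
        ((swap_size < 64 && page_count_min) ||
         (swap_full && paging_target > 0));
}
```

    With no swap and free pages below the minimum, the original condition
    would be true on every pass; with the guard it stays false on pass 0
    and only becomes true on a later pass.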