Re: OOM killer and kernel cache reclamation rate limit in vm_pageout_scan()

From: Steven Hartland <killing_at_multiplay.co.uk>
Date: Thu, 16 Oct 2014 10:08:53 +0100

Unfortunately ZFS doesn't prevent new inflight writes until it
hits zfs_dirty_data_max, so while what you're suggesting will
help, if the writes come in quickly enough I would still expect
them to be able to outrun the pageout.

----- Original Message ----- 
From: "Justin T. Gibbs" <gibbs_at_FreeBSD.org>
To: <freebsd-current_at_freebsd.org>
Cc: <alc_at_FreeBSD.org>; "Andriy Gapon" <avg_at_freebsd.org>
Sent: Thursday, October 16, 2014 6:56 AM
Subject: OOM killer and kernel cache reclamation rate limit in vm_pageout_scan()

avg pointed out the rate limiting code in vm_pageout_scan() during discussion about PR 187594.  While it certainly can contribute to 
the problems discussed in that PR, a bigger problem is that it can allow the OOM killer to be triggered even though there is plenty 
of reclaimable memory available in the system.  Any load that can consume enough pages within the polling interval to hit the 
v_free_min threshold (e.g. multiple 'dd if=/dev/zero of=/file/on/zfs') can make this happen.

The product I’m working on does not have swap configured and treats any OOM trigger as fatal, so it is very obvious when this 
happens. :-)

I’ve tried several things to mitigate the problem.  The first was to ignore rate limiting for pass 2.  However, even though ZFS is 
guaranteed to receive some feedback prior to OOM being declared, my testing showed that a trivial load (a couple dd operations) 
could still consume enough of the reclaimed space to leave the system below its target at the end of pass 2.  After removing the 
rate limiting entirely, I’ve so far been unable to kill the system via a ZFS induced load.

I understand the motivation behind the rate limiting, but the current implementation seems too simplistic to be safe.  The 
documentation for the Solaris slab allocator provides good motivation for their approach of using a “sliding average” to rein in 
temporary bursts of usage without unduly harming efficient service for the recorded steady-state memory demand.  Regardless of the 
approach taken, I believe that the OOM killer must be a last resort and shouldn’t be called when there are caches that can be 
culled.
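
To make the "sliding average" idea concrete, here is a rough userspace sketch in C. It is not the Solaris implementation and not FreeBSD kernel code: it swaps in a plain exponentially weighted moving average, and every name in it (demand_avg, demand_avg_update, reclaim_target, the two constants) is hypothetical. The assumption is that the pageout code could size its reclaim target from recent demand rather than applying a fixed rate limit.

```c
#include <stdint.h>

/*
 * Hypothetical sketch only: none of these names exist in FreeBSD.
 * Tracks an exponentially weighted moving average (EWMA) of the
 * number of pages consumed per polling interval.
 */
#define DEMAND_SHIFT  8   /* fixed-point fraction bits (assumed) */
#define DEMAND_WEIGHT 2   /* each new sample counts for 1/4 (assumed) */

struct demand_avg {
    uint64_t avg_scaled;  /* average demand, scaled by 2^DEMAND_SHIFT */
};

/* Fold one interval's page consumption into the running average. */
static void
demand_avg_update(struct demand_avg *d, uint64_t pages_consumed)
{
    uint64_t sample = pages_consumed << DEMAND_SHIFT;

    /* avg += (sample - avg) / 4, written to avoid unsigned underflow. */
    if (sample >= d->avg_scaled)
        d->avg_scaled += (sample - d->avg_scaled) >> DEMAND_WEIGHT;
    else
        d->avg_scaled -= (d->avg_scaled - sample) >> DEMAND_WEIGHT;
}

/* Current average demand in whole pages. */
static uint64_t
demand_avg_pages(const struct demand_avg *d)
{
    return d->avg_scaled >> DEMAND_SHIFT;
}

/*
 * Pages to reclaim this pass: the current shortfall against the free
 * target plus the demand expected during the next interval.
 */
static uint64_t
reclaim_target(const struct demand_avg *d, uint64_t free_pages,
    uint64_t free_target)
{
    uint64_t shortfall = 0;

    if (free_pages < free_target)
        shortfall = free_target - free_pages;
    return shortfall + demand_avg_pages(d);
}
```

A single burst only moves the average a quarter of the way toward the new sample, so a short spike raises the target modestly while sustained demand raises it steadily.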

One other thing I’ve noticed in my testing with ZFS is that it needs feedback and a little time to react to memory pressure. 
Calling its lowmem handler just once isn’t enough for it to limit in-flight writes so it can avoid reuse of pages that it just 
freed up.  But, it doesn’t take too long to react (> 1sec in the profiling I’ve done).  Is there a way in vm_pageout_scan() that we 
can better record that progress is being made (pages were freed in the pass, even if some/all of them were consumed again) and allow 
more passes before the OOM killer is invoked in this case?
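
One possible shape for the "record that progress is being made" question, as a hedged userspace sketch in C rather than a patch to vm_pageout_scan(): count consecutive passes that freed nothing, and only report OOM once that count reaches a threshold. The names scan_state, scan_should_oom and OOM_STALLED_PASSES are hypothetical, and the threshold of 3 is an arbitrary assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: stalled passes tolerated before OOM. */
#define OOM_STALLED_PASSES 3

/* Hypothetical per-scanner state; not a FreeBSD structure. */
struct scan_state {
    unsigned stalled_passes;  /* consecutive passes that freed nothing */
};

/*
 * Called at the end of each pass. Returns true only when the system is
 * still below its free target and no pages were freed for
 * OOM_STALLED_PASSES consecutive passes. Any pass that frees pages,
 * even if they were consumed again immediately, resets the counter.
 */
static bool
scan_should_oom(struct scan_state *s, uint64_t pages_freed,
    bool below_target)
{
    if (!below_target || pages_freed > 0) {
        s->stalled_passes = 0;
        return false;
    }
    s->stalled_passes++;
    return s->stalled_passes >= OOM_STALLED_PASSES;
}
```

The trade-off in this sketch is that a load which keeps freeing a handful of pages per pass would never trigger OOM, so a real version would presumably also need an upper bound on total passes or elapsed time.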

—
Justin
