Re: ARC "pressured out", how to control/stabilize ? (reformatted to text/plain)

From: Vitalij Satanivskij <satan_at_ukr.net>
Date: Tue, 4 Feb 2014 12:08:23 +0200
Dear Andriy and FreeBSD community,

With the patch applied, the system panics on boot.

After removing the cache device from the pool, the system boots without problems.

After that, the cache device was added back, and soon a kernel panic happened again.

A screenshot of the panic is here: http://i61.tinypic.com/30sbx2g.jpg
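
For reference, detaching and reattaching the cache device was done with the usual zpool commands; the pool and device names below are placeholders, not our actual ones:

	# drop the L2ARC (cache) device from the pool
	zpool remove tank gpt/cache0
	# attach it again later
	zpool add tank cache gpt/cache0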



Vitalij Satanivskij wrote:
VS> Dear Andriy and FreeBSD community,
VS> 
VS> Buildworld with the patch failed with this error:
VS> 
VS> /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4642:13: error: use of
VS>       undeclared identifier 'l2hdr'
VS>                         ASSERT3P(l2hdr->b_tmp_cdata, ==, NULL);
VS>                                  ^
VS> /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/sys/debug.h:125:40: note: expanded from
VS>       macro 'ASSERT3P'
VS> #define ASSERT3P(x, y, z)       VERIFY3_IMPL(x, y, z, uintptr_t)
VS>                                              ^
VS> /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/sys/debug.h:109:29: note: expanded from
VS>       macro 'VERIFY3_IMPL'
VS>         const TYPE __left = (TYPE)(LEFT); \
VS>                                    ^
VS> 1 error generated.
VS> *** Error code 1
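VS> 
VS> (The error makes sense given the expansion shown above: ASSERT3P assigns its first argument to a local variable, so 'l2hdr' must already be declared in the enclosing scope in arc.c. A minimal sketch of the kind of declaration the assertion presumably expects; the variable and field names are assumed from the surrounding ZFS code, not taken from the patch itself:
VS> 
VS> 	/* 'hdr' is the arc_buf_hdr_t the surrounding function works on */
VS> 	l2arc_buf_hdr_t *l2hdr = hdr->b_l2hdr;	/* field name assumed */
VS> 	ASSERT3P(l2hdr->b_tmp_cdata, ==, NULL);
VS> )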
VS> 
VS> 
VS> 
VS> Vladimir Sharun wrote:
VS> VS> Dear Andriy and FreeBSD community,
VS> VS> 
VS> VS> L2ARC was temporarily turned off by setting secondarycache=none everywhere it was enabled,
VS> VS> and there has been no more leak for one full day so far.
VS> VS> 
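VS> VS> (For completeness, the property was set like this; "tank" stands in for the actual pool/dataset names:
VS> VS> 
VS> VS> 	zfs set secondarycache=none tank
VS> VS> 
VS> VS> This only stops new data from being written to the cache device; whatever is already in L2ARC simply ages out.)
VS> VS> 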
VS> VS> Here's the top header:
VS> VS> last pid: 89916;  load averages:  2.49,  2.91,  2.89    up 5+19:21:42  14:09:12
VS> VS> 561 processes: 2 running, 559 sleeping
VS> VS> CPU:  5.7% user,  0.0% nice, 14.0% system,  1.0% interrupt, 79.3% idle
VS> VS> Mem: 23G Active, 1017M Inact, 98G Wired, 1294M Cache, 3285M Buf, 1997M Free
VS> VS> ARC: 69G Total, 3498M MFU, 59G MRU, 53M Anon, 1651M Header, 4696M Other
VS> VS> Swap:
VS> VS> 
VS> VS> Here's the vmstat -z output, reduced to per-zone totals (only allocations exceeding 100*1024^2 bytes are printed; a sketch of the filter appears below the list):
VS> VS> UMA Slabs:      199,915M
VS> VS> VM OBJECT:      207,354M
VS> VS> 32:     205,558M
VS> VS> 64:     901,122M
VS> VS> 128:    215,211M
VS> VS> 256:    242,262M
VS> VS> 4096:   2316,01M
VS> VS> range_seg_cache:        205,396M
VS> VS> zio_buf_512:    1103,31M
VS> VS> zio_buf_16384:  15697,9M
VS> VS> zio_data_buf_16384:     348,297M
VS> VS> zio_data_buf_24576:     129,352M
VS> VS> zio_data_buf_32768:     104,375M
VS> VS> zio_data_buf_36864:     163,371M
VS> VS> zio_data_buf_53248:     100,496M
VS> VS> zio_data_buf_57344:     105,93M
VS> VS> zio_data_buf_65536:     101,75M
VS> VS> zio_data_buf_73728:     111,938M
VS> VS> zio_data_buf_90112:     104,414M
VS> VS> zio_data_buf_106496:    100,242M
VS> VS> zio_data_buf_131072:    61652,5M
VS> VS> dnode_t:        3203,98M
VS> VS> dmu_buf_impl_t: 797,695M
VS> VS> arc_buf_hdr_t:  1498,76M
VS> VS> arc_buf_t:      105,802M
VS> VS> zfs_znode_cache:        352,61M
VS> VS> 
VS> VS> zio_data_buf_131072 (61652M) + zio_buf_16384 (15698M) = 77350M,
VS> VS> which easily exceeds the ARC total (69G per the top header above).
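VS> VS> 
VS> VS> (The per-zone totals above are each zone's item size multiplied by its used count. A rough sketch of the filter, assuming the stock "name: size, limit, used, free, ..." layout of vmstat -z; this is not our exact script:
VS> VS> 
VS> VS> 	vmstat -z | awk -F'[:,]' 'NR > 1 && $2+0 > 0 {
VS> VS> 		mb = $2 * $4 / 1048576		# item size * used, in MiB
VS> VS> 		if (mb > 100) printf "%s:\t%.3fM\n", $1, mb
VS> VS> 	}'
VS> VS> )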
VS> VS> 
VS> VS> 
VS> VS> Here are the same calculations from exactly the same system, where L2ARC was disabled before the reboot:
VS> VS> last pid: 63407;  load averages:  2.35,  2.71,  2.73    up 8+19:42:54  14:17:33
VS> VS> 527 processes: 1 running, 526 sleeping
VS> VS> CPU:  4.8% user,  0.0% nice,  6.6% system,  1.1% interrupt, 87.4% idle
VS> VS> Mem: 21G Active, 1460M Inact, 99G Wired, 1748M Cache, 3308M Buf, 952M Free
VS> VS> ARC: 87G Total, 4046M MFU, 76G MRU, 37M Anon, 2026M Header, 4991M Other
VS> VS> Swap:
VS> VS> 
VS> VS> and the filtered vmstat -z output:
VS> VS> UMA Slabs:      208,004M
VS> VS> VM OBJECT:      207,392M
VS> VS> 32:     172,831M
VS> VS> 64:     752,226M
VS> VS> 128:    210,024M
VS> VS> 256:    244,204M
VS> VS> 4096:   2249,02M
VS> VS> range_seg_cache:        245,711M
VS> VS> zio_buf_512:    1145,25M
VS> VS> zio_buf_16384:  15170,1M
VS> VS> zio_data_buf_16384:     422,766M
VS> VS> zio_data_buf_20480:     120,742M
VS> VS> zio_data_buf_24576:     148,641M
VS> VS> zio_data_buf_28672:     112,848M
VS> VS> zio_data_buf_32768:     117,375M
VS> VS> zio_data_buf_36864:     185,379M
VS> VS> zio_data_buf_45056:     103,168M
VS> VS> zio_data_buf_53248:     105,32M
VS> VS> zio_data_buf_57344:     122,828M
VS> VS> zio_data_buf_65536:     109,25M
VS> VS> zio_data_buf_69632:     100,406M
VS> VS> zio_data_buf_73728:     126,844M
VS> VS> zio_data_buf_77824:     101,086M
VS> VS> zio_data_buf_81920:     100,391M
VS> VS> zio_data_buf_86016:     101,391M
VS> VS> zio_data_buf_90112:     112,836M
VS> VS> zio_data_buf_98304:     100,688M
VS> VS> zio_data_buf_102400:    106,543M
VS> VS> zio_data_buf_106496:    108,875M
VS> VS> zio_data_buf_131072:    63190,5M
VS> VS> dnode_t:        3437,36M
VS> VS> dmu_buf_impl_t: 840,62M
VS> VS> arc_buf_hdr_t:  1870,88M
VS> VS> arc_buf_t:      114,942M
VS> VS> zfs_znode_cache:        353,055M
VS> VS> 
VS> VS> Here everything seems to be within the ARC total: zio_data_buf_131072 (63190M) + zio_buf_16384 (15170M) = 78360M, below the 87G ARC.
VS> VS> 
VS> VS> We will try the attached patch within a few days and come back with the results.
VS> VS> 
VS> VS> Thank you for your help.
VS> VS> 
VS> VS> > on 28/01/2014 11:28 Vladimir Sharun said the following:
VS> VS> > > Dear Andriy and FreeBSD community,
VS> VS> > > 
VS> VS> > > After applying this patch, one of the systems runs fine (disk subsystem load low to moderate:
VS> VS> > > 10-20% busy sustained).
VS> VS> > > 
VS> VS> > > Then I saw that this patch was merged to HEAD, and we applied it to one of the systems
VS> VS> > > with moderate to high disk load: 30-60% busy (11.0-CURRENT #7 r261118: Fri Jan 24 17:25:08 EET 2014).
VS> VS> > > 
VS> VS> > > Within 4 days we are experiencing the same leak(?) as without the patch:
VS> VS> > > 
VS> VS> > > last pid: 53841;  load averages:  4.47,  4.18,  3.78     up 3+16:37:09  11:24:39
VS> VS> > > 543 processes: 6 running, 537 sleeping
VS> VS> > > CPU:  8.7% user,  0.0% nice, 14.6% system,  1.4% interrupt, 75.3% idle
VS> VS> > > Mem: 22G Active, 1045M Inact, 98G Wired, 1288M Cache, 3284M Buf, 2246M Free
VS> VS> > > ARC: 73G Total, 3763M MFU, 62G MRU, 56M Anon, 1887M Header, 4969M Other
VS> VS> > > Swap:
VS> VS> > > 
VS> VS> > > Under load, the ARC is populated to its maximum (90GB) within 30 minutes and then starts decreasing.
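VS> VS> > > 
VS> VS> > > (The 90GB ceiling referred to here is presumably an arc_max cap; on FreeBSD that is set with the vfs.zfs.arc_max loader tunable, e.g. in /boot/loader.conf; the value is illustrative:
VS> VS> > > 
VS> VS> > > 	vfs.zfs.arc_max="90G"
VS> VS> > > )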
VS> VS> > > 
VS> VS> > > The delta between Wired and ARC total starts growing, from the typical 10-12GB without L2ARC enabled
VS> VS> > > to 25GB with L2ARC enabled and counting (4 hours ago the delta was 22GB).
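VS> VS> > > 
VS> VS> > > (A simple way to track that delta over time; the sysctl OIDs are real, while the 4 KB page size and the 5-minute interval are assumptions of this sketch:
VS> VS> > > 
VS> VS> > > 	while :; do
VS> VS> > > 		w=$(sysctl -n vm.stats.vm.v_wire_count)		# wired memory, in pages
VS> VS> > > 		a=$(sysctl -n kstat.zfs.misc.arcstats.size)	# ARC size, in bytes
VS> VS> > > 		echo "$(date '+%F %T') delta=$(( w * 4096 / 1048576 - a / 1048576 ))M"
VS> VS> > > 		sleep 300
VS> VS> > > 	done
VS> VS> > > )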
VS> VS> > 
VS> VS> > First, have you checked that the vmstat -z output contains the same anomaly as
VS> VS> > in your original report?
VS> VS> > 
VS> VS> > If yes, then please try to reproduce the problem with the following debugging patch:
VS> VS> > http://people.freebsd.org/~avg/l2arc-b_tmp_cdata-diag.patch
VS> VS> > Please make sure to compile your kernel (and modules) with INVARIANTS.
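VS> VS> > 
VS> VS> > (That is, the kernel config should contain the standard options:
VS> VS> > 
VS> VS> > 	options	INVARIANTS
VS> VS> > 	options	INVARIANT_SUPPORT
VS> VS> > )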
VS> VS> > 
VS> VS> > -- 
VS> VS> > Andriy Gapon
VS> _______________________________________________
VS> freebsd-current_at_freebsd.org mailing list
VS> http://lists.freebsd.org/mailman/listinfo/freebsd-current
VS> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"