Re: zfs recv hangs in kmem arena

From: James R. Van Artsdalen <james-freebsd-fs2@jrv.org>
Date: Sun, 26 Oct 2014 18:22:46 -0600
I was able to complete a ZFS replication by intervening manually each
time "zfs recv" blocked on "kmem arena": running the program at the end
of this message was sufficient to unblock zfs on each of the 17
occasions it stalled.

The program is intended to consume about 24GB of the 32GB of physical
RAM, thereby pressuring the ARC and kernel caches to shrink; when the
program exited it would leave plenty of free RAM for zfs or whatever
else needed it.  What actually happened was that zfs unblocked every
time while the program below was still growing: it was never necessary
to wait for the program to exit and free memory.
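
To watch the mechanism at work, the ARC can be observed shrinking from
a second terminal while the program grows.  A minimal sketch, assuming
the stock FreeBSD arcstats sysctl kstat.zfs.misc.arcstats.size (this
monitor is my own addition, not part of the original procedure):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	uint64_t arc;
	size_t len;

	/* Poll the ARC size once a second; it should fall as the
	 * memory hog below faults in its pages. */
	for (;;) {
		len = sizeof(arc);
		if (sysctlbyname("kstat.zfs.misc.arcstats.size",
		    &arc, &len, NULL, 0) == -1) {
			perror("sysctlbyname");
			return (1);
		}
		printf("ARC size: %ju MB\n", (uintmax_t)(arc >> 20));
		sleep(1);
	}
}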

On 10/16/2014 6:25 AM, James R. Van Artsdalen wrote:
> The zfs recv / kmem arena hang happens with -CURRENT as well as
> 10-STABLE, on two different systems, with 16GB or 32GB of RAM, from
> memstick or normal multi-user environments.
>
> Hangs usually seem to happen 1TB to 3TB in, but last night one run hung
> after only 4.35MB.
>
> On 9/26/2014 1:42 AM, James R. Van Artsdalen wrote:
>> FreeBSD BLACKIE.housenet.jrv 10.1-BETA2 FreeBSD 10.1-BETA2 #2 r272070M:
>> Wed Sep 24 17:36:56 CDT 2014    
>> james@BLACKIE.housenet.jrv:/usr/obj/usr/src/sys/GENERIC  amd64
>>
>> With current STABLE10 I am unable to replicate a ZFS pool using zfs
>> send/recv without zfs hanging in state "kmem arena", within the first
>> 4TB or so (of a 23TB pool).
>>
>> The most recent attempt used this command line:
>>
>> SUPERTEX:/root# zfs send -R BIGTEX/UNIX@syssnap | ssh BLACKIE zfs recv -duvF BIGTOX
>>
>> though local replications fail in kmem arena too.
>>
>> The two machines I've been attempting this on have 16GB and 32GB of RAM
>> respectively, and are otherwise idle.
>>
>> Any suggestions on how to get around, or investigate, "kmem arena"?
>>
>> # top
>> last pid:  3272;  load averages:  0.22,  0.22,  0.23   up 0+08:25:02  01:32:07
>> 34 processes:  1 running, 33 sleeping
>> CPU:  0.0% user,  0.0% nice,  0.1% system,  0.0% interrupt, 99.9% idle
>> Mem: 21M Active, 82M Inact, 15G Wired, 28M Cache, 450M Free
>> ARC: 12G Total, 24M MFU, 12G MRU, 23M Anon, 216M Header, 47M Other
>> Swap: 16G Total, 16G Free
>>
>>   PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
>>  1173 root          1  52    0 86476K  7780K select  0 124:33   0.00% sshd
>>  1176 root          1  46    0 87276K 47732K kmem a  3  48:36   0.00% zfs
>>   968 root         32  20    0 12344K  1888K rpcsvc  0   0:13   0.00% nfsd
>>  1009 root          1  20    0 25452K  2864K select  3   0:01   0.00% ntpd
>> ...
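
Regarding the question quoted above about how to investigate "kmem
arena": one generic FreeBSD debugging step (not something tried in this
thread) is to dump the kernel stack of the blocked process, using the
zfs PID from the top output:

# procstat -kk 1176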

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Just under 4GB per allocation; six allocations consume about 24GB,
 * leaving roughly 8GB free on a 32GB machine. */
static const long long s = ((long long)1 << 32) - 65;

int
main(void)
{
	char *p;
	int i;

	for (i = 0; i < 6; i++) {
		p = calloc((size_t)s, 1);
		if (p == NULL) {
			perror("calloc");
			return (1);
		}
		/* Touch every page so physical memory is actually
		 * consumed; each block is deliberately leaked so the
		 * pressure persists until the process exits. */
		memset(p, 1, (size_t)s);
	}
	return (0);
}
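
Note that calloc() does not by itself consume physical memory: pages
are faulted in only as memset() touches them, which is presumably why
zfs unblocked while the program was still growing rather than only at
exit.  To build and run it (the file name here is made up):

# cc -o memhog memhog.c
# ./memhog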
Received on Sun Oct 26 2014 - 23:22:45 UTC