Strange problem with tar in -CURRENT (VM problem?)

From: Stefan Esser <se_at_freebsd.org>
Date: Sat, 12 May 2018 16:54:46 +0200
While searching for the reason an upgrade of math/atlas failed on my amd64
-CURRENT system, I found that tar fails to create an archive of only about 10 KB.

It is killed with signal 9 (SIGKILL) after some 30 seconds, during which its
memory use grows seemingly without bound.

The port processes some TAR files in order to fix up paths in them, using
the following shell loop (edited for readability):

cd ${WRKDIR}/ATLAS/CONFIG/ARCHS
for t in *.tgz ; do
	/bin/mv ${t} ${t}.bak
	/usr/bin/tar -s '/gcc/gcc6/' -xf ${t}.bak
	/usr/bin/tar -czf ${t} ${t%.tgz}		# (***)
	/bin/rm -f -r ${t%.tgz} ${t}.bak
done
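To take the rest of the port build out of the picture, the marked (***) step can be re-run by hand on a single extracted directory. The sketch below is not part of the port: `repack_dir` is a hypothetical helper, and the 4 GiB cap (given to `ulimit -v` in KiB) is an arbitrary value, assumed to be far more than archiving ~10 KB needs, so that a runaway tar fails on allocation instead of growing until it is killed.

```shell
#!/bin/sh
# Hypothetical helper: re-create DIR.tgz from DIR, as the (***) line does,
# but inside a subshell whose virtual memory is capped.
# Assumption: 4194304 KiB (4 GiB) is plenty for archiving a few KB.
repack_dir() {
	dir=$1
	limit_kib=${2:-4194304}
	(
		# ulimit -v takes KiB in sh/bash; the cap only affects this subshell
		# and the tar (and gzip) processes started from it.
		ulimit -v "$limit_kib" || exit 1
		tar -czf "${dir}.tgz" "$dir"
	)
}
```

Usage, from the ARCHS directory: `repack_dir Core232SSE3`. A non-zero exit status means tar failed under the cap.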

The command that fails is the one marked (***). I have tried to trace it
with ktrace and truss, but only see that a large amount of memory is mapped
and that the tar process is killed without having produced any output. I have
also added "-v" to watch progress, and the log indicates that tar does not
even start writing to the archive. Removing the "z" option makes no
difference.
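Besides ktrace and truss, the growth can be watched from outside the process by sampling ps(1) at a fixed interval. A minimal sketch; `watch_mem` is a hypothetical helper. It uses the `vsz` and `rss` keywords with an empty header (`vsz=`), given as separate `-o` options, and prints sizes in KiB as ps reports them.

```shell
#!/bin/sh
# Hypothetical helper: print "<epoch seconds> <VSZ KiB> <RSS KiB>" for PID
# every INTERVAL seconds, at most COUNT times, stopping early once ps no
# longer finds the process.
watch_mem() {
	pid=$1 interval=${2:-1} count=${3:-60}
	n=0
	while [ "$n" -lt "$count" ]; do
		mem=$(ps -o vsz= -o rss= -p "$pid") || break
		set -- $mem    # split "  VSZ  RSS" into $1 and $2
		printf '%s %s %s\n' "$(date +%s)" "$1" "$2"
		n=$((n + 1))
		if [ "$n" -lt "$count" ]; then
			sleep "$interval"
		fi
	done
}
```

For example, start the (***) command in the background and run `watch_mem $! 1 60 > mem.log` to record one sample per second for up to a minute.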

Typical "ps l" output is:

UID  PID PPID CPU PRI NI       VSZ      RSS MWCHAN STAT TT     TIME COMMAND
  0 2269 2254   0  30  0 105946804 21244044 pfault D     0  0:31,48 /usr/bin/tar -czf Core232SSE3.tgz Core232SSE3 (bsdtar)

VSZ is 105946804 KB (about 100 GB) and RSS about 21 GB at the time tar is killed ...
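The conversion is easy to check: ps(1) reports VSZ and RSS in units of 1024 bytes, so dividing twice by 1024 gives GiB. A small sketch using awk for the floating-point division (`kib_to_gib` is only an illustrative name):

```shell
#!/bin/sh
# Convert a ps(1) VSZ/RSS value given in KiB to GiB, one decimal place.
kib_to_gib() {
	awk -v kib="$1" 'BEGIN { printf "%.1f\n", kib / (1024 * 1024) }'
}

kib_to_gib 105946804   # VSZ from the ps output: 101.0
kib_to_gib 21244044    # RSS from the ps output: 20.3
```

Either way, a process archiving roughly 10 KB of input had reserved on the order of 100 GiB of address space.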

The files to be processed are:

-rw-r--r--  1 root  wheel  11399 May 12 16:40 AMD64K10h32SSE3.tgz
-rw-r--r--  1 root  wheel  11697 May 12 16:40 AMD64K10h64SSE3.tgz
-rw-r--r--  1 root  wheel   1305 May 12 16:40 BOZOL1.tgz
-rw-r--r--  1 root  wheel   9909 May 12 16:40 Core232SSE3.tgz
drwxr-xr-x  5 root  wheel      9 Feb 25  2009 Core264SSE3/
-rw-r--r--  1 root  wheel      0 May 12 16:40 Core264SSE3.tgz
-rw-r--r--  1 root  wheel  10212 May 14  2011 Core264SSE3.tgz.bak
-rw-r--r--  1 root  wheel   8544 May 14  2011 Corei164SSE3.tgz
[...]
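The listing is consistent with an interrupted iteration of the loop: Core264SSE3.tgz has already been recreated with zero length, while the extracted Core264SSE3/ directory and the .tgz.bak copy are still present. A sketch of a check for such leftovers, to be run in the ARCHS directory (`find_leftovers` is a hypothetical helper):

```shell
#!/bin/sh
# Hypothetical helper: list files an aborted run of the repack loop may have
# left behind, i.e. zero-length *.tgz archives and stray *.tgz.bak copies.
find_leftovers() {
	find "${1:-.}" -type f \( -name '*.tgz' -size 0 -o -name '*.tgz.bak' \) | sort
}
```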

The failure may be caused by a race condition, since tar sometimes fails on
a later file instead (e.g. Corei164SSE3.tgz).
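Whether the failure is intermittent can be gauged by repeating only the creation step on one extracted directory and counting failures. A minimal sketch; `count_failures` is a hypothetical helper that writes to a throw-away temporary file. On an affected system each failing attempt may take tens of seconds and a lot of memory before tar is killed.

```shell
#!/bin/sh
# Hypothetical helper: run "tar -czf" on DIR RUNS times and print
# "<failures>/<runs>" to show whether the failure is intermittent.
count_failures() {
	dir=$1 runs=${2:-10}
	out=$(mktemp) || return 1
	fails=0 i=0
	while [ "$i" -lt "$runs" ]; do
		tar -czf "$out" "$dir" || fails=$((fails + 1))
		i=$((i + 1))
	done
	rm -f "$out"
	echo "$fails/$runs"
}
```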

If I replace "${TAR} -czf" with "gtar -czf", then the port can be built.
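For a local build, the same workaround can be expressed as preferring GNU tar when it is installed and falling back to the first tar in PATH otherwise. This is only a sketch of the substitution described above (`pick_tar` is a hypothetical name), and it does not address the underlying bug. It applies only to the archive-creation step: in GNU tar, `-s` means `--same-order` rather than a path substitution (GNU tar uses `--transform` for that), so the extraction line has to stay with bsdtar.

```shell
#!/bin/sh
# Hypothetical helper: print the path of GNU tar (gtar) if installed,
# otherwise the first tar found in PATH.
pick_tar() {
	command -v gtar 2>/dev/null || command -v tar
}

# Intended use, for the (***) line only:
#   "$(pick_tar)" -czf "${t}" "${t%.tgz}"
```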

But I do not think that this is a problem in BSDTAR: the failure is
reproducible (also after a buildworld/buildkernel and reboot), yet there
have been no changes to BSDTAR since the libarchive upgrade in January.

I guess this is a VM problem that happens to show itself in this specific
program invocation. (The system runs without other obvious problems, and
tar works fine outside this specific usage in the port ...)

Any ideas?

Best regards, Stefan
Received on Sat May 12 2018 - 13:02:16 UTC
