Re: process killed: text file modification

From: Dimitry Andric <dim_at_FreeBSD.org>
Date: Thu, 16 Mar 2017 23:22:08 +0100
> On 12 Mar 2017, at 18:47, Ian Lepore <ian_at_FreeBSD.org> wrote:
> 
> On Thu, 2017-03-09 at 21:07 +0100, Gergely Czuczy wrote:
>> 
>> On 2017. 03. 09. 20:47, Gergely Czuczy wrote:
>>> 
>>> 
>>> 
>>> On 2017. 03. 09. 19:44, John Baldwin wrote:
>>>> 
>>>> On Thursday, March 09, 2017 03:31:56 PM Gergely Czuczy wrote:
>>>>> 
>>>>> [+freebsd-fs]
>>>>> 
>>>>> 
>>>>> On 2017. 03. 09. 14:20, Gergely Czuczy wrote:
>>>>>> 
>>>>>> On 2017. 03. 09. 11:27, Gergely Czuczy wrote:
>>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I'm trying to build a few things from ports on an rpi3, the
>>>>>>> ports
>>>>>>> collection is mounted over NFS from another machine. When
>>>>>>> it's trying
>>>>>>> to build pkg i'm getting the error message in syslog:
>>>>>>> 
>>>>>>> rpi3 kernel: pid 4451 (sh), uid 0, was killed: text file
>>>>>>> modification
>>>>>>> 
>>>>>>> The report to pkg_at_:
>>>>>>> https://lists.freebsd.org/pipermail/freebsd-pkg/2017-March/
>>>>>>> 002048.html
>>>>>>> 
>>>>>>> 
>>>>>>> In ports-mgmt/pkg's config.log It fails at the following
>>>>>>> entry:
>>>>>>> configure:3726: checking whether we are cross compiling
>>>>>>> configure:3734: cc -o conftest -O2 -pipe  -Wno-error
>>>>>>> -fno-strict-aliasing   conftest.c  >&5
>>>>>>> configure:3738: $? = 0
>>>>>>> configure:3745: ./conftest
>>>>>>> configure:3749: $? = 137
>>>>>>> configure:3756: error: in
>>>>>>> `/usr/ports/ports-mgmt/pkg/work/pkg-1.10.0':
>>>>>>> configure:3760: error: cannot run C compiled programs.
>>>>>>> If you meant to cross compile, use `--host'.
>>>>>>> See `config.log' for more details
>>>>>>> 
>>>>>>> # uname -a
>>>>>>> FreeBSD rpi3 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r314949:
>>>>>>> Thu Mar 9
>>>>>>> 08:58:46 CET 2017
>>>>>>> aegir_at_marvin.harmless.hu:/tank/rpi3/crochet/work/obj/arm64.
>>>>>>> aarch64/tank/rpi3/src/sys/AEGIR
>>>>>>> 
>>>>>>> arm64
>>>>>> So far, a few additions:
>>>>>> Time is synced between the NFS server and the client.
>>>>>> it's an open() call which is getting the kill, and it's not
>>>>>> the file
>>>>>> what's being opened, but the process executing it.
>>>>>> Here's a simple code that reproduces it:
>>>>>> #include <stdio.h>
>>>>>> 
>>>>>> int main() {
>>>>>> 
>>>>>>    FILE *f = fopen ("/bar", "w");
>>>>>> 
>>>>>>    fclose(f);
>>>>>>    return 0;
>>>>>> }
>>>>>> 
>>>>>> Conditions to reproduce it:
>>>>>>   - The resulting binary must be executed from the nfs mount
>>>>>>   - The binary must be built after mounting the NFS share.
>>>>>> 
>>>>>> I haven't tried building it on a different host, I don't have
>>>>>> access
>>>>>> to multiple RPis. Also, if I build the binary, umount/remount
>>>>>> the NFS
>>>>>> mount point, which has the binary, execute it, then it works.
>>>>>> 
>>>>>> I've also tried this with the raspbsd.org's image, I could
>>>>>> reproduce
>>>>>> it as well.
>>>>>> 
>>>>>> Another interesting thing is, when I first booted the RPi up,
>>>>>> the NFS
>>>>>> server was a 10.2-STABLE, and later got updated to 11-STABLE.
>>>>>> While it
>>>>>> was 10.2 I've tried to build some port, and I don't remember
>>>>>> having
>>>>>> this issue.
>>>>>> 
>>>>>> So, could someone please help me figure this out and fix it?
>>>>>> This
>>>>>> stuff should work pretty much.
>>>>>> 
>>>>> So, this error message comes from here:
>>>>> https://svnweb.freebsd.org/base/head/sys/fs/nfsclient/nfs_clbio
>>>>> .c?revision=314436&view=markup#l1674
>>>>> 
>>>>> 
>>>>> It's the NFS_TIMESPEC_COMPARE(&np->n_mtime, &np-
>>>>>> n_vattr.na_mtime)
>>>>> comparision that fails, np should be the NFS node structure,
>>>>> from the
>>>>> vnode's v_data, and n_vattr is the attribute cache. As I've
>>>>> seen these
>>>>> two are being updated together, so I don't really see by the
>>>>> code why
>>>>> they might differ. Could someone please take a look at it, with
>>>>> more
>>>>> experience in the NFS code? -czg
>>>> Can you print out the two mtimes?  I wonder if what's happening
>>>> is that
>>>> your server uses different granularity (for example just seconds)
>>>> than
>>>> your client, so on the client we generate a timestamp with a non-
>>>> zero
>>>> nanoseconds but when the server receives that timestamp it
>>>> "truncates"
>>>> it.  During open() we forcefully re-fetch the timestamp (for CTO
>>>> consistency) and then notice it doesn't match.  For now I would
>>>> start
>>>> with comparing the timestamps and maybe the
>>>> vfs.timestamp_precision
>>>> sysctls on client and server (if server is a FreeBSD box).
>>> Here are the time values:
>>> Mar  9 19:46:01 rpi3 kernel: np->n_mtime: -3298114786344 +
>>> -3298114786336  &np->n_vattr.na_mtime: -3298114786616 +
>>> -3298114786608
>>> Mar  9 19:46:01 rpi3 kernel: pid 912 (csh), uid 0, was killed:
>>> text
>>> file modification
>>> Mar  9 19:46:01 rpi3 kernel: np->n_mtime: -3298114786344 +
>>> -3298114786336  &np->n_vattr.na_mtime: -3298114786616 +
>>> -3298114786608
>>> Mar  9 19:46:01 rpi3 kernel: pid 912 (csh), uid 0, was killed:
>>> text
>>> file modification
>>> 
>>> Printed this way:
>>>                          printf("np->n_mtime: %ji + %ji
>>> &np->n_vattr.na_mtime: %ji + %ji",
>>> (intmax_t)(&np->n_mtime.tv_sec),
>>> (intmax_t)(&np->n_mtime.tv_nsec),
>>> (intmax_t)(&np->n_vattr.na_mtime.tv_sec),
>>> (intmax_t)(&np->n_vattr.na_mtime.tv_nsec));
>> Sorry, I made a typo there. Here's it now:
>> Mar  9 20:05:35 rpi3 kernel: np->n_mtime: 1489089935 + 219323000
>> &np->n_vattr.na_mtime: 1489089935 + 221438000
>> Mar  9 20:05:35 rpi3 kernel: pid 847 (csh), uid 0, was killed: text
>> file
>> modification
>> Mar  9 20:05:35 rpi3 kernel: np->n_mtime: 1489089935 + 219323000
>> &np->n_vattr.na_mtime: 1489089935 + 221438000
>> Mar  9 20:05:35 rpi3 kernel: pid 847 (csh), uid 0, was killed: text
>> file
>> modification
>> 
>> That's a difference of 2115 micro seconds.
> 
> I think this is a problem that sysctl vfs.timestamp_precision is
> supposed to help solve.  Unfortunately, that knob doesn't provide very
> much control; you can truncate timestamps to full seconds,
> microseconds, or 1/Hz which may be milliseconds on many systems.
>  You've got a difference of about 2ms, so only the full seconds
> granularity might help.
> 
> It's also not clear to me whether that setting would have to be changed
> on the server, or the client, or both.

I'm also running into this problem, but while using lld.  I must set
vfs.timestamp_precision to 1 (e.g. sec + ns accurate to 1/HZ) on both
the client and the server, to make it work.

Instead of GNU ld, lld uses mmap to write to the output executable.  If
this executable is more than one page, and resides on an NFS share,
running it will almost always result in "text file modification", if
vfs_timestamp_precision >= 2.

A small test case: http://www.andric.com/freebsd/test-mmap-write.c,
which writes a simple "hello world" i386-freebsd executable file, using
the sequence: open() -> ftruncate() -> mmap() -> memcpy() -> munmap() ->
close().

Running this on an NFS share, and then attempting to run the resulting
'helloworld' executable will result in the "text file modification"
error, and it will be killed.  But if you simply copy the executable to
something else, then it runs fine, even if you use -p to retain the
properties!

IMHO this is a rather surprising problem with the NFS code, and Kostik
remarked that the problem seems to be that the VV_TEXT flag is set too
early, before the nfs cache is invalidated.  Rick, do you have any ideas
about this?

-Dimitry


Received on Thu Mar 16 2017 - 21:22:26 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:10 UTC