Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

From: Mark Millard <marklmi_at_yahoo.com>
Date: Fri, 28 Dec 2018 18:56:43 -0800

On 2018-Dec-28, at 12:12, Mark Millard <marklmi at yahoo.com> wrote:

> On 2018-Dec-28, at 05:13, Michal Meloun <melounmichal at gmail.com> wrote:
> 
>> Mark,
>> this is a known problem with qemu-user-static.
>> Emulation of every single interruptible syscall is broken by design (it
>> has signal-related races). These races cannot be solved without a major
>> rewrite of the syscall emulation code.
>> Unfortunately, nobody is actively working on this, I think.
>> 
> 
> Thanks for the note setting some expectations.
> 
> On the evidence that I have I expect that more is going on than that:
> 
> A) The hang-up always happens and always in the same place. So
> it would appear that no race is involved.
> 
> B) (A) holds even when varying the number of builders running in
> parallel (so other builds are also happening) and the number of jobs
> allowed per builder. It also fails with only one builder that is
> allowed only one process. (I get traces from that last kind of context.)
> 
> C) The problem started on the package-building servers for armv7
> and armv6 without qemu-user-static having an update (FreeBSD and
> cmake had updates, for example).
> 
> D) The problem is only observed for targeting armv7 and armv6 as
> far as I can tell. I've never seen it for aarch64, neither my
> own builds nor when I looked at the package-building server
> history.
> 
> At least that is what got me started. (I've since learned that
> qemu-user-static uses fork in place of a requested vfork.)
> 
> My ktrace/kdump experiment yesterday showed something odd for the
> kevent that hangs in cmake:
> 
> 93172 qemu-arm-static CALL  kevent(0x3,0x7ffffffe7d40,0x2,0x7ffffffd7d40,0x400,0)
> 93172 qemu-arm-static STRU  struct kevent[] = { { ident=6, filter=EVFILT_READ, flags=0x1<EV_ADD>, fflags=0, data=0, udata=0x0 }
>             { ident=0x0, filter=<invalid=0>, flags=0, fflags=0x8, data=0x1ffff, udata=0x0 } }
> 
> Note the 0x2 argument to kevent and the apparently-odd 2nd entry in the struct
> kevent[]. The kevent use is from cmake.
> 
> So far I've not identified a signal being delivered at a time that would seem
> to me to be likely to contribute. (But this is not familiar code so my judgment
> is likely not the best.)
> 
> Note: I normally run FreeBSD using a non-debug kernel, even when using
> head. (The kernel does have symbols.)
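
For anyone wanting to repeat the check of the quoted kevent trace on their
own kdump output, here is a minimal sketch in Python. It only looks for
changelist entries whose filter kdump printed as <invalid=...>, like the
2nd entry above. The function name, the regular expression, and the
assumption that each entry is printed as "{ ident=..., filter=..., ... }"
are mine, based only on the two entries shown here; this is not part of
kdump.

```python
import re

# One "{ ident=..., filter=..., ... }" entry, as it appears in the STRU
# record quoted above. Assumed layout, taken only from that sample.
ENTRY_RE = re.compile(
    r"\{\s*ident=(?P<ident>\S+),\s*filter=(?P<filter>[^,]+),[^{}]*\}"
)


def suspicious_kevents(stru_text):
    """Return (index, ident, filter) for entries with an <invalid=...> filter."""
    hits = []
    for index, match in enumerate(ENTRY_RE.finditer(stru_text)):
        flt = match.group("filter").strip()
        if flt.startswith("<invalid"):
            hits.append((index, match.group("ident"), flt))
    return hits


# The STRU record from the trace above, unchanged apart from line joining.
SAMPLE = (
    "struct kevent[] = { { ident=6, filter=EVFILT_READ, flags=0x1<EV_ADD>, "
    "fflags=0, data=0, udata=0x0 }\n"
    "    { ident=0x0, filter=<invalid=0>, flags=0, fflags=0x8, "
    "data=0x1ffff, udata=0x0 } }"
)

print(suspicious_kevents(SAMPLE))
```

Run against the record above it reports only the 2nd entry (index 1), which
matches what I noted by eye. It says nothing about why that entry is odd.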


Here is the detail of the signal usage leading up to the hang-up,
starting from just before I pressed return on the "make FLAVOR=qt5"
command that I had entered.

The only "Interrupted system call" prior to my killing the hung cmake
process was (kdump -H -r -S output):

 93172 100717 qemu-arm-static CALL  execve[59](0x10392,0x8605051a0,0x860cf5400)
 93172 101706 qemu-arm-static RET   nanosleep[240] -1 errno 4 Interrupted system call
 93172 100717 qemu-arm-static NAMI  "/bin/sh"
 93172 100717 sh       RET   execve[59] JUSTRETURN
 93172 100717 sh       CALL  readlink[58](0x207a65,0x7fffffffccc0,0x400)

This is where ninja (via qemu-arm-static) execve's the amd64-native /bin/sh
(which in turn later runs cmake via qemu-arm-static). This was after the fork
that qemu-arm-static does for the requested vfork. So the interrupted
nanosleep (thread 101706) is just the close-down of the thread that was in
nanosleep when the other thread (100717) did the execve.

There were no PSIG's and no sigreturn's prior to the kill according to the
kdump output.
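
The scan described above (looking for interrupted system calls, PSIG
records, and sigreturn's) can be sketched as a small Python filter over
"kdump -H -r -S" text. This is only a sketch under assumptions: the
function name is made up, the column layout (pid, tid, command, record
type, rest) is taken from the lines shown above, and command names are
assumed not to contain spaces.

```python
def signal_activity(kdump_lines):
    """Collect signal-related records from 'kdump -H -r -S' style lines."""
    found = {"eintr": [], "psig": [], "sigreturn": []}
    for line in kdump_lines:
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # not a "pid tid command TYPE rest" record
        _pid, tid, _comm, rectype, rest = fields
        if rectype == "RET" and " errno 4 " in rest + " ":
            # errno 4 is EINTR ("Interrupted system call")
            found["eintr"].append((tid, rest.split()[0]))
        elif rectype == "PSIG":
            found["psig"].append((tid, rest))
        elif rectype in ("CALL", "RET") and rest.startswith("sigreturn"):
            found["sigreturn"].append((tid, rest))
    return found


# The excerpt quoted above, as input.
TRACE = [
    " 93172 100717 qemu-arm-static CALL  execve[59](0x10392,0x8605051a0,0x860cf5400)",
    " 93172 101706 qemu-arm-static RET   nanosleep[240] -1 errno 4 Interrupted system call",
    ' 93172 100717 qemu-arm-static NAMI  "/bin/sh"',
    " 93172 100717 sh       RET   execve[59] JUSTRETURN",
    " 93172 100717 sh       CALL  readlink[58](0x207a65,0x7fffffffccc0,0x400)",
]

print(signal_activity(TRACE))
```

On the excerpt this finds the one interrupted nanosleep on thread 101706
and no PSIG or sigreturn records, which is all the excerpt can show; the
statement about the full trace rests on the complete kdump output.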


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
Received on Sat Dec 29 2018 - 02:07:04 UTC