On 2018-Dec-28, at 12:12, Mark Millard <marklmi at yahoo.com> wrote:

> On 2018-Dec-28, at 05:13, Michal Meloun <melounmichal at gmail.com> wrote:
>
>> Mark,
>> this is a known problem with qemu-user-static.
>> Emulation of every single interruptible syscall is broken by design (it
>> has signal-related races). These races cannot be solved without a major
>> rewrite of the syscall emulation code.
>> Unfortunately, nobody actively works on this, I think.
>>
>
> Thanks for the note setting some expectations.
>
> On the evidence that I have I expect that more is going on than that:
>
> A) The hang-up always happens and always in the same place. So
>    it would appear that no race is involved.
>
> B) (A) is true even for varying the number of builders in parallel
>    (so other builds also happening) and the number of jobs allowed per
>    builder. It also fails for only one builder allowed only one process.
>    (I get traces from that last kind of context.)
>
> C) The problem started on the package-building servers for armv7
>    and armv6 without qemu-user-static having an update (FreeBSD and
>    cmake had updates, for example).
>
> D) The problem is only observed for targeting armv7 and armv6 as
>    far as I can tell. I've never seen it for aarch64, neither in my
>    own builds nor when I looked at the package-building server
>    history.
>
> At least that is what got me started. (I've since learned that
> qemu-user-static uses fork in place of a requested vfork.)
>
> My ktrace/kdump experiment yesterday showed something odd for the
> kevent that hangs in cmake:
>
> 93172 qemu-arm-static CALL kevent(0x3,0x7ffffffe7d40,0x2,0x7ffffffd7d40,0x400,0)
> 93172 qemu-arm-static STRU struct kevent[] = { { ident=6, filter=EVFILT_READ, flags=0x1<EV_ADD>, fflags=0, data=0, udata=0x0 }
> { ident=0x0, filter=<invalid=0>, flags=0, fflags=0x8, data=0x1ffff, udata=0x0 } }
>
> Note the 0x2 argument to kevent and the apparently-odd 2nd entry in the struct
> kevent[]. The kevent use is from cmake.
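
For anyone wanting to check other traces for the same pattern, here is a
minimal Python sketch that splits a kdump "struct kevent[]" record into its
entries and flags any whose filter kdump could not name. The helper names
(parse_kevents, suspicious) are hypothetical, and the input string is just
the STRU text copied from the trace quoted above. It only reports the odd
entry; it does not explain how it came to be there.

```python
import re

# STRU body copied from the kdump record quoted above.
STRU = (
    "{ ident=6, filter=EVFILT_READ, flags=0x1<EV_ADD>, fflags=0, data=0, udata=0x0 } "
    "{ ident=0x0, filter=<invalid=0>, flags=0, fflags=0x8, data=0x1ffff, udata=0x0 }"
)


def parse_kevents(text):
    """Return one dict of raw field strings per '{ ... }' entry in the record."""
    entries = []
    # Match only innermost brace groups, so an enclosing '{ ... }' is ignored.
    for body in re.findall(r"\{([^{}]*)\}", text):
        fields = {}
        for item in body.split(","):
            # Split on the first '=' only: 'filter=<invalid=0>' keeps its value whole.
            key, _, value = item.strip().partition("=")
            fields[key] = value
        entries.append(fields)
    return entries


def suspicious(entries):
    """Return indexes (0-based) of entries whose filter is printed as <invalid=...>."""
    return [i for i, e in enumerate(entries)
            if e.get("filter", "").startswith("<invalid")]


if __name__ == "__main__":
    entries = parse_kevents(STRU)
    for i in suspicious(entries):
        print("entry", i, "has an unnamed filter:", entries[i])
```

With the record above, the first entry is the EVFILT_READ registration for
ident 6, and the second entry is the one that gets flagged.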
>
> So far I've not identified a signal being delivered at a time that would seem
> to me to be likely to contribute. (But this is not familiar code so my judgment
> is likely not the best.)
>
> Note: I normally run FreeBSD using a non-debug kernel, even when using
> head. (The kernel does have symbols.)

The detail of the signal usage involved leading up to the hang-up,
starting from just before the "press return" for the "make FLAVOR=qt5"
command that I had entered:

The only "Interrupted system call" prior to my killing the hung cmake
process was (kdump -H -r -S output):

93172 100717 qemu-arm-static CALL execve[59](0x10392,0x8605051a0,0x860cf5400)
93172 101706 qemu-arm-static RET nanosleep[240] -1 errno 4 Interrupted system call
93172 100717 qemu-arm-static NAMI "/bin/sh"
93172 100717 sh RET execve[59] JUSTRETURN
93172 100717 sh CALL readlink[58](0x207a65,0x7fffffffccc0,0x400)

This is where ninja (via qemu-arm-static) execve's the amd64-native
/bin/sh (to in turn later run cmake via qemu-arm-static). (This was
after the fork [for the requested vfork].) So it is for the close-down
of the thread that was in nanosleep.

There were no PSIG's and no sigreturn's prior to the kill according to
the kdump output.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away
  in early 2018-Mar)

Received on Sat Dec 29 2018 - 02:07:04 UTC