Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

From: Michal Meloun <melounmichal_at_gmail.com>
Date: Fri, 28 Dec 2018 14:13:25 +0100
On 24.12.2018 8:28, Mark Millard wrote:
> [I built a FreeBSD head -r340288 context and tried ports head
> -r484783 and the problem repeated.]
> 
> On 2018-Dec-22, at 12:55, Mark Millard <marklmi at yahoo.com> wrote:
> 
>> [I found my E-mail records reporting successful builds using
>> qemu-user-static from ports head -r484783 under FreeBSD
>> head -r340287.]
>>
>> On 2018-Dec-22, at 00:10, Mark Millard <marklmi at yahoo.com> wrote:
>>
>>> [I messed up the freebsd-emulation email address the first time I sent
>>> this. I also forgot to indicate the qemu-user-static vintage relationship.]
>>>
>>> I had been reporting intermittent hang-ups for my amd64->{aarch64,armv7} port cross
>>> builds in another message sequence. But it turns out that one thing I ran into
>>> has hung-up every time, the same way, for amd64->armv7 cross builds:
>>> multimedia/gstreamer1-qt@qt5 . So I extract the material here into a separate report
>>> with some updated notes.
>>>
>>> A little context: I had built from ports head -r484783 before under FreeBSD head
>>> -r340287 (as I remember the version). Back then it did not have this problem that it
>>> now has under FreeBSD head -r341836 . One ports-specific change was to force perl5.28
>>> as the default instead of perl5.26 originally. In fact this is what drives what is
>>> being rebuilt for my experiment that caught this. But I doubt the perl version is
>>> important to the problem. The context has a Ryzen Threadripper 1950X and has been
>>> tested both for FreeBSD under Hyper-V and for the same media native-booted. Both
>>> hang-up at the same point as seen via ps or top. The native tools for cross-build
>>> speedup were in use. Cross-builds targeting aarch64 did not get this problem but
>>> targeting armv7 did. 121 of 129 armv7 ports built before the hang-up for the first
>>> armv7 try.
>>>
>>> ADDED: The qemu-user-static back with head -r340287 before installing the
>>> updated ports would likely be different than the -r484783 vintage. So both
>>> FreeBSD and qemu-user-static may have changed over the comparison.
>>
>> CORRECTION to ADDED: Back on 2018-Nov-11 I reported successful cross-builds
>> based on qemu-user-static from ports head -r484783 --all built under FreeBSD
>> head -r340287 . So the use of the perl5.28 as the forced-default and the
>> newer FreeBSD head version -r341836 as the context are the differences here.
>>
>>> The hang-up:
>>>
>>> In the port rebuilds targeting armv7, multimedia/gstreamer1-qt@qt5 hung-up and timed
>>> out. Looking during the wait in later tries shows something much like (from one of the
>>> examples):
>>>
>>> root       33719    0.0  0.0  12920  3528  0  I    11:40       0:00.03 | |           `-- sh: poudriere[FBSDFSSDjailArmV7-default][02]: build_pkg (gstreamer1-qt5-1.2.0_14) (sh)
>>> root       41551    0.0  0.0  12920  3520  0  I    11:43       0:00.00 | |             `-- sh: poudriere[FBSDFSSDjailArmV7-default][02]: build_pkg (gstreamer1-qt5-1.2.0_14) (sh)
>>> root       41552    0.0  0.0  10340  1744  0  IJ   11:43       0:00.01 | |               `-- /usr/bin/make -C /usr/ports/multimedia/gstreamer1-qt FLAVOR=qt5 build
>>> root       41566    0.0  0.0  10236  1796  0  IJ   11:43       0:00.00 | |                 `-- /bin/sh -e -c (cd /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! /usr/bin/env QT_SELE
>>> root       41567    0.0  0.0  89976 12896  0  IJ   11:43       0:00.07 | |                   `-- /usr/local/bin/qemu-arm-static ninja -j28 -v all
>>> root       41585    0.0  0.0 102848 25056  0  IJ   11:43       0:00.10 | |                     |-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/g
>>> root       41586    0.0  0.0 102852 25072  0  IJ   11:43       0:00.11 | |                     `-- /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/g
>>>
>>> or as top showed it:
>>>
>>> 41552 root          1  52    0    10M  1744K    0 wait    15   0:00   0.00% /usr/bin/make -C /usr/ports/multimedia/gstreamer1-qt FLAVOR=qt5 build
>>> 41566 root          1  52    0    10M  1796K    0 wait     1   0:00   0.00% /bin/sh -e -c (cd /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.build; if ! /usr/bin/env QT_SELECT=qt5 QMAKEMODULES
>>> 41567 root          2  52    0    88M    13M    0 select   4   0:00   0.00% /usr/local/bin/qemu-arm-static ninja -j28 -v all
>>> 41585 root          2  52    0   100M    24M    0 kqread   8   0:00   0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
>>> 41586 root          2  52    0   100M    24M    0 kqread  22   0:00   0.00% /usr/local/bin/qemu-arm-static /usr/local/bin/cmake -E cmake_autogen /wrkdirs/usr/ports/multimedia/gstreamer1-qt/work-qt5/.
>>>
>>> So: waiting in kqread trying to run cmake.
>>>
>>> Unlike some intermittent hang-ups, attaching-then-detaching via gdb does not
>>> resume the hung-up processes. Kills of the processes waiting on kqread stop
>>> the build.
>>>
>>> Given the prior ports have been built already, building just
>>> multimedia/gstreamer1-qt@qt5 still gets the hang-up at the same point.
>>>
>>> Building anything that requires multimedia/gstreamer1-qt@qt5 seems to be
>>> solidly blocked in my environment.
> 
> 
> I built a FreeBSD head -r340288 context and tried cross-building an
> amd64->armv7 ports head -r484783 of my usual ports and the problem
> repeated. I also found evidence that originally in the old time frame
> I'd disabled part of my originally-intended port builds because of
> other problems so multimedia/gstreamer1-qt 's build was not being
> tried.
> 
> So the qemu-user-static vintage or content may be what to vary to
> narrow down the problem instead of bisecting FreeBSD kernel or world
> vintages. clang7 building qemu-user-static or the kernel/world has
> been eliminated.
> 
> 
> (I used -r340288 to match an artifact.ci.freebsd.org build, incorrectly
> expecting to bisect via kernel substitutions.)
> 

Mark,
this is a known problem with qemu-user-static.
Emulation of every single interruptible syscall is broken by design (it
has signal-related races). These races cannot be solved without a major
rewrite of the syscall emulation code.
Unfortunately, nobody is actively working on this, I think.

Michal
Received on Fri Dec 28 2018 - 12:13:27 UTC
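
Note on the kind of race mentioned in the reply: one common form of a signal race around blocking syscalls is "check, then block". Code checks whether a signal is pending, sees none, and then enters a blocking call; a signal delivered in the gap between the check and the call does not interrupt anything, so the call can sleep indefinitely. Whether this is exactly the race affecting qemu-user-static is an assumption here; the thread does not say. The sketch below is a generic Python illustration of a race-free wait using the self-pipe pattern. It is not qemu code and not a fix for it, and the helper names install_wakeup_channel and wait_readable_or_signal are hypothetical.

```python
import os
import select
import signal
import socket

# Racy pattern, shown only for contrast (do not use):
#     if not signal_seen:                      # check
#         select.select(fds, [], [], None)     # block; a signal that arrived
#                                              # just before this call is missed


def install_wakeup_channel(signum):
    """Make deliveries of `signum` visible to select() through a socket pair.

    Hypothetical helper for illustration; must be called from the main thread.
    """
    rsock, wsock = socket.socketpair()
    rsock.setblocking(False)
    wsock.setblocking(False)  # set_wakeup_fd() requires a non-blocking fd
    # Install a no-op handler so the default action (termination, for
    # SIGUSR1) does not apply and the signal is routed through Python.
    signal.signal(signum, lambda signo, frame: None)
    # From here on, a delivered signal writes its number as one byte to wsock.
    old_fd = signal.set_wakeup_fd(wsock.fileno())
    return rsock, wsock, old_fd


def wait_readable_or_signal(rsock, fds, timeout):
    """Block until a signal byte arrives, an fd in `fds` is readable, or timeout.

    Because the signal is turned into readable data on `rsock`, it does not
    matter whether it arrived before or during the select() call.
    """
    readable, _, _ = select.select([rsock, *fds], [], [], timeout)
    if rsock in readable:
        return "signal", list(rsock.recv(64))
    if readable:
        return "ready", readable
    return "timeout", []
```

A caller would run install_wakeup_channel(signal.SIGUSR1) once, then use wait_readable_or_signal(rsock, fds, timeout) wherever it would otherwise block in select(). A signal sent before the wait starts, for example with os.kill(os.getpid(), signal.SIGUSR1), is still reported because its byte is already queued on the socket.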
