Re: A head buildworld race visible in the ci.freebsd.org build history

From: Bryan Drewery <bdrewery_at_FreeBSD.org>
Date: Mon, 18 Jun 2018 16:29:05 -0700

On 6/18/2018 3:27 PM, Li-Wen Hsu wrote:
> On Mon, Jun 18, 2018 at 5:04 PM Mark Millard via freebsd-toolchain
> <freebsd-toolchain_at_freebsd.org> wrote:
>>
>> On 2018-Jun-18, at 12:42 PM, Bryan Drewery <bdrewery at FreeBSD.org> wrote:
>>
>>> On 6/15/2018 10:55 PM, Mark Millard wrote:
>>>> In watching ci.freebsd.org builds I've seen a notable
>>>> number of one time failures, such as (example from
>>>> powerpc64):
>>>>
>>>> --- all_subdir_lib/libufs ---
>>>> ranlib -D libufs.a
>>>> ranlib: fatal: Failed to open 'libufs.a'
>>>> *** [libufs.a] Error code 70
>>>>
>>>> where the next build works despite the change being
>>>> irrelevant to whatever ranlib complained about.
>>>>
>>>> Other builds failed similarly:
>>>>
>>>> --- all_subdir_lib/libbsm ---
>>>> ranlib -D libbsm_p.a
>>>> ranlib: fatal: Failed to open 'libbsm_p.a'
>>>> *** [libbsm_p.a] Error code 70
>>>>
>>>> and:
>>>>
>>>> --- kerberos5/lib__L ---
>>>> ranlib -D libgssapi_spnego_p.a
>>>> --- libgssapi_spnego.a ---
>>>> ranlib -D libgssapi_spnego.a
>>>> --- libgssapi_spnego_p.a ---
>>>> ranlib: fatal: Failed to open 'libgssapi_spnego_p.a'
>>>> *** [libgssapi_spnego_p.a] Error code 70
>>>>
>>>> and so on.
>>>>
>>>>
>>>> It is not limited to powerpc64. For example, for aarch64
>>>> there are:
>>>>
>>>> --- libpam_exec.a ---
>>>> building static pam_exec library
>>>> ar -crD libpam_exec.a `NM='nm' NMFLAGS=''  lorder pam_exec.o  | tsort -q`
>>>> ranlib -D libpam_exec.a
>>>> ranlib: fatal: Failed to open 'libpam_exec.a'
>>>> *** [libpam_exec.a] Error code 70
>>>>
>>>> and:
>>>>
>>>> --- all_subdir_lib/libusb ---
>>>> ranlib -D libusb.a
>>>> ranlib: fatal: Failed to open 'libusb.a'
>>>> *** [libusb.a] Error code 70
>>>>
>>>> and:
>>>>
>>>> --- all_subdir_lib/libbsnmp ---
>>>> ranlib: fatal: Failed to open 'libbsnmp.a'
>>>> --- all_subdir_lib/ncurses ---
>>>> --- all_subdir_lib/ncurses/panelw ---
>>>> --- panel.pico ---
>>>> --- all_subdir_lib/libbsnmp ---
>>>> *** [libbsnmp.a] Error code 70
>>>>
>>>>
>>>> Even amd64 gets such:
>>>>
>>>> --- libpcap.a ---
>>>> ranlib -D libpcap.a
>>>> ranlib: fatal: Failed to open 'libpcap.a'
>>>> *** [libpcap.a] Error code 70
>>>>
>>>> and:
>>>>
>>>>
>>>> --- libkafs5.a ---
>>>> ranlib: fatal: Failed to open 'libkafs5.a'
>>>> --- libkafs5_p.a ---
>>>> ranlib: fatal: Failed to open 'libkafs5_p.a'
>>>> --- cddl/lib__L ---
>>>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/lua/lbaselib.c:60:26: note: include the header <ctype.h> or explicitly provide a declaration for 'toupper'
>>>> --- kerberos5/lib__L ---
>>>> *** [libkafs5_p.a] Error code 70
>>>>
>>>> make[5]: stopped in /usr/src/kerberos5/lib/libkafs5
>>>> --- libkafs5.a ---
>>>> *** [libkafs5.a] Error code 70
>>>>
>>>> and:
>>>>
>>>>
>>>> --- lib__L ---
>>>> ranlib -D libclang_rt.asan_cxx-i386.a
>>>> ranlib: fatal: Failed to open 'libclang_rt.asan_cxx-i386.a'
>>>> *** [libclang_rt.asan_cxx-i386.a] Error code 70
>>>>
>>>>
>>>> (Notice the variability in which .a files ranlib fails on.)
>>>>
>>>
>>> I looked at this a few days ago and don't believe it's actually a build
>>> race. I think there is something wrong with the ar/ranlib on that system
>>> or something else. I've found no evidence of concurrent building of the
>>> .a files in question.
>>
>>
>> Looking at a bunch of the failures, spanning multiple
>> FreeBSD-head-*-build types of builds, I see only:
>>
>> NODE_LABELS     bhyve_host butler1.nyi.freebsd.org jailer jailer_fast
>> NODE_NAME       butler1.nyi.freebsd.org
>>
>> for the failures that I looked at.
>>
>> So your "on that system" might well be correct.
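
For anyone who wants to repeat the per-node check, here is a minimal sketch, assuming the console logs have already been fetched and grouped by NODE_NAME. The function name and the input shape are hypothetical; only the error string is taken from the logs quoted above.

```python
import re
from collections import Counter

# Matches the ranlib error line seen in the logs quoted in this thread.
RANLIB_FAILURE = re.compile(r"ranlib: fatal: Failed to open '([^']+)'")


def tally_ranlib_failures(logs_by_node):
    """Count ranlib open failures per node.

    logs_by_node: dict mapping a node name to a list of console log
    strings (a hypothetical input shape, not something the CI exports).
    """
    counts = Counter()
    for node, logs in logs_by_node.items():
        for text in logs:
            # Adding zero still records the node, so clean nodes show up too.
            counts[node] += len(RANLIB_FAILURE.findall(text))
    return counts


if __name__ == "__main__":
    sample = {
        "butler1.nyi.freebsd.org": [
            "ranlib -D libufs.a\n"
            "ranlib: fatal: Failed to open 'libufs.a'\n"
            "*** [libufs.a] Error code 70\n",
        ],
        # Made-up node name, used only to show a clean log.
        "other-builder.example": ["ranlib -D libufs.a\n"],
    }
    for node, count in tally_ranlib_failures(sample).items():
        print(node, count)
```

If every non-zero count lands on one node, that points at the host rather than at a race in the build itself.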
> 
> Thanks for the insight. The build is done in an 11.1-R jail on a
> -CURRENT host. butler1.nyi is running r333388 (as a canary) while
> other builders are mostly running r328278. I upgraded a few others and
> they seem to reproduce the issue, so I have now downgraded all the build
> slaves to r328278 until we find the root cause.
> 

The error is coming from libarchive, which had a change between those
revisions:

> ------------------------------------------------------------------------
> r328332 | mm | 2018-01-24 06:24:17 -0800 (Wed, 24 Jan 2018) | 14 lines
> 
> MFV r328323,328324:
> Sync libarchive with vendor.
> 
> Relevant vendor changes:
>   PR #893: delete dead ppmd7 alloc callbacks
>   PR #904: Fix archive freeing bug in bsdcat
>   PR #961: Fix ZIP format names
>   PR #962: Don't modify attributes for existing directories
>            when ARCHIVE_EXTRACT_NO_OVERWRITE is set
>   PR #964: Fix -Werror=implicit-fallthrough= for GCC 7
>   PR #970: zip: Allow backslash as path separator
> 
> MFC after:      1 week
> 
> ------------------------------------------------------------------------

From a brief look, though, nothing obvious in the change stands out to me.
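
To narrow the window further, a rough sketch like the following could list which log entries fall between the known-good and known-bad host revisions. It assumes the text is plain `svn log` output with header lines shaped like the r328332 entry above; the function name is hypothetical.

```python
import re

# Header line of an `svn log` entry, e.g.
# "r328332 | mm | 2018-01-24 06:24:17 -0800 (Wed, 24 Jan 2018) | 14 lines"
LOG_HEADER = re.compile(r"^r(\d+) \| (\S+) \| (\d{4}-\d{2}-\d{2})", re.MULTILINE)


def revisions_between(log_text, good_rev, bad_rev):
    """Return (revision, author, date) for entries with good_rev < revision <= bad_rev."""
    hits = []
    for match in LOG_HEADER.finditer(log_text):
        rev = int(match.group(1))
        if good_rev < rev <= bad_rev:
            hits.append((rev, match.group(2), match.group(3)))
    return sorted(hits)


if __name__ == "__main__":
    sample = (
        "r328332 | mm | 2018-01-24 06:24:17 -0800 (Wed, 24 Jan 2018) | 14 lines\n"
        "\n"
        "MFV r328323,328324:\n"
        "Sync libarchive with vendor.\n"
    )
    # r328278 is the revision the working builders run; r333388 is butler1.nyi.
    for rev, author, date in revisions_between(sample, 328278, 333388):
        print(f"r{rev} {author} {date}")
```

Feeding it the log for only the libarchive paths would keep the candidate list short.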


-- 
Regards,
Bryan Drewery


Received on Mon Jun 18 2018 - 21:29:16 UTC
