On Thu, Jun 21, 2018 at 10:49 PM Mark Millard <marklmi_at_yahoo.com> wrote:
> Has the range r328278 < PROBLEM_START <= r330304 been narrowed down
> some more?
>
> (I'm just curious where the problem started.)

After several rounds of binary search, I found that it might have something to do with r329625. The only connection I can see between this commit and our situation is that it touched the unmount code, but I cannot confirm that it is the cause.

It is a bit tricky to reproduce; I will try to keep this concise.

We do builds for head in a jail (11.2-RELEASE) on a -CURRENT host. The jail is on a dedicated zfs, and a daemon doing jail/zfs cleanup runs outside of the jail. In some edge cases, that cleanup daemon tries to destroy the zfs of a jail in which a build is still running. When that happened on an earlier -CURRENT, it would just get "cannot unmount '/jenkins/jails/test-ranlib': Device busy" and nothing serious happened. Recently, although it still does not destroy the busy zfs, it started causing the build to error out with "ranlib: fatal: Failed to open 'libXXX.a'".

To reproduce this, create a zfs and use it as the root of a jail, then run this build script under /usr/src inside the jail:

https://gist.github.com/lwhsu/ae3b8b1f0c856837f93984ab2493f629#file-build-sh

Run this cleanup script on the host:

https://gist.github.com/lwhsu/ae3b8b1f0c856837f93984ab2493f629#file-clean-test-ranlib-sh
(need to modify the zfs path)

I use powerpcspe as TARGET_ARCH here because one iteration takes less time. The problem should not be related to the architecture.

I am not sure what the next step is; maybe modifying ranlib to log more about what it sees when it hits "fatal: Failed to open 'libxxx.a'".

Any good ideas about debugging this?

Li-wen

--
Li-Wen Hsu <lwhsu_at_FreeBSD.org>
https://lwhsu.org

Received on Tue Aug 07 2018 - 00:29:52 UTC