Re: 5.3-RELEASE TODO - make/kqueue

From: Garance A Drosihn <drosih_at_rpi.edu>
Date: Sat, 28 Aug 2004 20:05:05 -0400
At 7:37 AM -0600 8/27/04, Scott Long wrote:
>
>Testing focuses for 5.3-RELEASE

An update on this issue:

>  |---------------------------------+
>  | make -DUSE_KQUEUE causes lockup |
>  | with buildworld -jBIGNUM        |
>  |---------------------------------+

The description says:

>  |-------------------+---------------+--------------+------------|
>  |  Attempts to use make(1) with KQueues appears to result in a  |
>  |  kernel hang under "heavy load". It would be desirable to fix |
>  |  this both from the perspective of building FreeBSD quickly   |
>  |  as a developer, but also because it's an instability that    |
>  |  could show up under other high load and heavy use of         |
>  |  KQueues. See PR kern/57945 for a proposed patch and details. |
>  |  This appear to be the product of a locking problem, and must |
>  |  be fixed for 5.3.                                            |
>  |-------------------+---------------+--------------+------------|

I have done many buildworlds using the WITH_KQUEUE make over the
past week.  I have done at least 50 buildworlds on my dual-proc
Athlon machine, with -j ranging from 3 to 15.  I have not seen any
lockups since the fix for IPI deadlocks went in.

I do still get the "*** Signal 6"s, even though I am now running
with v1.76 of src/sys/kern/kern_lock.c.  Actually I had updated
that one source file, expecting to get revision 1.75 (and thus
back out revision 1.74), as recently mentioned by Doug White.  I
just now realized that I ended up with 1.76...  I guess I should
try it one more time with 1.75 instead of 1.76.

One observation which is perhaps interesting.  I also modified
sys/kern/kern_sig.c so that it prints out a message to the console
whenever kill() or killpg1() is called with a SIGABRT.  I tested
that change, and it seems to work correctly with programs calling
kill(SIGABRT), abort(), or raise(SIGABRT).  However, when my
buildworld dies with `make' claiming it saw a Signal 6, these
printf's in kern_sig.c are never triggered.
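
For reference, a minimal user-space sketch of the kind of test described
above.  This is not the test program actually used; the enum and the
function name signal_seen_by_parent() are hypothetical, chosen only for
illustration.  A child process aborts itself through kill(), abort(), or
raise(), and the parent reads the terminating signal from the wait
status, which is presumably also where make gets the number in its
"*** Signal 6" message (SIGABRT is signal 6 on FreeBSD).

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Three ways a process can deliver SIGABRT to itself (names are illustrative). */
enum abort_method { VIA_KILL, VIA_ABORT, VIA_RAISE };

/*
 * Fork a child that terminates itself with SIGABRT using the given
 * method.  Return the signal number the parent observes via waitpid(2),
 * or -1 if the child did not die from a signal (or on error).
 */
static int
signal_seen_by_parent(enum abort_method how)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return (-1);
	if (pid == 0) {
		switch (how) {
		case VIA_KILL:
			kill(getpid(), SIGABRT);
			break;
		case VIA_ABORT:
			abort();
			/* NOTREACHED */
		case VIA_RAISE:
			raise(SIGABRT);
			break;
		}
		_exit(0);	/* only reached if the signal was not fatal */
	}
	if (waitpid(pid, &status, 0) == -1)
		return (-1);
	return (WIFSIGNALED(status) ? WTERMSIG(status) : -1);
}
```

With the default SIGABRT disposition, each of the three methods should
make signal_seen_by_parent() return SIGABRT.  The sketch only shows how
the parent learns the signal number; it says nothing about which kernel
path generated the signal.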

This failure is "eventually repeatable" for me, in that I can
trigger it within 10 buildworlds.  And it *seems* that it only
happens if I am also running a "folding-at-home" client at the
same time.  That client program is a Linux ELF binary, so maybe
that is significant.   Or maybe it's a red herring.

-- 
Garance Alistair Drosehn            =   gad_at_gilead.netel.rpi.edu
Senior Systems Programmer           or  gad_at_freebsd.org
Rensselaer Polytechnic Institute    or  drosih_at_rpi.edu
Received on Sat Aug 28 2004 - 22:05:10 UTC
