Re: witness_lock_list_get: witness exhausted

From: Mateusz Guzik <mjguzik_at_gmail.com>
Date: Tue, 9 Jan 2018 01:31:19 +0100
On Tue, Jan 9, 2018 at 12:41 AM, Michael Jung <mikej_at_mikej.com> wrote:

> On 2018-01-08 13:39, John Baldwin wrote:
>
>> On Tuesday, November 28, 2017 02:46:03 PM Michael Jung wrote:
>>
>>> Hi!
>>>
>>> I've recently increased the processor count on our poudriere box and have
>>> started noticing the error "witness_lock_list_get: witness exhausted" on
>>> the console.  The kernel *DOES NOT* crash, but I thought the report might
>>> be useful to someone.
>>>
>>> $ uname -a
>>> FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun Nov
>>> 19 18:41:20 EST 2017
>>> mikej_at_poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
>>>
>>> The machine is pretty busy running four poudriere build instances.
>>>
>>> last pid: 76584;  load averages: 115.07, 115.96, 98.30    up 6+07:32:59  14:44:03
>>> 763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock
>>> CPU: 59.0% user,  0.0% nice, 40.7% system,  0.1% interrupt,  0.1% idle
>>> Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free
>>> ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other
>>>       25G Compressed, 32G Uncompressed, 1.24:1 Ratio
>>>
>>> Let me know what additional information I might supply.
>>>
>>
>> This just means that WITNESS stopped working because it ran out of
>> pre-allocated objects.  In particular the objects used to track how
>> many locks are held by how many threads:
>>
>> /*
>>  * XXX: This is somewhat bogus, as we assume here that at most 2048 threads
>>  * will hold LOCK_NCHILDREN locks.  We handle failure ok, and we should
>>  * probably be safe for the most part, but it's still a SWAG.
>>  */
>> #define LOCK_NCHILDREN  5
>> #define LOCK_CHILDCOUNT 2048
>>
>> Probably the '2048' (max number of concurrent threads) needs to scale with
>> MAXCPU.  2048 threads is probably a bit low on big x86 boxes.
>>
>
>
> Thank you for your explanation.  We are expanding our ESXi cluster, and even
> though with the standard edition I can only assign 64 vCPUs to a guest (and
> as much RAM as I want), I like to help with edge cases when I can trigger
> them by pushing boundaries, toward additional improvements in FreeBSD.
>

Can you apply this and re-run the test?

https://people.freebsd.org/~mjg/witness.diff

It bumps the counters to be "high enough" but also starts tracking usage.
If you get the message again, bump the values even higher.

Once you get a complete poudriere run that does not trigger the problem,
run:
$ sysctl debug.witness.list_used debug.witness.list_max_used

to dump the actual usage.
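
For anyone following along, here is a rough userland sketch of the general
idea: a fixed pool of pre-allocated entries with a current-usage counter and
a high-water mark.  It is not the patch itself and not the kernel code; the
names (POOL_SIZE, struct pool, pool_get, pool_put) are made up for
illustration, and the locking that the real code needs is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool size; the kernel constant is LOCK_CHILDCOUNT. */
#define POOL_SIZE 4

/* One pre-allocated entry; real entries would also carry lock data. */
struct entry {
	struct entry *next;
};

struct pool {
	struct entry slots[POOL_SIZE];	/* storage allocated up front */
	struct entry *free_head;	/* singly linked free list */
	int used;			/* entries currently handed out */
	int max_used;			/* high-water mark of 'used' */
};

/* Put every slot on the free list and reset the counters. */
static void
pool_init(struct pool *p)
{
	p->free_head = NULL;
	for (int i = POOL_SIZE - 1; i >= 0; i--) {
		p->slots[i].next = p->free_head;
		p->free_head = &p->slots[i];
	}
	p->used = 0;
	p->max_used = 0;
}

/* Take an entry, or return NULL when the pool is exhausted. */
static struct entry *
pool_get(struct pool *p)
{
	struct entry *e = p->free_head;

	if (e == NULL)
		return (NULL);	/* the "exhausted" case from the report */
	p->free_head = e->next;
	e->next = NULL;
	p->used++;
	if (p->used > p->max_used)
		p->max_used = p->used;
	return (e);
}

/* Return an entry to the pool; max_used is deliberately left alone. */
static void
pool_put(struct pool *p, struct entry *e)
{
	assert(e != NULL && p->used > 0);
	e->next = p->free_head;
	p->free_head = e;
	p->used--;
}
```

The point of keeping a separate high-water mark is that the peak value after
a full run tells you how large the pool actually needs to be, even though
the current count will have dropped back down by the time you look.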

-- 
Mateusz Guzik <mjguzik gmail.com>
Received on Mon Jan 08 2018 - 23:31:22 UTC
