Re: sysctl spinning (was: Re: ps Causes Hard Hang)

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Thu, 4 Mar 2004 01:38:30 -0800 (PST)

On  3 Mar, Robert Watson wrote:
> 
> On Wed, 3 Mar 2004, Cy Schubert wrote:
> 
>> I'm running 5 -CURRENT systems. My firewall system, using IPF, hard
>> hangs every time ps is entered -- totally unresponsive, requiring either
>> a power cycle or reset switch to bring it back to life. 
>> 
>> Before I start digging into this seriously I'd like to possibly get info
>> from anyone who may have experienced this before. 
> 
> Alan Cox and I have both experienced this -- it's actually only a hard
> hang if you're trying to use the syscons break to debugger, serial break
> to debugger can get into DDB fine.  It looks like the sysctl code is
> spinning in kernel, possibly due to looping waiting for a response other
> than EAGAIN.  I wonder if it was the recent limits on locked memory
> changes in sysctl, although at first we thought it might be the sleepq
> changes (seems less likely now).  Because sysctl holds Giant, the other
> CPUs are locked out of Giant-protected bits of the kernel (many of them),
> including Syscons.

That sounds quite possible, though I would only expect it to happen if
userland passed a large output buffer to the sysctl call.  In the
current implementation, EAGAIN will only be returned when this condition
is true:

        if (atop(size) + cnt.v_wire_count > vm_page_max_wired)
                return (EAGAIN);
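
To make the arithmetic of that check concrete, here is a minimal userland
sketch. The names sketch_atop(), sketch_wire_check() and SKETCH_PAGE_SHIFT
are hypothetical stand-ins, and the 4 KiB page size is an assumption; the
kernel uses its own atop() macro and its global counters.

```c
#include <errno.h>
#include <stddef.h>

/* Assumption for illustration only: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in for atop(): convert a byte count to whole pages. */
static unsigned long
sketch_atop(size_t bytes)
{
	return ((unsigned long)(bytes >> SKETCH_PAGE_SHIFT));
}

/*
 * Mirror of the quoted condition, with the counters passed in as
 * parameters instead of being read from kernel globals.
 */
static int
sketch_wire_check(size_t size, unsigned long wire_count,
    unsigned long max_wired)
{
	if (sketch_atop(size) + wire_count > max_wired)
		return (EAGAIN);
	return (0);
}
```

With 10 pages already wired and a limit of 100, a 90-page buffer passes
and a 100-page buffer gets EAGAIN, which is why only a large output
buffer should trip the check.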

Hmm, it looks like vm_page_max_wired is dynamically set to one third of
free system memory in vm_pageout().

        /* XXX does not really belong here */
        if (vm_page_max_wired == 0)
                vm_page_max_wired = cnt.v_free_count / 3;

I was under the impression that it was one third of physical memory.
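
Read literally, the snippet assigns the limit only while it is still
zero, so the value depends on how much memory happens to be free at that
moment. A small sketch of that behaviour, using a hypothetical variable
and function (sketch_max_wired, sketch_init_max_wired) and made-up page
counts:

```c
/* Hypothetical stand-in for vm_page_max_wired; 0 means "not yet set". */
static unsigned long sketch_max_wired = 0;

/*
 * Same shape as the quoted code: the limit is derived from the free
 * page count, and only on the first call that finds it unset.
 */
static void
sketch_init_max_wired(unsigned long free_count)
{
	if (sketch_max_wired == 0)
		sketch_max_wired = free_count / 3;
}
```

If 30000 pages are free on the first call, the limit becomes 10000 pages
and stays there even if 90000 pages are free later. On a machine with
less free memory at that point, the limit is correspondingly smaller
than one third of physical memory.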

I think there are three problems here:

	vm_page_max_wired is probably the wrong value.

	The sysctl code should not do a tight loop on an EAGAIN error.

	The sysctl handlers that wire memory should actually provide
	estimates of the amount of memory that needs to be wired.
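
For the second point, one possible shape is a retry loop with an upper
bound, so that a handler which keeps returning EAGAIN cannot spin
forever. This is only a sketch under my own assumptions, not the code in
kern_sysctl.c; sketch_call_bounded(), sketch_flaky() and the max_tries
parameter are all hypothetical.

```c
#include <errno.h>

/* Hypothetical handler type: returns 0 or an errno value. */
typedef int (*sketch_handler_fn)(void *arg);

/*
 * Call the handler until it returns something other than EAGAIN, but
 * give up after max_tries attempts and hand EAGAIN back to the caller.
 */
static int
sketch_call_bounded(sketch_handler_fn fn, void *arg, int max_tries)
{
	int error = EAGAIN;

	for (int i = 0; i < max_tries && error == EAGAIN; i++)
		error = fn(arg);
	return (error);
}

/* Example handler: fails with EAGAIN until the counter reaches zero. */
static int
sketch_flaky(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (EAGAIN);
	}
	return (0);
}
```

A handler that recovers after two failures succeeds within five tries,
while one that never recovers returns EAGAIN after the last attempt
instead of looping. A real fix would probably also want to sleep or
yield between attempts, which this sketch leaves out.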

Should the failure to wire the buffer be mapped to a different errno?
There may be cases when it is valid to retry the request.

The code that loops on EAGAIN was added in rev 1.63 of kern_sysctl.c.
Received on Thu Mar 04 2004 - 00:38:48 UTC
