Re: Fix for some stress panics

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Mon, 8 Aug 2005 02:13:25 -0700 (PDT)
On  7 Aug, Antoine Pelisse wrote:
> http://people.freebsd.org/~pho/stress/log/cons149.html
> http://people.freebsd.org/~pho/stress/log/cons130.html
> 
> I've been working on this panic today (the two are obviously 
> the same) and here is a patch to fix it:
> --- sys/kern/kern_proc.c.orig Mon Apr 18 04:10:36 2005
> +++ sys/kern/kern_proc.c Sun Aug 7 21:18:03 2005
> _at__at_ -884,10 +884,8 _at__at_
> _PHOLD(p);
> FOREACH_THREAD_IN_PROC(p, td) {
> fill_kinfo_thread(td, &kinfo_proc);
> - PROC_UNLOCK(p);
> error = SYSCTL_OUT(req, (caddr_t)&kinfo_proc,
> sizeof(kinfo_proc));
> - PROC_LOCK(p);
> if (error)
> break;
> }
> 
> As a matter of fact, if td is removed from the list through thread_unlink 
> while
> the mutex is released and the next thread is removed just after, the FOREACH 
> 
> is looping through an unlinked list where the td_ksegrp has been set to NULL 
> 
> by thread_exit.
> If we absolutely have to release the lock, then it's probably safer to check 
> if 
> td_ksegroup != NULL in the fill_kinfo_thread function.

Calling SYSCTL_OUT(), which calls copyout(), is not allowed while
holding a mutex unless the userland buffer is wired.  The reason is that
if any of the userland buffer is not resident in RAM, the thread can
block while the buffer is paged in, and this is not legal while holding
a mutex.

There are two ways to fix this:

	Allocate a temporary buffer of the appropriate size, grab the
	mutex, traverse the list and copy the data into the temporary
        buffer, release the mutex, call SYSCTL_OUT() to copy the
        contents of the temporary buffer to the user buffer, and free
        the temporary buffer.

	Wire the appropriate amount of the userland buffer using
        sysctl_wire_old_buffer(), grab the mutex, traverse the list,
        copying each item to userland with SYSCTL_OUT(), release the
        mutex, and  unwire the userland buffer.

In either case, the appropriate buffer size must be calculated (possibly
requiring the mutex to be grabbed and released) before executing an
operation that could block, grabbing the mutex, and traversing the list.
It is quite possible for the pre-calculated size to not match the actual
size needed when the list is traversed to actually copy the data.
Received on Mon Aug 08 2005 - 07:13:33 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:40 UTC