Re: Fix for some stress panics

From: Antoine Pelisse <apelisse_at_gmail.com> Date: Mon, 8 Aug 2005 15:24:01 +0200 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:40 UTC

On 8/8/05, Don Lewis <truckman_at_freebsd.org> wrote:
> 
> On 7 Aug, Antoine Pelisse wrote:
> > http://people.freebsd.org/~pho/stress/log/cons149.html
> > http://people.freebsd.org/~pho/stress/log/cons130.html
> >
> > I've been working on this panic today (the two are obviously
> > the same) and here is a patch to fix it:
> > --- sys/kern/kern_proc.c.orig Mon Apr 18 04:10:36 2005
> > +++ sys/kern/kern_proc.c Sun Aug 7 21:18:03 2005
> > _at__at_ -884,10 +884,8 _at__at_
> > _PHOLD(p);
> > FOREACH_THREAD_IN_PROC(p, td) {
> > fill_kinfo_thread(td, &kinfo_proc);
> > - PROC_UNLOCK(p);
> > error = SYSCTL_OUT(req, (caddr_t)&kinfo_proc,
> > sizeof(kinfo_proc));
> > - PROC_LOCK(p);
> > if (error)
> > break;
> > }
> >
> > As a matter of fact, if td is removed from the list through 
> thread_unlink
> > while
> > the mutex is released and the next thread is removed just after, the 
> FOREACH
> >
> > is looping through an unlinked list where the td_ksegrp has been set to 
> NULL
> >
> > by thread_exit.
> > If we absolutely have to release the lock, then it's probably safer to 
> check
> > if
> > td_ksegroup != NULL in the fill_kinfo_thread function.
> 
> Calling SYSCTL_OUT(), which calls copyout(), is not allowed while
> holding a mutex unless the userland buffer is wired. The reason is that
> if any of the userland buffer is not resident in RAM, the thread can
> block while the buffer is paged in, and this is not legal while holding
> a mutex.
> 
> There are two ways to fix this:
> 
> Allocate a temporary buffer of the appropriate size, grab the
> mutex, traverse the list and copy the data into the temporary
> buffer, release the mutex, call SYSCTL_OUT() to copy the
> contents of the temporary buffer to the user buffer, and free
> the temporary buffer.
> 
> Wire the appropriate amount of the userland buffer using
> sysctl_wire_old_buffer(), grab the mutex, traverse the list,
> copying each item to userland with SYSCTL_OUT(), release the
> mutex, and unwire the userland buffer.

Is that what you meant ?

--- sys/kern/kern_proc.c.orig Mon Apr 18 04:10:36 2005
+++ sys/kern/kern_proc.c Mon Aug 8 15:09:50 2005
_at__at_ -868,7 +868,7 _at__at_
{
struct thread *td;
struct kinfo_proc kinfo_proc;
- int error = 0;
+ int error, buffersize = 0;
struct proc *np;
pid_t pid = p->p_pid;

_at__at_ -883,11 +883,23 _at__at_
} else {
_PHOLD(p);
FOREACH_THREAD_IN_PROC(p, td) {
- fill_kinfo_thread(td, &kinfo_proc);
- PROC_UNLOCK(p);
+ buffersize += sizeof(struct kinfo_proc);
+ }
+ PROC_UNLOCK(p);
+
+ error = sysctl_wire_old_buffer(req, buffersize);
+ if (error)
+ {
+ _PRELE(p);
+ return error;
+ }
+
+ PROC_LOCK(p);
+ FOREACH_THREAD_IN_PROC(p, td) {
+ fill_kinfo_thread(td, &kinfo_proc);
error = SYSCTL_OUT(req, (caddr_t)&kinfo_proc,
sizeof(kinfo_proc));
- PROC_LOCK(p);
+
if (error)
break;
}

In either case, the appropriate buffer size must be calculated (possibly
> requiring the mutex to be grabbed and released) before executing an
> operation that could block, grabbing the mutex, and traversing the list.
> It is quite possible for the pre-calculated size to not match the actual
> size needed when the list is traversed to actually copy the data.
> 
> 
How can this issue be addressed ? Maybe some extra memory has to be wired ?