Re: calcru: negative time ... followed by freeze

From: Bruce Evans <bde_at_zeta.org.au>
Date: Mon, 21 Jun 2004 18:05:02 +1000 (EST)
On Mon, 21 Jun 2004, Julian Elischer wrote:

> On Mon, 21 Jun 2004, Bruce Evans wrote:
>
> > Ah, here is a likely cause of the bug in -current:
> >
> > % 	if (p == curthread->td_proc) {
> > % 		/*
> > % 		 * Adjust for the current time slice.  This is actually fairly
> > % 		 * important since the error here is on the order of a time
> > % 		 * quantum, which is much greater than the sampling error.
> > % 		 * XXXKSE use a different test due to threads on other
> > % 		 * processors also being 'current'.
> > % 		 */
> > % 		binuptime(&bt);
> > % 		bintime_sub(&bt, PCPU_PTR(switchtime));
> > % 		bintime_add(&bt, &p->p_runtime);
> > % 	} else
> > % 		bt = p->p_runtime;
> >
> > The XXXKSE comment is correct that this might be broken.  If the (p
> > != curthread->td_proc) case happens at all for a running process, then
> > it gives a wrong (out of date) timestamp in bt.  This wrongness will
> > be detected if calcru() was called earlier in the current
> > timeslice and took the other path here.
>
> It should be fairly easy as there is now a thread state that indicates
> that it is actually running now.

It's not so easy [to fix] since the switchtime for threads running on other
CPUs is inaccessible (it is in the CPU's pcpu data).
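To make the difficulty concrete, here is a hypothetical userland model in which each CPU's switchtime lives in its own per-CPU slot, as it does in the pcpu data. Knowing that a thread is running is not enough by itself: the adjustment also needs the switchtime of the CPU the thread is running on. The names `struct model_pcpu`, `model_pcpu[]`, `struct model_thread` and its `oncpu` field are assumptions made for illustration, not kernel interfaces. The model also ignores the locking and synchronization that such a cross-CPU read would need in the real kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAXCPU 4	/* hypothetical CPU count for the model */

/* Hypothetical per-CPU data: each CPU records when it last switched. */
struct model_pcpu {
	uint64_t switchtime_ns;
};

static struct model_pcpu model_pcpu[MODEL_MAXCPU];

/* Hypothetical summary of a thread for the model. */
struct model_thread {
	uint64_t runtime_ns;	/* runtime accumulated at past switches */
	bool running;		/* the "actually running now" state */
	int oncpu;		/* CPU it is running on; valid only if running */
};

/*
 * Runtime including the current time slice, for a thread running on
 * any CPU.  The running flag alone does not suffice: the slice length
 * needs the switchtime stored in the other CPU's per-CPU slot.
 */
static uint64_t
model_runtime_any_cpu(const struct model_thread *td, uint64_t now_ns)
{
	if (td->running)
		return (td->runtime_ns +
		    (now_ns - model_pcpu[td->oncpu].switchtime_ns));
	return (td->runtime_ns);
}
```

For example, with CPU 1's switchtime set to 100 and 1000 ns already accumulated, a running thread on CPU 1 sampled at t=160 yields 1060; the same thread marked as not running yields just the accumulated 1000.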

The bug seems to be unrelated to KSE; it is related to SMP.  RELENG_4 has
the bug too, even though pre-KSE versions already have a proc state that
indicates when a process is running; that state identifies the case but
doesn't let us handle it right.
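The SMP failure can be reproduced in a small userland model of the two paths in the quoted calcru() fragment. This is a sketch under simplifying assumptions: times are plain nanosecond counters instead of `struct bintime`, and `struct model_proc`, `model_runtime()` and its `is_curproc` argument are hypothetical stand-ins for `p->p_runtime`, the per-CPU `switchtime` and the `p == curthread->td_proc` test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the relevant part of struct proc. */
struct model_proc {
	uint64_t runtime_ns;	/* like p->p_runtime: time accumulated at past switches */
};

/*
 * Mirrors the two paths in calcru().  If the process is current on
 * *this* CPU, the time since this CPU's last context switch is added.
 * Otherwise only the accumulated runtime is used, which is out of date
 * if the process is actually running on another CPU.
 */
static uint64_t
model_runtime(const struct model_proc *p, bool is_curproc,
    uint64_t now_ns, uint64_t this_cpu_switchtime_ns)
{
	if (is_curproc)
		return (p->runtime_ns + (now_ns - this_cpu_switchtime_ns));
	return (p->runtime_ns);
}

/* True if the later sample is smaller, i.e. "negative time". */
static bool
model_went_backwards(uint64_t earlier_ns, uint64_t later_ns)
{
	return (later_ns < earlier_ns);
}
```

For example, with 1000 ns accumulated and a slice that started at t=100 on CPU 0, a sample taken on CPU 0 at t=150 takes the first path and yields 1050. A later sample taken on CPU 1 at t=160, with the process still running on CPU 0, takes the `else` path and yields only 1000, so the total appears to go backwards.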

I will turn off the check in the known broken case, and maybe change the
printf() to a log() since the error is not very important and syscons's
console output routine is suspect when called with sched_lock held.
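A hypothetical sketch of turning off the check in the known broken case: successive runtime samples are compared, and a decrease is reported only when the process is not running on another CPU. The function name and the `running_elsewhere` flag are assumptions standing in for however that case is actually detected. Emitting the report itself (via log() rather than printf(), as proposed above) is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether a decrease in runtime should be reported.
 * running_elsewhere: hypothetical flag for the known broken case, where
 * the sample misses the current time slice on another CPU and can
 * appear to go backwards without indicating a real accounting error.
 */
static bool
model_should_report_negative_time(uint64_t prev_ns, uint64_t cur_ns,
    bool running_elsewhere)
{
	if (running_elsewhere)
		return (false);		/* known broken case: skip the check */
	return (cur_ns < prev_ns);	/* time went backwards */
}
```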

Bruce
Received on Mon Jun 21 2004 - 06:05:36 UTC
