Re: Deadlock [PATCH]

From: Bruce Evans <bde_at_zeta.org.au> Date: Sun, 10 Aug 2003 23:15:22 +1000 (EST) · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:18 UTC

On Sun, 10 Aug 2003, Peter Holm wrote:

> I have tracked down, what I belive to be the cause of several
> deadlock situations I have encountered, like
> http://people.freebsd.org/~pho/stress/cons40.html.
>
> The problem seems to be the 4bsd scheduler, that does not preempt correctly.
>
> I've included a patch that fixes the problem for me.

% --- sched_4bsd.c~	Sun Jun 15 16:57:17 2003
% +++ sched_4bsd.c	Sun Aug 10 08:41:06 2003
% _at__at_ -448,7 +448,8 _at__at_
%
%  	ke->ke_sched->ske_cpticks++;
%  	kg->kg_estcpu = ESTCPULIM(kg->kg_estcpu + 1);
% -	if ((kg->kg_estcpu % INVERSE_ESTCPU_WEIGHT) == 0) {
% +	if (((kg->kg_estcpu + 1) % INVERSE_ESTCPU_WEIGHT) == 0) {
% +		curthread->td_flags |= TDF_NEEDRESCHED;
%  		resetpriority(kg);
%  		if (td->td_priority >= PUSER)
%  			td->td_priority = kg->kg_user_pri;

I wonder how this makes much difference.  Setting of the rescheduling
flag is supposed to be handled by resetpriority(), so the above at
most works around bugs in resetpriority().  There are plenty of bugs
in resetpriority(), but they are minor and mostly overcome by larger
bugs AFAIK.

resetpriority() was first broken in rev.1.48 of kern_synch.c 1999/03/04)
and the bugs have moved around a bit since then.  The main point of
kern_synch.c rev.1.48 was to prevent excessive context switches for
non-PRI_TIMESHARE scheduling classes so that POSIX roundrobin and fifo
scheduling classes could be implemented using these clasess.  This has
been broken in -current by context switching even more excessively to
switch to interrupt tasks.  The main bugs in kern_synch.c were that
it broke the PRI_TIMESHARE case by not passing the new priority to
maybe_resched() (still broken), and it didn't get the logic for the
non-PRI_TIMESHARE case right (now fixed).  The excessive context
switching in -current makes these bugs mostly moot -- we normally
reschedule on the next (non-clock) interrupt, so rescheduling is just
delayed a bit.  The delay is normally quite small too, at least in the
case that the rescheduling is triggered by sched_clock().  roundrobin()
is supposed to cause rescheduling every quantum (default 1/10 seconds).
sched_clock() calls resetpriority() every 1/16 seconds on i386's and
rescheduling 16 times per second isn't much different from rescheduling
10 time per second.  But sched_clock() should never cause rescheduling
since it should only decrease (numerically increase) the priority of
the current thread.

I think your deadlock must be related to roundrobin() never running,
which might be caused by the bugfeature that userret() doesn't consider
rescheduling.  roundrobin() is just a timeout routine, and its current
implementation depends on excessive context switching.  All timeout
routines are called from a low priority task, and switching to this
task causes roundrobin scheduling of user processes as a side effect.
roundrobin() just ensures that there is a timeout every quantum in
case the system doesn't otherwise generate timeouts frequently enough,
The bugfeature lets processes return to user mode even though their
user priority is much lower (numerically higher) than that of other
user tasks or perhaps even some kernel tasks.  They effectively run
with a high kernel priority until they give up control or are kicked
by an interrupt.  Normally there are enough interrupts to prevent
problems.  But -current has the curious behaviour of forcibly rescheduling
on almost all interrupts _except_ the clock (scheduling) one.  Clock
interrupts are fast, so they don't cause context switches.  Hardclock
interrupts schedule timeouts every quantum, but your bug apparently
blocks timeouts.  Softclock interrupts are not supposed to cause
rescheduling (that is roundrobin()'s job).  Your change does forcible
rescheduling for statclock interrupts.

In my version of FreeBSD, clock interrupt handlers are non-fast so they
do forcible context switching as a side effect and I probably wouldn't
notice this problem.  My clock interrupt tasks have a much higher
priority than the timeout task so deadlocks are less likely and would
be more obvious.

I'll try running your stress test.

Bruce