Re: Stop scheduler on panic

From: Kostik Belousov <kostikbel_at_gmail.com> Date: Thu, 17 Nov 2011 20:32:49 +0200 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:20 UTC

On Thu, Nov 17, 2011 at 02:40:45PM +0200, Alexander Motin wrote:
> On 11/17/11 13:14, Kostik Belousov wrote:
> > On Thu, Nov 17, 2011 at 11:29:02AM +0200, Alexander Motin wrote:
> >> On 11/17/11 11:06, Kostik Belousov wrote:
> >>> On Thu, Nov 17, 2011 at 10:40:58AM +0200, Alexander Motin wrote:
> >>>> On 11/17/11 10:15, Kostik Belousov wrote:
> >>>>> On Thu, Nov 17, 2011 at 01:07:38AM +0200, Alexander Motin wrote:
> >>>>>> On 17.11.2011 00:21, Andriy Gapon wrote:
> >>>>>>> on 16/11/2011 21:27 Fabian Keil said the following:
> >>>>>>>> Kostik Belousov<kostikbel_at_gmail.com>  wrote:
> >>>>>>>>
> >>>>>>>>> I was tricked into finishing the work by Andrey Gapon, who developed
> >>>>>>>>> the patch to reliably stop other processors on panic.  The patch
> >>>>>>>>> greatly improves the chances of getting dump on panic on SMP host.
> >>>>>>>>
> >>>>>>>> I tested the patch trying to get a dump (from the debugger) for
> >>>>>>>> kern/162036, which currently results in the double fault reported in:
> >>>>>>>> http://lists.freebsd.org/pipermail/freebsd-current/2011-September/027766.html
> >>>>>>>>
> >>>>>>>> It didn't help, but also didn't make anything worse.
> >>>>>>>>
> >>>>>>>> Fabian
> >>>>>>>
> >>>>>>> The mi_switch recursion looks very familiar to me:
> >>>>>>> mi_switch() at mi_switch+0x270
> >>>>>>> critical_exit() at critical_exit+0x9b
> >>>>>>> spinlock_exit() at spinlock_exit+0x17
> >>>>>>> mi_switch() at mi_switch+0x275
> >>>>>>> critical_exit() at critical_exit+0x9b
> >>>>>>> spinlock_exit() at spinlock_exit+0x17
> >>>>>>> [several pages of the previous three lines skipped]
> >>>>>>> mi_switch() at mi_switch+0x275
> >>>>>>> critical_exit() at critical_exit+0x9b
> >>>>>>> spinlock_exit() at spinlock_exit+0x17
> >>>>>>> intr_even_schedule_thread() at intr_event_schedule_thread+0xbb
> >>>>>>> ahci_end_transaction() at ahci_end_transaction+0x398
> >>>>>>> ahci_ch_intr() at ahci_ch_intr+0x2b5
> >>>>>>> ahcipoll() at ahcipoll+0x15
> >>>>>>> xpt_polled_action() at xpt_polled_action+0xf7
> >>>>>>>
> >>>>>>> In fact I once discussed with jhb this recursion triggered from a different
> >>>>>>> place.  To quote myself:
> >>>>>>> <avg>    spinlock_exit ->  critical_exit ->  mi_switch ->  kdb_switch ->
> >>>>>>> thread_unlock ->  spinlock_exit ->  critical_exit ->  mi_switch ->  ...
> >>>>>>> <avg>    in the kdb context
> >>>>>>> <avg>    this issue seems to be triggered by td_owepreempt being true at 
> >>>>>>> the time
> >>>>>>> kdb is entered
> >>>>>>> <avg>    and there of course has to be an initial spinlock_exit call 
> >>>>>>> somewhere
> >>>>>>> <avg>    in my case it's because of usb keyboard
> >>>>>>> <avg>    I wonder if it would make sense to clear td_owepreempt right 
> >>>>>>> before
> >>>>>>> calling kdb_switch in mi_switch
> >>>>>>> <avg>    instead of in sched_switch()
> >>>>>>> <avg>    clearing td_owepreempt seems like a scheduler-independent 
> >>>>>>> operation to me
> >>>>>>> <avg>    or is it better to just skip locking in usb when kdb_active is set
> >>>>>>> <avg>    ?
> >>>>>>>
> >>>>>>> The workaround described above should work in this case.
> >>>>>>> Another possibility is to pessimize mtx_unlock_spin() implementations to 
> >>>>>>> check
> >>>>>>> SCHEDULER_STOPPED() and to bypass any further actions in that case.  But 
> >>>>>>> that
> >>>>>>> would add unnecessary overhead to the sunny day code paths.
> >>>>>>>
> >>>>>>> Going further up the stack one can come up with the following proposals:
> >>>>>>> - check SCHEDULER_STOPPED() swi_sched() and return early
> >>>>>>> - do not call swi_sched() from xpt_done() if we somehow know that we are 
> >>>>>>> in a
> >>>>>>> polling mode
> >>>>>>
> >>>>>> There is no flag in CAM now to indicate polling mode, but if needed, it 
> >>>>>> should not be difficult to add one and not call swi_sched().
> >>>>>
> >>>>> I have the following change for eons on my test boxes. Without it,
> >>>>> I simply cannot get _any_ dump.
> >>>>>
> >>>>> diff --git a/sys/cam/cam_xpt.c b/sys/cam/cam_xpt.c
> >>>>> index 10b89c7..a38e42f 100644
> >>>>> --- a/sys/cam/cam_xpt.c
> >>>>> +++ b/sys/cam/cam_xpt.c
> >>>>> _at__at_ -4230,7 +4230,7 _at__at_ xpt_done(union ccb *done_ccb)
> >>>>>  			TAILQ_INSERT_TAIL(&cam_simq, sim, links);
> >>>>>  			mtx_unlock(&cam_simq_lock);
> >>>>>  			sim->flags |= CAM_SIM_ON_DONEQ;
> >>>>> -			if (first)
> >>>>> +			if (first && panicstr == NULL)
> >>>>>  				swi_sched(cambio_ih, 0);
> >>>>>  		}
> >>>>>  	}
> >>>>
> >>>> That should be OK for kernel dumping. I was thinking about CAM abusing
> >>>> polling not only for dumping. But looking on cases where it does it now,
> >>>> may be it is better to rewrite them instead.
> >>>
> >>> So, should I interpret your response as 'Reviewed by" ?
> >>
> >> It feels somehow dirty to me. I don't like these global variables. If
> >> you consider it is fine, proceed, I see no much harm. But if not, I can
> >> add polling flag to the CAM. Flip a coin for me. :)
> > You promised to add the polling at summer' meeting in Kiev. Will you do
> > it now ?
> 
> Sorry, I've probably forgot. The patch is attached.
> 
> -- 
> Alexander Motin

> Index: cam_sim.h
> ===================================================================
> --- cam_sim.h	(revision 227580)
> +++ cam_sim.h	(working copy)
> _at__at_ -104,7 +104,8 _at__at_
>  	u_int32_t		flags;
>  #define	CAM_SIM_REL_TIMEOUT_PENDING	0x01
>  #define	CAM_SIM_MPSAFE			0x02
> -#define CAM_SIM_ON_DONEQ		0x04
> +#define	CAM_SIM_ON_DONEQ		0x04
> +#define	CAM_SIM_POLLED			0x08
>  	struct callout		callout;
>  	struct cam_devq 	*devq;	/* Device Queue to use for this SIM */
>  	int			refcount; /* References to the SIM. */
> Index: cam_xpt.c
> ===================================================================
> --- cam_xpt.c	(revision 227580)
> +++ cam_xpt.c	(working copy)
> _at__at_ -2957,6 +2957,9 _at__at_
>  
>  	mtx_assert(sim->mtx, MA_OWNED);
>  
> +	/* Don't use ISR for this SIM while polling. */
> +	sim->flags |= CAM_SIM_POLLED;
> +
>  	/*
>  	 * Steal an opening so that no other queued requests
>  	 * can get it before us while we simulate interrupts.
> _at__at_ -2996,6 +2999,9 _at__at_
>  	} else {
>  		start_ccb->ccb_h.status = CAM_RESRC_UNAVAIL;
>  	}
> +
> +	/* Use CAM ISR for this SIM again. */
> +	sim->flags &= ~CAM_SIM_POLLED;
>  }
>  
>  /*
> _at__at_ -4224,7 +4230,7 _at__at_
>  		TAILQ_INSERT_TAIL(&sim->sim_doneq, &done_ccb->ccb_h,
>  		    sim_links.tqe);
>  		done_ccb->ccb_h.pinfo.index = CAM_DONEQ_INDEX;
> -		if ((sim->flags & CAM_SIM_ON_DONEQ) == 0) {
> +		if ((sim->flags & (CAM_SIM_ON_DONEQ | CAM_SIM_POLLED)) == 0) {
>  			mtx_lock(&cam_simq_lock);
>  			first = TAILQ_EMPTY(&cam_simq);
>  			TAILQ_INSERT_TAIL(&cam_simq, sim, links);

Seems it worked for me, on the same machine where I absolutely needed
my change before.

Thanks.