Re: Sleeping thread (tid 100033, pid 16): panic in FreeBSD 10.0-CURRENT/amd64 r228662

From: Attilio Rao <attilio_at_freebsd.org>
Date: Tue, 20 Dec 2011 15:20:09 +0100
2011/12/20 John Baldwin <jhb_at_freebsd.org>:
> On Saturday, December 17, 2011 10:41:15 pm mdf_at_freebsd.org wrote:
>> On Sat, Dec 17, 2011 at 5:45 PM, Alexander Kabaev <kabaev_at_gmail.com> wrote:
>> > On Sun, 18 Dec 2011 01:09:00 +0100
>> > "O. Hartmann" <ohartman_at_zedat.fu-berlin.de> wrote:
>> >
>> >> Sleeping thread (tid 100033, pid 16) owns a non sleepable lock
>> >> panic: sleeping thread
>> >> cpuid = 0
>> >>
>> >> PID 16 is always USB on my box.
>> >
>> > You really need to give us a backtrace when you quote panics. It is
>> > impossible to make any sense of the above panic message without more
>> > context.
>>
>> In the case of this panic, the stack of the thread which panics is
>> useless; it's someone trying to propagate priority that discovered it.
>>  A backtrace on tid 100033 would be useful.
>>
>> With WITNESS enabled, it's possible to have this panic display the
>> stack of the incorrectly sleeping thread at the time it acquired the
>> lock, as well, but this code isn't in CURRENT or any release.  I have
>> a patch at $WORK I can dig up on Monday.
>
> Huh?  The stock kernel dumps a stack trace of the offending thread if you have
> DDB enabled:
>
>                /*
>                 * If the thread is asleep, then we are probably about
>                 * to deadlock.  To make debugging this easier, just
>                 * panic and tell the user which thread misbehaved so
>                 * they can hopefully get a stack trace from the truly
>                 * misbehaving thread.
>                 */
>                if (TD_IS_SLEEPING(td)) {
>                        printf(
>                "Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n",
>                            td->td_tid, td->td_proc->p_pid);
> #ifdef DDB
>                        db_trace_thread(td, -1);
> #endif
>                        panic("sleeping thread");
>                }
>
> It may be that we can make use of the STACK API here instead to output this
> trace even when DDB isn't enabled.  The patch below tries to do that
> (untested).  It does some odd thigns though since it is effectively running
> from a panic context already, so it uses a statically allocated 'struct stack'
> rather than using stack_create() and uses stack_print_ddb() since it is
> holding spin locks and can't possibly grab an sx lock:
>
> Index: subr_turnstile.c
> ===================================================================
> --- subr_turnstile.c    (revision 228534)
> +++ subr_turnstile.c    (working copy)
> _at__at_ -72,6 +72,7 _at__at_ __FBSDID("$FreeBSD$");
>  #include <sys/proc.h>
>  #include <sys/queue.h>
>  #include <sys/sched.h>
> +#include <sys/stack.h>
>  #include <sys/sysctl.h>
>  #include <sys/turnstile.h>
>
> _at__at_ -175,6 +176,7 _at__at_ static void turnstile_fini(void *mem, int size);
>  static void
>  propagate_priority(struct thread *td)
>  {
> +       static struct stack st;
>        struct turnstile *ts;
>        int pri;
>
> _at__at_ -217,8 +219,10 _at__at_ propagate_priority(struct thread *td)
>                        printf(
>                "Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n",
>                            td->td_tid, td->td_proc->p_pid);
> -#ifdef DDB
> -                       db_trace_thread(td, -1);
> +#ifdef STACK
> +                       stack_zero(&st);
> +                       stack_save_td(&st, td);
> +                       stack_print_ddb(&st);
>  #endif
>                        panic("sleeping thread");
>                }
>
> --

I'm not sure it is a wise idea to trimm the DDB part, because it is
much more common than STACK enabled. Note that stack(9) is working if
you define DDB too, so I'd say to do that for both.
Also, I don't think you need the stack_zero() prior to set it.

As we are here, however, I have a question for Robert here: do you
think we should support the _ddb() variant of options even in the case
DDB is not enabled in the kernel?
Probabilly the way it is nowadays is easier to have stack(9) both
defined for DDB and STACK, but in general I wouldn't advertise that.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
Received on Tue Dec 20 2011 - 13:20:12 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:22 UTC