Re: Sleeping thread (tid 100033, pid 16): panic in FreeBSD 10.0-CURRENT/amd64 r228662

From: <mdf_at_FreeBSD.org>
Date: Tue, 20 Dec 2011 08:42:25 -0800
On Tue, Dec 20, 2011 at 6:32 AM, John Baldwin <jhb_at_freebsd.org> wrote:
> On Tuesday, December 20, 2011 9:22:48 am mdf_at_freebsd.org wrote:
>> On Tue, Dec 20, 2011 at 5:52 AM, John Baldwin <jhb_at_freebsd.org> wrote:
>> > On Saturday, December 17, 2011 10:41:15 pm mdf_at_freebsd.org wrote:
>> >> On Sat, Dec 17, 2011 at 5:45 PM, Alexander Kabaev <kabaev_at_gmail.com> wrote:
>> >> > On Sun, 18 Dec 2011 01:09:00 +0100
>> >> > "O. Hartmann" <ohartman_at_zedat.fu-berlin.de> wrote:
>> >> >
>> >> >> Sleeping thread (tid 100033, pid 16) owns a non sleepable lock
>> >> >> panic: sleeping thread
>> >> >> cpuid = 0
>> >> >>
>> >> >> PID 16 is always USB on my box.
>> >> >
>> >> > You really need to give us a backtrace when you quote panics. It is
>> >> > impossible to make any sense of the above panic message without more
>> >> > context.
>> >>
>> >> In the case of this panic, the stack of the thread which panics is
>> >> useless; it's someone trying to propagate priority that discovered it.
>> >>  A backtrace on tid 100033 would be useful.
>> >>
>> >> With WITNESS enabled, it's possible to have this panic display the
>> >> stack of the incorrectly sleeping thread at the time it acquired the
>> >> lock, as well, but this code isn't in CURRENT or any release.  I have
>> >> a patch at $WORK I can dig up on Monday.
>> >
>> > Huh?  The stock kernel dumps a stack trace of the offending thread if you have
>> > DDB enabled:
>> >
>> >                /*
>> >                 * If the thread is asleep, then we are probably about
>> >                 * to deadlock.  To make debugging this easier, just
>> >                 * panic and tell the user which thread misbehaved so
>> >                 * they can hopefully get a stack trace from the truly
>> >                 * misbehaving thread.
>> >                 */
>> >                if (TD_IS_SLEEPING(td)) {
>> >                        printf(
>> >                "Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n",
>> >                            td->td_tid, td->td_proc->p_pid);
>> > #ifdef DDB
>> >                        db_trace_thread(td, -1);
>> > #endif
>> >                        panic("sleeping thread");
>> >                }
>>
>> Hmm, maybe this wasn't in 7, or maybe I'm just remembering that we
>> added code to print *which* lock it holds (using WITNESS data).  I do
>> recall that this panic alone was often not sufficient to debug the
>> problem.
>
> I think the db_trace_thread() has been around for a while (since 5 or 6),
> but it is true that we don't tell you which lock is held even with this.
> That might be a useful thing to output before the panic.


This patch isn't quite right since I had to hand-edit it.  There's a
small chance I can commit this in the near future, but of someone else
wants to take it, feel free.  Style isn't yet fixed up to be FreeBSD
standard either.


--- /data/sb/bsd.git/sys/kern/subr_turnstile.c	2011-12-12
10:23:12.542196632 -0800
+++ kern/subr_turnstile.c	2011-12-09 10:59:29.882643558 -0800
_at__at_ -165,10 +165,43 _at__at_
 static void	turnstile_dtor(void *mem, int size, void *arg);
 #endif
 static int	turnstile_init(void *mem, int size, int flags);
 static void	turnstile_fini(void *mem, int size);

+#ifdef INVARIANTS
+static void
+sleeping_thread_owns_a_nonsleepable_lock(struct thread *td)
+{
+	printf("Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n",
+	    td->td_tid, td->td_proc->p_pid);
+#ifdef DDB
+	db_trace_thread(td, -1);
+#endif
+#ifdef WITNESS
+	struct lock_list_entry *lock_list, *lle;
+	int i;
+
+	lock_list = td->td_sleeplocks;
+	if (lock_list == NULL || lock_list->ll_count == 0) {
+		printf("Thread does not appear to hold any mutexes!\n");
+		return;
+	}
+
+	for (lle = lock_list; lle != NULL; lle = lle->ll_next) {
+		for (i = lle->ll_count - 1; i >= 0; i--) {
+			struct lock_instance *li = &lle->ll_children[i];
+
+			printf("Lock %s acquired at %s:%d\n",
+			    li->li_lock->lo_name, li->li_file, li->li_line);
+		}
+	}
+#endif /* WITNESS */
+}
+#else
+#define sleeping_thread_owns_a_nonsleepable_lock(td) do { } while (0)
+#endif /* INVARIANTS */
+
 /*
  * Walks the chain of turnstiles and their owners to propagate the priority
  * of the thread being blocked to all the threads holding locks that have to
  * release their locks before this thread can run again.
  */
_at__at_ -210,19 +243,31 _at__at_
 		 * If the thread is asleep, then we are probably about
 		 * to deadlock.  To make debugging this easier, just
 		 * panic and tell the user which thread misbehaved so
 		 * they can hopefully get a stack trace from the truly
 		 * misbehaving thread.
 		 */
 		if (TD_IS_SLEEPING(td)) {
-			printf(
-		"Sleeping thread (tid %d, pid %d) owns a non-sleepable lock\n",
-			    td->td_tid, td->td_proc->p_pid);
-#ifdef DDB
-			db_trace_thread(td, -1);
-#endif
-			panic("sleeping thread");
+			sleeping_thread_owns_a_nonsleepable_lock(td);
+			panic("sleeping thread %p owns a nonsleepable lock",
+			    td);
 		}

 		/*
 		 * If this thread already has higher priority than the
 		 * thread that is being blocked, we are finished.


Cheers,
matthew
Received on Tue Dec 20 2011 - 15:42:26 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:22 UTC