Re: Re: 5.3-RELEASE: WARNING - WRITE_DMA interrupt timout - what does it mean?

From: Zoltan Frombach <tssajo_at_hotmail.com>
Date: Wed, 10 Nov 2004 01:53:17 -0800
I'm on FreeBSD 5.3-RELEASE. I get this when I try to patch the file:

# patch subr_taskqueue.c </usr/home/tss/kernelpatch/patch-subr_taskqueue.c
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|--- kern/subr_taskqueue.c 5 Oct 2004 04:16:00 -0000 1.25
|+++ kern/subr_taskqueue.c 23 Oct 2004 19:18:26 -0000
--------------------------
Patching file subr_taskqueue.c using Plan A...
Hunk #1 succeeded at 36 with fuzz 2.
Hunk #2 failed at 147.
Hunk #3 failed at 182.
Hunk #4 failed at 198.
3 out of 4 hunks failed--saving rejects to subr_taskqueue.c.rej
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: sys/_task.h
|===================================================================
|RCS file: /home/ncvs/src/sys/sys/_task.h,v
|retrieving revision 1.3
|diff -u -r1.3 _task.h
|--- sys/_task.h 5 Oct 2004 04:16:01 -0000 1.3
|+++ sys/_task.h 23 Oct 2004 19:38:16 -0000
--------------------------
File to patch:


Is your patch were produce for HEAD? Or what is wrong?

Zoltan

On Wed, 10 Nov 2004, Søren Schmidt wrote:

> Zoltan Frombach wrote:
> > I just upgraded to 5.3-RELEASE a few days ago. This morning this line
> > got into my system log file:
> > Nov  9 06:14:03 www kernel: ad0: WARNING - WRITE_DMA interrupt was seen
> > but timeout fired LBA=2491143
> >
> > I've never seen this message before. Can someone please explain what it
> > means? With Thanks,
>
> It means that the disk has processed the write request (interrupt seen),
> but that the system (the bio_taskqueue) hasn't been able to get the
> result returned to the kernel.
>
> Your disk is not involved in this problem since it has done its part,
> but the rest of the system is either busy with something else, or there
> are bugs lurking that prohibits the bio_taskqueue from running.
>
> Either way its a WARNING not a FAILURE :)

I'm still a bit skeptical that the task queue is at fault -- I run my
notebook with continuous measurement of the latency to schedule tasks,
generating a warning for any latency > .5 seconds, and the only time I
ever see that sort of latency is during the boot process when ACPI has
scheduled a task to run, but the task queue thread has not yet been
allowed to run:

ipfw2 initialized, divert loadable, rule-based forwarding disabled,
default to deny, logging disabled
taskqueue_run: warning, queue time of 0.900011314 for context 0xc045638c
ad0: 19077MB <FUJITSU MHS2020AT E/8307> [38760/16/63] at ata0-master
UDMA33

It would be quite interesting to have someone run with this timing code on
a system reporting the warning to see what they find.  What is the
threshold to generate the ATA timeout warning?  My impression was that it
was several seconds, which is quite a long time.  The attached patch
checks for .5 second or above latency.  If the task queue is really
getting stalled for several seconds, we should figure out what task it is.
There's KTR tracing in the taskqueue code that we can use to dump a trace
stream to see what task is taking so long.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Principal Research Scientist, McAfee Research

Index: kern/subr_taskqueue.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/subr_taskqueue.c,v
retrieving revision 1.25
diff -u -r1.25 subr_taskqueue.c
--- kern/subr_taskqueue.c 5 Oct 2004 04:16:00 -0000 1.25
+++ kern/subr_taskqueue.c 23 Oct 2004 19:18:26 -0000
_at__at_ -36,9 +36,16 _at__at_
 #include <sys/lock.h>
 #include <sys/malloc.h>
 #include <sys/mutex.h>
+#include <sys/sysctl.h>
 #include <sys/taskqueue.h>
+#include <sys/time.h>
 #include <sys/unistd.h>

+int tq_in;
+SYSCTL_INT(_kern, OID_AUTO, tq_in, CTLFLAG_RD, &tq_in, 0, "");
+int tq_out;
+SYSCTL_INT(_kern, OID_AUTO, tq_out, CTLFLAG_RD, &tq_out, 0, "");
+
 static MALLOC_DEFINE(M_TASKQUEUE, "taskqueue", "Task Queues");
 static void *taskqueue_giant_ih;
 static void *taskqueue_ih;
_at__at_ -140,6 +147,9 _at__at_
  return 0;
  }

+ getnanotime(&task->ta_queuetime);
+ tq_in++;
+
  /*
  * Optimise the case when all tasks have the same priority.
  */
_at__at_ -172,6 +182,7 _at__at_
 taskqueue_run(struct taskqueue *queue)
 {
  struct task *task;
+ struct timespec tv;
  int owned, pending;

  owned = mtx_owned(&queue->tq_mutex);
_at__at_ -187,8 +198,16 _at__at_
  pending = task->ta_pending;
  task->ta_pending = 0;
  task->ta_flags |= TAF_PENDING;
+ tq_out++;
  mtx_unlock(&queue->tq_mutex);

+ getnanotime(&tv);
+ timespecsub(&tv, &task->ta_queuetime);
+ if (tv.tv_nsec >= 0500000000) {
+ printf("taskqueue_run: warning, queue time of %d.%09ld "
+     "for context %p\n", tv.tv_sec, tv.tv_nsec,
+     task->ta_func);
+ }
  task->ta_func(task->ta_context, pending);

  mtx_lock(&queue->tq_mutex);
Index: sys/_task.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/_task.h,v
retrieving revision 1.3
diff -u -r1.3 _task.h
--- sys/_task.h 5 Oct 2004 04:16:01 -0000 1.3
+++ sys/_task.h 23 Oct 2004 19:38:16 -0000
_at__at_ -46,6 +46,7 _at__at_
  task_fn_t *ta_func; /* task handler */
  void *ta_context; /* argument for handler */
  int ta_flags; /* Flags */
+ struct timespec ta_queuetime; /* time enqueued */
 };

 #define TAF_PENDING 0x1 /* Task is being run now */ 
Received on Wed Nov 10 2004 - 08:54:00 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:21 UTC