Re: Re: 5.3-RELEASE: WARNING - WRITE_DMA interrupt timout - what does it mean?

From: Zoltan Frombach <tssajo_at_hotmail.com>
Date: Wed, 10 Nov 2004 01:40:26 -0800
Just let me know what to do and I'll do it. I will recompile my kernel with 
the patch you've sent. And I will let you know what the result is.

In the meantime, here is my dmesg information:

# dmesg
ad0: WARNING - WRITE_DMA interrupt was seen but timeout fired LBA=2491143
ad0: WARNING - WRITE_DMA interrupt was seen but timeout fired LBA=2481983
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...12 2 2 0 0 done
No buffers busy after final sync
Uptime: 2d0h40m18s
Waiting (max 60 seconds) for system process `hpt_wt' to stop...done
Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 5.3-RELEASE #1: Sat Nov  6 06:03:26 PST 2004
    tss_at_www.frombach.com:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) 4 CPU 2.80GHz (2806.38-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf29  Stepping = 9
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory  = 1056899072 (1007 MB)
avail memory = 1024679936 (977 MB)
ACPI APIC Table: <AWARD  AWRDACPI>
ioapic0: Changing APIC ID to 2
ioapic0 <Version 1.4> irqs 0-23 on motherboard
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <AWARD AWRDACPI> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0: <ACPI CPU (3 Cx states)> on acpi0
acpi_tz0: <Thermal Zone> on acpi0
acpi_button0: <Power Button> on acpi0
acpi_button1: <Sleep Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 
0x10e0-0x10ff,0x1000-0x10df,0x480-0x48f,0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <SiS 661 host to AGP bridge> mem 0xd0000000-0xd7ffffff at device 0.0 
on pci0
pcib1: <PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci1: <display, VGA> at device 0.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 2.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <SiS 964 UDMA133 controller> port 
0x4000-0x400f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 2.5 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
ohci0: <SiS 5571 USB controller> mem 0xe1104000-0xe1104fff irq 20 at device 
3.0 on pci0
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: <SiS 5571 USB controller> on ohci0
usb0: USB revision 1.0
uhub0: SiS OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1: <SiS 5571 USB controller> mem 0xe1100000-0xe1100fff irq 21 at device 
3.1 on pci0
ohci1: [GIANT-LOCKED]
usb1: OHCI version 1.0, legacy support
usb1: <SiS 5571 USB controller> on ohci1
usb1: USB revision 1.0
uhub1: SiS OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
ohci2: <SiS 5571 USB controller> mem 0xe1101000-0xe1101fff irq 22 at device 
3.2 on pci0
ohci2: [GIANT-LOCKED]
usb2: OHCI version 1.0, legacy support
usb2: <SiS 5571 USB controller> on ohci2
usb2: USB revision 1.0
uhub2: SiS OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
pci0: <serial bus, USB> at device 3.3 (no driver attached)
xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xe000-0xe07f mem 
0xe1103000-0xe110307f irq 17 at device 9.0 on pci0
miibus0: <MII bus> on xl0
bmtphy0: <3c905B 10/100 internal PHY> on miibus0
bmtphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:50:04:76:49:e7
fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on 
acpi0
sio0: type 16550A
orm0: <ISA Option ROM> at iomem 0xc0000-0xcbfff on isa0
pmtimer0 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
ppc0: parallel port not found.
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 2806375780 Hz quality 800
Timecounters tick every 10.000 msec
ad0: 78167MB <Maxtor 6Y080L0/YAR41VW0> [158816/16/63] at ata0-master UDMA133
acd0: CDROM <CDU5211/YYS7> at ata1-master UDMA33
Mounting root from ufs:/dev/ad0s1a
ipfw2 initialized, divert disabled, rule-based forwarding disabled, default 
to deny, logging disabled


And uname -a :

# uname -a
FreeBSD www.xxxxxxxx.com 5.3-RELEASE FreeBSD 5.3-RELEASE #1: Sat Nov  6 
06:03:26 PST 2004     xxx_at_www.xxxxxxxx.com:/usr/obj/usr/src/sys/GENERIC 
i386

I run the 5.3-RELEASE GENERIC kernel (UP mode).

Zoltan

Subject: Re: Re: 5.3-RELEASE: WARNING - WRITE_DMA interrupt timout - what 
does it mean?


On Wed, 10 Nov 2004, Søren Schmidt wrote:

> Zoltan Frombach wrote:
> > I just upgraded to 5.3-RELEASE a few days ago. This morning this line
> > got into my system log file:
> > Nov  9 06:14:03 www kernel: ad0: WARNING - WRITE_DMA interrupt was seen
> > but timeout fired LBA=2491143
> >
> > I've never seen this message before. Can someone please explain what it
> > means? With Thanks,
>
> It means that the disk has processed the write request (interrupt seen),
> but that the system (the bio_taskqueue) hasn't been able to get the
> result returned to the kernel.
>
> Your disk is not involved in this problem since it has done its part,
> but the rest of the system is either busy with something else, or there
> are bugs lurking that prohibits the bio_taskqueue from running.
>
> Either way its a WARNING not a FAILURE :)

I'm still a bit skeptical that the task queue is at fault -- I run my
notebook with continuous measurement of the latency to schedule tasks,
generating a warning for any latency > .5 seconds, and the only time I
ever see that sort of latency is during the boot process when ACPI has
scheduled a task to run, but the task queue thread has not yet been
allowed to run:

ipfw2 initialized, divert loadable, rule-based forwarding disabled,
default to deny, logging disabled
taskqueue_run: warning, queue time of 0.900011314 for context 0xc045638c
ad0: 19077MB <FUJITSU MHS2020AT E/8307> [38760/16/63] at ata0-master
UDMA33

It would be quite interesting to have someone run with this timing code on
a system reporting the warning to see what they find.  What is the
threshold to generate the ATA timeout warning?  My impression was that it
was several seconds, which is quite a long time.  The attached patch
checks for .5 second or above latency.  If the task queue is really
getting stalled for several seconds, we should figure out what task it is.
There's KTR tracing in the taskqueue code that we can use to dump a trace
stream to see what task is taking so long.

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Principal Research Scientist, McAfee Research

Index: kern/subr_taskqueue.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/subr_taskqueue.c,v
retrieving revision 1.25
diff -u -r1.25 subr_taskqueue.c
--- kern/subr_taskqueue.c 5 Oct 2004 04:16:00 -0000 1.25
+++ kern/subr_taskqueue.c 23 Oct 2004 19:18:26 -0000
_at__at_ -36,9 +36,16 _at__at_
 #include <sys/lock.h>
 #include <sys/malloc.h>
 #include <sys/mutex.h>
+#include <sys/sysctl.h>
 #include <sys/taskqueue.h>
+#include <sys/time.h>
 #include <sys/unistd.h>

+int tq_in;
+SYSCTL_INT(_kern, OID_AUTO, tq_in, CTLFLAG_RD, &tq_in, 0, "");
+int tq_out;
+SYSCTL_INT(_kern, OID_AUTO, tq_out, CTLFLAG_RD, &tq_out, 0, "");
+
 static MALLOC_DEFINE(M_TASKQUEUE, "taskqueue", "Task Queues");
 static void *taskqueue_giant_ih;
 static void *taskqueue_ih;
_at__at_ -140,6 +147,9 _at__at_
  return 0;
  }

+ getnanotime(&task->ta_queuetime);
+ tq_in++;
+
  /*
  * Optimise the case when all tasks have the same priority.
  */
_at__at_ -172,6 +182,7 _at__at_
 taskqueue_run(struct taskqueue *queue)
 {
  struct task *task;
+ struct timespec tv;
  int owned, pending;

  owned = mtx_owned(&queue->tq_mutex);
_at__at_ -187,8 +198,16 _at__at_
  pending = task->ta_pending;
  task->ta_pending = 0;
  task->ta_flags |= TAF_PENDING;
+ tq_out++;
  mtx_unlock(&queue->tq_mutex);

+ getnanotime(&tv);
+ timespecsub(&tv, &task->ta_queuetime);
+ if (tv.tv_nsec >= 0500000000) {
+ printf("taskqueue_run: warning, queue time of %d.%09ld "
+     "for context %p\n", tv.tv_sec, tv.tv_nsec,
+     task->ta_func);
+ }
  task->ta_func(task->ta_context, pending);

  mtx_lock(&queue->tq_mutex);
Index: sys/_task.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/_task.h,v
retrieving revision 1.3
diff -u -r1.3 _task.h
--- sys/_task.h 5 Oct 2004 04:16:01 -0000 1.3
+++ sys/_task.h 23 Oct 2004 19:38:16 -0000
_at__at_ -46,6 +46,7 _at__at_
  task_fn_t *ta_func; /* task handler */
  void *ta_context; /* argument for handler */
  int ta_flags; /* Flags */
+ struct timespec ta_queuetime; /* time enqueued */
 };

 #define TAF_PENDING 0x1 /* Task is being run now */ 
Received on Wed Nov 10 2004 - 08:41:02 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:21 UTC