Re: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=207594611

From: Kevin Oberman <oberman_at_es.net>
Date: Thu, 16 Sep 2004 11:36:41 -0700
> Date: Wed, 15 Sep 2004 15:05:34 -0600
> From: Scott Long <scottl_at_samsco.org>
> Sender: owner-freebsd-current_at_freebsd.org
> 
> Søren Schmidt wrote:
> > Mike Jakubik wrote:
> > 
> >> Søren Schmidt said:
> >>
> >>
> >>> You are having massive ICRC problems which are different and most likely
> >>> due to bad cables/connectors or cables that are turned around (blue
> >>> connector at controller, black/grey at devices), or it can be a
> >>> weak/overloaded PSU.
> >>>
> >> This is a different error message from what everyone else, including
> >> me, is reporting. What about the errors we are getting?
> > 
> > 
> > I have no idea; I can't reproduce the problem at all. However, I suspect
> > something else is blocking interrupt delivery, but it's just a hunch...
> > 
> > -Søren
> > 
> 
> I'm finding it hard to imagine a scenario where a timeout could fire but 
> not a hardware interrupt.  Nothing usually shares the interrupt vectors
> with ATA, so it's pretty unlikely that the ata ithread is being blocked
> by anything but itself.

This sounds reasonable, but I can make the problem start/stop by
starting/stopping the network card. No problems in single-user. Then I
'ifconfig xl0 192.116.1.1' and immediately start getting the errors. I
also get watchdog timeouts on xl0. 'ifconfig xl0 down' stops the errors.
xl0 is on IRQ10 and ata1 is on IRQ15. I have a K6 processor in an ASUS P5A
with neither SMP nor APIC. (I am running ACPI, not that there is much to
it on this system.)

While I don't entirely discount the possibility that this is in ata, it
seems odd that I get no errors even doing a buildworld as long as the
network is off. 

This started pretty recently, but changes were made to the scheduler,
ACPI, and ata during the period in question, so it's still fuzzy. My
system gets the errors consistently enough that I will try to narrow
down which patch caused the problem. (I wish it were a bit faster to build
kernels, though!) I have a feeling in the pit of my stomach that it's
going to show up with a scheduler patch MT5, but I hope I'm wrong! I
think I'd prefer an ATA problem to a scheduler issue. (Of course, Søren
probably has a differing opinion on this.)
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman_at_es.net			Phone: +1 510 486-8634
Received on Thu Sep 16 2004 - 16:36:42 UTC