Re: ad0: WARNING - WRITE_DMA interrupt was seen but timeout fired LBA=21267353

From: Scott Long <scottl_at_freebsd.org>
Date: Wed, 12 May 2004 09:43:01 -0600
Søren Schmidt wrote:
> Bjoern A. Zeeb wrote:
> 
>>>>> The WRITE operation was signalled done by the disk by issuing an
>>>>> interrupt and the finished request was put on a taskqueue to have it
>>>>> return status to the system. However the timeout code fired because
>>>>> the taskqueue hadn't been executed yet (it will wait one more timeout
>>>>> period before the result is forced through).
>>>>>
>>>>> So there is no harm done, but the taskqueue was slow to respond...
>>>>
>>>>
>>>>
>>>> As posted to freebsd-amd64, I am getting far too many of those and
>>>> they won't stop.
>>>>
>>>> From time to time there is also
>>>> ad10: WARNING - WRITE_DMA interrupt was seen but taskqueue stalled LBA=...
>>>> between all the ${subject} lines.
>>>
>>>
>>> Hmm, something is keeping the taskqueue busy, and it's not ATA...
>>
>>  
>> but the thing starts with ${subject} and only after some dozen logs
>> (multi monitor pages) I get one or two taskqueue stalled; afterwards
>> ${subject} keeps scrolling again and everything starts from the
>> beginning.
> 
> 
> It always starts like that; the stalled warning is the final operation
> after 3 timeouts.
> 
>> Furthermore, I can only reproduce it with heavy HDD I/O traffic; it
>> doesn't happen for network traffic, and I think it doesn't happen when
>> only working on a 512MB md0 memory disk. It happens once I copy sources
>> to HDD, or have src or obj on HDD while compiling world.
>>
>> The strange thing that makes me think is that I also get it
>> when using atacontrol and set the channel to PIO0 BIOSPIO;
> 
> 
> The transfer mode has no meaning here. The ATA HW and driver have done
> their bit when this fails, but the driver cannot get results returned
> back to the system via the taskqueue subsystem. When the "stalled"
> warning shows up, ATA forces the result back, circumventing the
> taskqueues.
> 
>> Is there any way I could find out what's keeping the taskqueue
>> busy? Once scrolling starts, no user interaction is possible anymore.
> 
> 
> Instrumenting the queue code ?
> 

Taskqueues work just fine for other drivers.  I can't see anything
obviously wrong with your use of the taskqueue (other than what we have
talked about before), though you also use bio_taskqueue().  Looking in
there, it has two additional mutexes that get locked when run.  Maybe
there is an LOR or other such problem that is causing the stall.
Running with WITNESS might reveal something.  You might also just plain
be missing the interrupt from the drive, but that is harder to
determine.
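
For anyone following along, here is a rough sketch of the behavior
described in the quoted text: the interrupt marks the request done, the
taskqueue is supposed to return status, and each timer expiry in between
produces a "timeout fired" warning until the "stalled" warning forces the
result through. This is a toy model, not the ATA driver code. The class
and function names are hypothetical, and treating the third expiry as the
"stalled" one (max_timeouts=3) is an assumption based on the "after 3
timeouts" remark above.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    """Toy model of one ATA request; all names here are hypothetical."""
    device: str
    lba: int
    interrupt_seen: bool = False  # the drive signalled completion
    completed: bool = False       # status was returned to the system
    timeouts: int = 0             # timer expiries seen so far
    log: list = field(default_factory=list)


def on_interrupt(req):
    # The drive is done; returning status is deferred to the taskqueue.
    req.interrupt_seen = True


def on_taskqueue_run(req):
    # Normal path: the taskqueue returns status to the system.
    if req.interrupt_seen:
        req.completed = True


def on_timeout(req, max_timeouts=3):
    """Handle one timer expiry. max_timeouts=3 is an assumption."""
    if req.completed:
        return  # the taskqueue ran in time, nothing to report
    if not req.interrupt_seen:
        return  # a genuine timeout; not modelled in this sketch
    req.timeouts += 1
    prefix = f"{req.device}: WARNING - WRITE_DMA interrupt was seen but"
    if req.timeouts < max_timeouts:
        # Wait one more timeout period for the taskqueue to catch up.
        req.log.append(f"{prefix} timeout fired LBA={req.lba}")
    else:
        # Give up on the taskqueue and force the result through.
        req.log.append(f"{prefix} taskqueue stalled LBA={req.lba}")
        req.completed = True
```

Calling on_interrupt() and then on_timeout() three times without ever
calling on_taskqueue_run() yields two "timeout fired" entries followed by
one "taskqueue stalled" entry in req.log. If on_taskqueue_run() gets in
first, nothing is logged, which is the case the thread says is not
happening under heavy disk I/O.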


Scott
Received on Wed May 12 2004 - 06:43:26 UTC
