Re: deadlock in ata_queue_request()

From: Soren Schmidt <sos_at_spider.deepcore.dk>
Date: Thu, 25 Dec 2003 10:07:10 +0100 (CET)
It seems Bruce Evans wrote:
> ata_queue_request() sleeps in an interrupt handler here:

Yes, I have a local fix to help this, the sleep was originally left in to
make a backport to -stable easier (ie no mutexes), but this need to be
changed here. I'll get it committed asap, but it is hollidays and the
kids has alot of new toys :)

> % void
> % ata_queue_request(struct ata_request *request)
> % {
> %     /* mark request as virgin (it might be a reused one) */
> %     request->result = request->status = request->error = 0;
> %     request->flags &= ~ATA_R_DONE;
> %
> %     /* put request on the locked queue at the specified location */
> %     mtx_lock(&request->device->channel->queue_mtx);
> %     if (request->flags & ATA_R_AT_HEAD)
> % 	TAILQ_INSERT_HEAD(&request->device->channel->ata_queue, request, chain);
> %     else
> % 	TAILQ_INSERT_TAIL(&request->device->channel->ata_queue, request, chain);
> %     mtx_unlock(&request->device->channel->queue_mtx);
> %
> %     /* should we skip start ? */
> %     if (!(request->flags & ATA_R_SKIPSTART))
> % 	ata_start(request->device->channel);
> %
> %     /* if this was a requeue op callback/sleep already setup */
> %     if (request->flags & ATA_R_REQUEUE)
> % 	return;
> %
> %     /* if this is not a callback and we havn't seen DONE yet -> sleep */
> %     if (!request->callback) {
> % 	while (!(request->flags & ATA_R_DONE))
> % 	    tsleep(request, PRIBIO, "atareq", hz/10);
>   	    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> %     }
> % }
> 
> when it is called from an interrupt handler.  It is called from an interrupt
> handler as part of timeout processing:
> 
> ...
> msleep(...)
> ata_queue_request(...)
> ata_via_family_setmode(...)
> ata_identify_devices(...)
> ata_reinit(...)
> ata_timeout(...)
> softclock(...)
> ithread_loop(...)
> ...
> 
> The timeout was called here shortly after ad2 hung:
> 
> Dec 25 12:28:27 besplex kernel: ad2: TIMEOUT - READ_DMA retrying (2 retries left)
> Dec 25 12:28:27 besplex kernel: ata1: resetting devices ..
> Dec 25 12:28:27 besplex kernel: ad2: FAILURE - already active DMA on this device
> Dec 25 12:28:27 besplex kernel: ad2: setting up DMA failed
> 
> ATA_R_DONE was never set and wakeup_request() was never called either, so
> softclock() was deadlocked and tsleep() never returned.
> 
> The system ran surprisingly well with softclock() deadlocked.  ad0 worked
> and everything that didn't use timeouts worked.  Examples of things that
> didn't work because they use timeouts:
> - syscons screen updates.
> - statistics in top and systat.
> - sleep 1 in shells.
> - mbmon (shows the status, then never repeats).
> 
> I tried the following to recover:
> - call wakeup(request) using ddb.  This worked, but ATA_R_DONE was never
>   set so ata_queue_request() just looped.
> - also ignore the ATA_R_DONE check using ddb.  This un-deadlocked
>   softclock(), but ata1 remained wedged.
> - then call "atacontrol reinit 1".  This partly worked:
> 
> Dec 25 14:31:12 besplex kernel: ata1: resetting devices ..
> Dec 25 14:31:44 besplex kernel: ad2: WARNING - removed from configuration
> Dec 25 14:31:44 besplex kernel: done
> 
> but the ata driver caused a null pointer panic an instant later.
> 
> - ad2 didn't come back after a hard reset.
> - ad2 came back after a power cycle.
> 
> This was on an undermydesktop.  Problems resuming on laptops may be similar.
> The hardware may really be wedged.  Then the software shouldn't make things
> worse by sleeping or spinning in the timeout handler.
> 
> Bruce
> 

-Søren
                     Yes I know it works under windows!!
Received on Thu Dec 25 2003 - 00:07:49 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:35 UTC