ata_queue_request() sleeps in an interrupt handler here: % void % ata_queue_request(struct ata_request *request) % { % /* mark request as virgin (it might be a reused one) */ % request->result = request->status = request->error = 0; % request->flags &= ~ATA_R_DONE; % % /* put request on the locked queue at the specified location */ % mtx_lock(&request->device->channel->queue_mtx); % if (request->flags & ATA_R_AT_HEAD) % TAILQ_INSERT_HEAD(&request->device->channel->ata_queue, request, chain); % else % TAILQ_INSERT_TAIL(&request->device->channel->ata_queue, request, chain); % mtx_unlock(&request->device->channel->queue_mtx); % % /* should we skip start ? */ % if (!(request->flags & ATA_R_SKIPSTART)) % ata_start(request->device->channel); % % /* if this was a requeue op callback/sleep already setup */ % if (request->flags & ATA_R_REQUEUE) % return; % % /* if this is not a callback and we havn't seen DONE yet -> sleep */ % if (!request->callback) { % while (!(request->flags & ATA_R_DONE)) % tsleep(request, PRIBIO, "atareq", hz/10); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ % } % } when it is called from an interrupt handler. It is called from an interrupt handler as part of timeout processing: ... msleep(...) ata_queue_request(...) ata_via_family_setmode(...) ata_identify_devices(...) ata_reinit(...) ata_timeout(...) softclock(...) ithread_loop(...) ... The timeout was called here shortly after ad2 hung: Dec 25 12:28:27 besplex kernel: ad2: TIMEOUT - READ_DMA retrying (2 retries left) Dec 25 12:28:27 besplex kernel: ata1: resetting devices .. Dec 25 12:28:27 besplex kernel: ad2: FAILURE - already active DMA on this device Dec 25 12:28:27 besplex kernel: ad2: setting up DMA failed ATA_R_DONE was never set and wakeup_request() was never called either, so softclock() was deadlocked and tsleep() never returned. The system ran surprisingly well with softclock() deadlocked. ad0 worked and everything that didn't use timeouts worked. Examples of things that didn't work because they use timeouts: - syscons screen updates. - statistics in top and systat. - sleep 1 in shells. - mbmon (shows the status, then never repeats). I tried the following to recover: - call wakeup(request) using ddb. This worked, but ATA_R_DONE was never set so ata_queue_request() just looped. - also ignore the ATA_R_DONE check using ddb. This un-deadlocked softclock(), but ata1 remained wedged. - then call "atacontrol reinit 1". This partly worked: Dec 25 14:31:12 besplex kernel: ata1: resetting devices .. Dec 25 14:31:44 besplex kernel: ad2: WARNING - removed from configuration Dec 25 14:31:44 besplex kernel: done but the ata driver caused a null pointer panic an instant later. - ad2 didn't come back after a hard reset. - ad2 came back after a power cycle. This was on an undermydesktop. Problems resuming on laptops may be similar. The hardware may really be wedged. Then the software shouldn't make things worse by sleeping or spinning in the timeout handler. BruceReceived on Wed Dec 24 2003 - 19:38:24 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:35 UTC