memory corruption/panic solved ("FAILURE - ATAPI_IDENTIFY no interrupt")

From: Nate Lawson <nate_at_root.org>
Date: Fri, 30 Jul 2004 15:48:52 -0700
I've tracked down the source of the memory corruption in -current that
results when booting with various CD and DVD drives (especially the ones
that come with Thinkpads including T23, R32, T41, etc.)  The panic is
obvious when running with INVARIANTS ("memory modified after free") but
not so obvious in other configurations.  For instance, without
INVARIANTS, part of the rt_info structure is corrupted on my wireless
card, resulting in a panic during ifconfig on boot.  This is likely the
source of other problems, including phk's ACPI panic (again, only
triggered when booting with the CD drive in the bay.)

The root problem is that ata_timeout() fires and calls ata_pio_read()
which overwrites 512 bytes random memory.  There are actually two bugs 
here that overwrite memory.  The code path is as follows:

1. ata runs an IDENTIFY command on each drive.  It reaches this stack:

ata_getparam()
ata_identify_devices()
ata_boot_attach()

2. ata_getparam() allocates a request and runs it:
ata_alloc_request()
loop on retries (2 max)
fill out an immediate read request for 512 bytes (DEV_BSIZE)
*** Bug 1: transfersize is 512 bytes but sizeof(struct ata_request) is 
much less (~80 bytes).
ata_queue_request() starts the request and arms a timeout

3. Completion for first request results in FAILURE - no interrupt. 
ata_completed() calls ata_generic_interrupt() which calls ata_pio_read()
*** Bug 1: here's where the 512 - sizeof(struct ata_request) bytes 
following "request" are overwritten.

. The completion code also updates request->donecount (an 
upward-counting residual) from 0 to 512 after the request has been 
completed.

4. ata_getparam runs through the loop again and starts another identify
*** Bug 2: donecount is now 512 so ata_pio_read() will write on 
request->data + 512 with 512 bytes.  Whatever follows request->data + 
512 is overwritten.

I've tested the attached patch and it fixes the memory overwrite (tested 
with hardware watchpoints) and obviously, the panics.  It resets 
request->donecount in each loop iteration (bug 2).  It also mallocs 
DEV_BSIZE bytes for the transfer (bug 1).  An alternate fix for bug 1 is 
to lower the transfer size to sizeof(struct ata_request) but there may 
be some minimum transfer length requirements that force you to use 
DEV_BSIZE.

Commit approval requested.

-Nate



Index: ata-all.c
===================================================================
RCS file: /home/ncvs/src/sys/dev/ata/ata-all.c,v
retrieving revision 1.215
diff -u -r1.215 ata-all.c
--- src/sys/dev/ata/ata-all.c	12 Jul 2004 10:50:49 -0000	1.215
+++ src/sys/dev/ata/ata-all.c	30 Jul 2004 22:30:55 -0000
_at__at_ -549,7 +549,7 _at__at_
     if (!atadev->param)
 	atadev->param = malloc(sizeof(struct ata_params), M_ATA, M_NOWAIT);
     if (atadev->param) {
-	request = ata_alloc_request();
+	request = malloc(DEV_BSIZE, M_ATA, M_NOWAIT | M_ZERO);
 	if (request) {
 	    int retries = 2;
 	    while (retries-- > 0) {
_at__at_ -561,11 +561,12 _at__at_
 		request->data = (caddr_t)atadev->param;
 		request->bytecount = sizeof(struct ata_params);
 		request->transfersize = DEV_BSIZE;
+		request->donecount = 0;
 		ata_queue_request(request);
 		if (!(error = request->result))
 		    break;
 	    }
-	    ata_free_request(request);
+	    free(request, M_ATA);
 	}
 	if (!error && (isprint(atadev->param->model[0]) ||
 		       isprint(atadev->param->model[1]))) {
Received on Fri Jul 30 2004 - 20:49:27 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:04 UTC