On Sun, 7 Nov 2004 freebsd_at_newmillennium.net.au wrote:

> In geom_vinum_plex.c, line 575
>
>         /*
>          * RAID5 sub-requests need to come in correct order, otherwise
>          * we trip over the parity, as it might be overwritten by
>          * another sub-request.
>          */
>         if (pbp->bio_driver1 != NULL &&
>             gv_stripe_active(p, pbp)) {
>                 /* Park the bio on the waiting queue. */
>                 pbp->bio_cflags |= GV_BIO_ONHOLD;
>                 bq = g_malloc(sizeof(*bq), M_WAITOK | M_ZERO);
>                 bq->bp = pbp;
>                 mtx_lock(&p->bqueue_mtx);
>                 TAILQ_INSERT_TAIL(&p->wqueue, bq, queue);
>                 mtx_unlock(&p->bqueue_mtx);
>         }
>
> It seems we are holding back all requests to a currently active stripe,
> even if it is just a read and would never write anything back.

No, only writes are held back. pbp->bio_driver1 is NULL when it's a
normal read.

> 1. To calculate parity, we could simply read the old data (that was
> about to be overwritten), and the old parity, and recalculate the parity
> based on that information, rather than reading in all the stripes (based
> on the assumption that the original parity was correct). This would
> still take approximately the same amount of time, but would leave the
> other disks in the stripe available for other I/O.

That's how it's already done: the old parity and old data are read, and
the new parity and new data are written.

> 2. If there are two or more writes pending for the same stripe (that is,
> up to the point that the data|parity has been written), they should be
> condensed into a single operation so that there is a single write to the
> parity, rather than one write for each operation. This way, we should be
> able to get close to (N-1) * disk throughput for large sequential
> writes, without compromising the integrity of the parity on disk.
>
> 3. When calculating parity as per (2), we should operate on whole blocks
> (as defined by the underlying device).
> This provides the benefit of being able to write a complete block to the
> subdisk, so the underlying mechanism does not have to do a
> read/update/write operation to write a partial block.

These are interesting ideas and I'm gonna think about it.

thanks,
le

-- 
Lukas Ertl                       http://homepage.univie.ac.at/l.ertl/
le_at_FreeBSD.org                http://people.freebsd.org/~le/

Received on Sun Nov 07 2004 - 09:41:04 UTC