Alfred Perlstein wrote:
> Have you guys thought of using aio or at least another process
> to parallelize IO?

So far, experiments using separate processes have not been
encouraging. Asynchronous I/O, mmap, or threads are all
possibilities that haven't been tried yet.

Alexey Dokuchaev suggested:
> ... non-blocking/async IO would be faster ...

Alfred Perlstein wrote:
> Threads are pretty portable these days, ...

I've considered all of the above, but haven't had time to
actually implement them. Ultimately, it will require implementing
and testing each one to see which approach works best.

If someone has time to give it a try, the coding should be
pretty simple. Here's an outline of what to do:

 * The read/extract side is much easier. Start there. ;-)
 * usr.bin/tar/read.c currently calls archive_read_open_file
   to open the file.
 * archive_read_open_file is just a fairly thin wrapper around
   archive_read_open.

The basic strategy, then, is to use archive_read_open directly,
providing your own open/read/close callback functions instead of
using the simple canned versions that archive_read_open_file provides.

So, start by copying libarchive/archive_read_open_file.c into
usr.bin/tar/read.c. Rename things and make them static
to avoid clashes with the functions in the library, of course.

Now, try alternatives to open/read/close. Each call to the
read callback has to return a pointer to and the size of a "block."
Note that there are no restrictions on the size of that block.
Among other things, you could try:

 * Setting up a list of block buffers and using async I/O or
   a separate thread to pre-fill them.
 * Playing with block sizes.
 * Using mmap() to return the entire file as one single block.

The hard part is doing all the testing. You need to test performance
under a variety of different circumstances:

 * Reading an archive from a regular file on the same disk that
   you're extracting to.
 * Reading an archive from a regular file on a different disk.
 * Reading from stdin.
 * Reading from tape/floppy/other device.
 * Using no compression/gzip/bzip2 compression.

Ultimately, we may need different handling for devices (many of
which require using read(2) with fixed block sizes for proper
operation), regular files (where many different strategies could
be tried), and maybe even stdin.

Tim

Received on Sat Apr 24 2004 - 11:49:09 UTC
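
For anyone picking this up, here is a rough, untested-for-performance sketch of the mmap() idea from the outline above: an open/read/close trio where the read callback hands back the whole file as one block and then reports end-of-file. Everything named here (struct mmap_source, the mmap_source_* functions, drain_source) is hypothetical and is not part of libarchive. The signatures are simplified for illustration, so they would have to be adapted to the callback prototypes that archive_read_open actually expects (check archive.h). It only handles regular files, which fits the point above that devices and stdin probably need separate handling.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-archive state; not a libarchive type. */
struct mmap_source {
	const char *path;	/* file to map */
	int fd;			/* -1 when closed */
	void *map;		/* mapping, or NULL if none */
	size_t size;		/* length of the mapping in bytes */
	int delivered;		/* nonzero once the single block was returned */
};

/* Open callback: open the file and map it read-only. Returns 0 or -1. */
static int
mmap_source_open(struct mmap_source *s)
{
	struct stat st;

	assert(s != NULL);
	s->map = NULL;
	s->size = 0;
	s->delivered = 0;
	s->fd = open(s->path, O_RDONLY);
	if (s->fd < 0)
		return (-1);
	/* Assumption: only regular files are mapped; devices need read(2). */
	if (fstat(s->fd, &st) != 0 || !S_ISREG(st.st_mode)) {
		close(s->fd);
		s->fd = -1;
		return (-1);
	}
	s->size = (size_t)st.st_size;
	if (s->size == 0)
		return (0);	/* empty file: nothing to map */
	s->map = mmap(NULL, s->size, PROT_READ, MAP_PRIVATE, s->fd, 0);
	if (s->map == MAP_FAILED) {
		s->map = NULL;
		close(s->fd);
		s->fd = -1;
		return (-1);
	}
	return (0);
}

/* Read callback: the first call returns the entire file as one block,
 * later calls return 0 to signal end-of-file. */
static ssize_t
mmap_source_read(struct mmap_source *s, const void **block)
{
	assert(s != NULL && block != NULL);
	if (s->delivered || s->size == 0) {
		*block = NULL;
		return (0);
	}
	s->delivered = 1;
	*block = s->map;
	return ((ssize_t)s->size);
}

/* Close callback: release the mapping and the descriptor. */
static int
mmap_source_close(struct mmap_source *s)
{
	int r = 0;

	assert(s != NULL);
	if (s->map != NULL && munmap(s->map, s->size) != 0)
		r = -1;
	if (s->fd >= 0 && close(s->fd) != 0)
		r = -1;
	s->map = NULL;
	s->fd = -1;
	return (r);
}

/* Stand-in for the library side: drain every block and count the bytes.
 * Returns the total, or -1 on error. */
static long
drain_source(const char *path)
{
	struct mmap_source s;
	const void *block;
	ssize_t n;
	long total = 0;

	s.path = path;
	if (mmap_source_open(&s) != 0)
		return (-1);
	while ((n = mmap_source_read(&s, &block)) > 0)
		total += (long)n;
	if (mmap_source_close(&s) != 0 || n < 0)
		return (-1);
	return (total);
}
```

Calling drain_source() on an archive should return its size in bytes, which is a quick sanity check before timing it against the plain read(2) version in each of the test scenarios listed above.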