Re: Order of files with 'cp'

From: Garance A Drosihn <drosih_at_rpi.edu>
Date: Sun, 20 Nov 2005 21:56:58 -0500
At 7:29 PM +0000 11/20/05, Brian Candler wrote:
>On Sat, Nov 19, 2005 at 11:33:54AM -0800, Tim Kientzle wrote:
>>  Brian Candler wrote:
>>> I've noticed on FreeBSD-5.4 and -6.0 that the order in which
>>> 'cp' copies multiple files does not match the order they're
>>> given on the command line.
>>  ...
>>> I've had a look through the code, and it seems that cp calls
>>> fts_open() with the list of files in argv; fts_open then does
>>> a qsort() on the arguments, using the comparison function
>>> mastercmp() provided by cp:
>>
>>  My suggestion:  Have 'cp' call fts_open once for each
>>  command-line argument, instead of giving fts_open the entire
>>  argv list to muck with.
>
>Erm, but that just undoes the reason for calling fts_open with
>mastercmp in the first place, which is to get it to pick files
>before directories (or vice versa, as its behaviour seems to
>be) as an 'optimisation'.

If I understand the situation right, the suggestion would not
completely undo the optimization that 'cp' is trying to do.
Consider the command:
     cp -rp file1 dir1 file2 dir2 destdir

The suggestion would mean the files going into destdir itself
would not be sorted, but (if I understand this thread) files
copied into destdir/dir1 and destdir/dir2 would still be sorted.
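
To make that distinction concrete, here is a toy model of the
ordering (not cp's or fts's actual code).  It assumes a made-up
in-memory tree: `struct node', `files_first', `visit' and
`visit_operands' are all hypothetical names, and the comparison
function only stands in for whatever ordering mastercmp() really
imposes.  Operands are visited in command-line order, while the
children of each directory operand are still sorted.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a directory entry; not an fts(3) structure. */
struct node {
	const char *name;
	int is_dir;
	struct node *kids;	/* children, used only when is_dir is set */
	size_t nkids;
};

/* Files before directories; ties broken by name so the order is fixed. */
static int
files_first(const void *a, const void *b)
{
	const struct node *x = a, *y = b;

	if (x->is_dir != y->is_dir)
		return (x->is_dir - y->is_dir);
	return (strcmp(x->name, y->name));
}

/* Record this node's name, then sort and visit its children. */
static void
visit(struct node *n, char *out, size_t outsz)
{
	size_t i, len = strlen(out);

	assert(len < outsz);
	snprintf(out + len, outsz - len, "%s ", n->name);
	if (!n->is_dir)
		return;
	if (n->nkids > 1)
		qsort(n->kids, n->nkids, sizeof(n->kids[0]), files_first);
	for (i = 0; i < n->nkids; i++)
		visit(&n->kids[i], out, outsz);
}

/* One traversal per operand, in the order given on the command line. */
static void
visit_operands(struct node *ops, size_t nops, char *out, size_t outsz)
{
	size_t i;

	out[0] = '\0';
	for (i = 0; i < nops; i++)
		visit(&ops[i], out, outsz);
}
```

Given the operands file1 dir1 file2 dir2, the top-level names come
out in exactly that order, and only the entries below dir1 and dir2
are rearranged.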

Apparently this "sorting optimization" in `cp' goes all the way
back to the original version of `cp' from 1994.  While I expect
we should change it to something better, I don't think we have
any urgent reason to fix it immediately.  Which is to say, let's
figure out what the issues are, and come up with the best fix
instead of the "easiest change" which we can rush to implement.

*Assuming* the comment is correct, and that there *is* some
performance benefit from copying files before directories, then
it still seems to me that sorting all the files is a pretty
clumsy, heavy-handed way to accomplish that.  These days some
people have directories with tens of thousands of entries in
them.  Do we really want the overhead of "sorting" all of those
entries just so files are copied before directories?

I think a better fix might be to add an option to fts_open() which
tells it to "process files before directories" (or vice versa) in
any given directory.  Then `cp' could turn on that bit, and avoid
the fake sort.

It seems to me that if fts_open realizes that is wanted, then
it could implement that behavior in some manner which is faster
than sorting all entries.
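
As a rough sketch of what "faster than sorting" could mean: putting
files ahead of directories only needs a two-way partition, which is
linear in the number of entries and never compares one entry against
another.  The code below is an illustration under my own assumptions,
not a patch to fts; `struct entry' and `files_before_dirs' are
hypothetical names, and a real change would have to work on fts's own
entry lists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry type, used only for this illustration. */
struct entry {
	const char *name;
	int is_dir;
};

/*
 * Copy in[0..n) to out[0..n) with all non-directories first and all
 * directories after them, keeping the original relative order within
 * each group.  Two linear passes over the input, no sorting.
 */
static void
files_before_dirs(const struct entry *in, struct entry *out, size_t n)
{
	size_t i, k = 0;

	assert(n == 0 || in != out);	/* input and output must differ */
	for (i = 0; i < n; i++)
		if (!in[i].is_dir)
			out[k++] = in[i];
	for (i = 0; i < n; i++)
		if (in[i].is_dir)
			out[k++] = in[i];
	assert(k == n);
}
```

Swapping the two loops gives the directories-first variant.  Either
way the entries within each group stay in the order they were read,
which qsort() does not promise.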

-- 
Garance Alistair Drosehn            =   gad_at_gilead.netel.rpi.edu
Senior Systems Programmer           or  gad_at_freebsd.org
Rensselaer Polytechnic Institute    or  drosih_at_rpi.edu
Received on Mon Nov 21 2005 - 01:57:03 UTC
