Re: fts improvements, alternatives

From: Tim Kientzle <kientzle_at_freebsd.org>
Date: Thu, 13 Jan 2005 16:48:51 -0800
[Moved to -current for further discussion.]

David Schultz wrote:
>Tim Kientzle wrote:
>>Pawel Jakub Dawidek wrote:
>>
>>> Introduce new field 'fts_bignum' ...
>>> This work is part of the BigDisk project:
>>>         http://www.FreeBSD.org/projects/bigdisk/
>>
>>Any plans to deal with other fts limits ... ?
> 
> Removing FTS_LOGICAL's path length limitation is nontrivial, but
> given your experience with bsdtar, I'd say you're an ideal person
> to do it.  ;-)

As it happens, I started tinkering with these ideas a
while ago but haven't found time to finish it all.

Here's a snapshot of the current WIP:

http://people.freebsd.org/~kientzle/libarchive/src/tree.tgz

This includes a simple main() test function that
just does the rough equivalent of "find .".

> .. fts() effectively uses chdir("..") to
> climb up one level in the tree in chdir mode.  If it chdir'd
> across a symlink earlier, this would put it in the wrong place.
> Perhaps you have a better solution, but my best idea is to keep
> the parent directory open when chdiring ...

The tree package does exactly this.  It just keeps a
flag with each ancestor node indicating the type of traversal
that's needed.

> A more uniform approach would be to ... always keep the
> parent open when descending a level.  Unfortunately, this limits
> the traversal depth to the number of available file descriptors.

The tree package does not do this for exactly this reason.
I have some ideas for handling the case where the number of symlinks
exceeds the available file descriptors, but that doesn't seem
particularly pressing at the moment.  <grin>

> (On the other hand, I would argue that anyone with a directory
> tree a few thousand levels deep is asking for trouble.)

I thought you *wanted* to support big disks! ;-)

I started this work partly because I wanted to really stress
the long pathname support in bsdtar.  I've archived
directory trees with megabyte pathnames (several thousand
directory levels crossing several hundred symlinks) in testing.
Of course, I can't yet *dearchive* such deep trees.
That seems to be a harder problem.

The tree package also keeps a *lot* less data in memory than fts.
It has no trouble with million-entry directories, for example.
In comparison, the current ls crashes on such large directories
in part because of the memory required for fts.

The tree package is quite a bit different in many respects:
  * The traversal is in a very different order, for instance.
  * It has a completely different API than fts.
    It's fully opaque (so should be easier to change in the future,
    unlike fts).
  * It takes a very different approach to determine
    when to visit a child.  In particular, instead of
    the client specifying a mode and optionally inhibiting the descent
    through a "prune" request, the tree package has the client
    specifically request descent.  If you request descent for
    *every* item, you'll end up with a logical traversal; if you
    request descent for every dir, you'll end up with a physical
    traversal.  (The tree package is smart enough to ignore any
    descent request that isn't for a dir or a link to a dir.)

Feedback, suggestions, etc. appreciated.

Tim
Received on Thu Jan 13 2005 - 23:48:52 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:38:26 UTC