Re: Directories with 2million files

From: Robert Watson <rwatson_at_freebsd.org>
Date: Thu, 22 Apr 2004 16:31:39 -0400 (EDT)

On Wed, 21 Apr 2004, Eric Anderson wrote:

> First, let me say that I am impressed (but not shocked) - FreeBSD
> quietly handled my building of a directory with 2055476 files in it. 
> I'm not sure if there is a limit to this number, but at least we know it
> works to 2million.  I'm running 5.2.1-RELEASE. 
> 
> However, several tools seem to choke on that many files - mainly ls and 
> du.  Find works just fine.  Here's what my directory looks like (from 
> the parent):

Directories with millions of entries turn up surprisingly often.  FreeBSD
handles them quite well, but applications are rarely optimized for them.
Compare an unsorted listing (ls -f) with the default sorted one:

cyrus# /usr/bin/time \ls -f | wc
        1.86 real         1.20 user         0.34 sys
  338806  338806 2599362
cyrus# /usr/bin/time \ls | wc
        6.48 real         4.39 user         0.28 sys
  338807  338807 2599370
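
As a rough illustration of why the two runs above can differ, here is a
minimal Python sketch, not the actual ls implementation.  It contrasts
streaming directory entries in whatever order the OS returns them with
collecting them all so they can be sorted.  The function names and the
throwaway test directory are made up for this example.

```python
import os
import tempfile


def iter_names_unsorted(path):
    """Yield entry names one at a time, in whatever order the OS returns them."""
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry.name


def names_sorted(path):
    """Return every entry name in sorted order; the full list must be in memory."""
    return sorted(iter_names_unsorted(path))


def count_entries(path):
    """Count entries without building a list of all the names."""
    return sum(1 for _ in iter_names_unsorted(path))


if __name__ == "__main__":
    # Build a small throwaway directory so the sketch is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(1000):
            open(os.path.join(tmp, f"msg{i:04d}"), "w").close()
        print(count_entries(tmp))      # streaming count, similar in spirit to ls -f | wc
        print(names_sorted(tmp)[:3])   # needs the whole listing before printing anything
```

The streaming version can start producing output immediately and its memory
use does not grow with the directory, while the sorted version cannot emit
its first name until every entry has been read.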

> I'd work on some patches, but I'm not worth much when it comes to C/C++. 

Unfortunately, a lot of this comes from wanting programs to behave nicely
in ways that scale well only to a limited extent, e.g., sorting and sizing
of output.  Any algorithm that requires all elements of a large array to
be in memory, as sorting does, is inevitably going to hurt.  And with text
applications designed to run in command pipelines, to POSIX specs, etc.,
there isn't a whole lot of room to generate warnings like:

  cyrus# ls
  ls: Holy cow, you have a lot of files.  You might want to disable sorting.
  ...
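
To make the constraint concrete: if a tool did want to emit a hint like the
one above, it would at least have to keep it out of stdout so pipelines
stay clean.  The following is only a sketch under that assumption, not
something ls does; the threshold, the function name, and the message
wording are all hypothetical.

```python
import sys

LARGE_DIR_THRESHOLD = 100_000  # hypothetical cutoff, chosen only for illustration


def maybe_warn(entry_count, stdout_is_tty, err=sys.stderr):
    """Write a hint to `err` only for large listings shown on a terminal.

    Returns True if a hint was written, False otherwise.
    """
    # Stay silent when output is piped or redirected, or the directory is small.
    if not stdout_is_tty or entry_count < LARGE_DIR_THRESHOLD:
        return False
    err.write(f"hint: {entry_count} entries; unsorted output may be faster\n")
    return True


if __name__ == "__main__":
    maybe_warn(338807, sys.stdout.isatty())
```

Even this conservative form changes what a user sees on the terminal, which
is part of why there is so little room for it.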

> If someone has some patches, or code to try, let me know - I'd be more
> than willing to test, possibly even give out an account on the machine. 

Efficiency improvements are generally welcome, as long as they're correct
and don't overly complicate the implementation.  For what it's worth, I've
noticed a lot of tools are getting better about handling large numbers of
(whatevers).  For example, when I pointed Mozilla at an IMAP mail folder
with 100,000 messages in it, it would reread the mailbox index every 60
seconds if the mailbox had changed.  If one message arrives per minute and
reading the index takes over 59 seconds, which over a WAN it would, it
never stops rereading the index.  Recent versions are *much* smarter, and
appear in many cases to scale to millions of messages, which is what I
keep in my large directories :-).

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Senior Research Scientist, McAfee Research
Received on Thu Apr 22 2004 - 11:31:57 UTC
