Re: you are in an fs with millions of small files

From: Charles Swiger <cswiger_at_mac.com>
Date: Wed, 8 Jun 2005 12:06:32 -0400
On Jun 8, 2005, at 3:52 AM, Colin Percival wrote:
> Giorgos Keramidas wrote:
>> On 2005-06-08 09:25, Dag-Erling Smørgrav <des_at_des.no> wrote:
>>> That's because fts's sorting code is brain-dead.  It starts by
>>> reading the entire directory into a linked list, then copies that
>>> list into an array which it passes to qsort(), and finally converts
>>> the array back into a linked list.
>>
>> Is there a better way to sort a linked list?
>
> How do you define "better"?

Most people think "better" means "it runs faster".  :-)

> You can merge-sort a singly-linked list quite easily, but converting
> it to an array and back would probably be faster.

Ugh.  Converting a list to an array and back just to sort it requires
roughly twice the working memory, and the work spent on that conversion
is time spent not actually solving anything useful.
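
Merge-sorting the list in place needs no extra array at all.  A quick
sketch, with a made-up node type standing in for the real FTSENT (which
carries far more state, but the technique is the same):

#include <stddef.h>

/* Toy node type for illustration only. */
struct node {
        struct node *next;
        int key;
};

/* Merge two sorted lists; stable because ties take from 'a' first. */
static struct node *
merge(struct node *a, struct node *b)
{
        struct node head, *tail = &head;

        while (a != NULL && b != NULL) {
                if (a->key <= b->key) {
                        tail->next = a;
                        a = a->next;
                } else {
                        tail->next = b;
                        b = b->next;
                }
                tail = tail->next;
        }
        tail->next = (a != NULL) ? a : b;
        return (head.next);
}

/* Top-down mergesort: split at the midpoint, sort halves, merge. */
struct node *
sortlist(struct node *list)
{
        struct node *slow = list, *fast, *back;

        if (list == NULL || list->next == NULL)
                return (list);
        /* slow/fast pointers find the midpoint in one pass. */
        for (fast = list->next; fast != NULL && fast->next != NULL;
            fast = fast->next->next)
                slow = slow->next;
        back = slow->next;
        slow->next = NULL;
        return (merge(sortlist(list), sortlist(back)));
}

That's O(n log n) with O(log n) stack and zero copying of elements.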

It's best to pick one data structure based on what the task requires
and what the access patterns are.  If you know the number of elements
is fixed and will never change, arrays are fine, but if the number of
elements changes dynamically, something like a list or a red-black
tree is a better bet.  A list is more suitable for linear traversal,
whereas an RB-tree handles random access better.

qsort() is wonderful, but as the size of the individual data elements
grows, the overhead of copying them around in the array becomes large
enough that you are better off rearranging list pointers rather than
using an array representation.  A long time ago, the break-even point
for using lists rather than arrays was somewhere around
sizeof(el) > 64, but it's been a while....
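
(The usual dodge once elements get that big is to qsort() an array of
pointers instead, so only pointer-sized objects ever get copied.  A
sketch, with the element type and comparator invented for illustration:

#include <stdlib.h>
#include <string.h>

/* Made-up element type, well past the old break-even size. */
struct big {
        char name[256];
        /* ... */
};

static int
cmp_bigp(const void *a, const void *b)
{
        /* Each array slot is a 'struct big *', hence the extra deref. */
        const struct big *pa = *(const struct big * const *)a;
        const struct big *pb = *(const struct big * const *)b;

        return (strcmp(pa->name, pb->name));
}

void
sort_bigs(struct big **v, size_t n)
{
        /* qsort() shuffles 4- or 8-byte pointers; the structs stay put. */
        qsort(v, n, sizeof(*v), cmp_bigp);
}

You keep array-style random access for the partitioning while the fat
structs never move.)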

-- 
-Chuck