Re: readdir/telldir/seekdir problem (i think)

From: John Baldwin <>
Date: Fri, 24 Apr 2015 10:59:35 -0400
On Friday, April 24, 2015 01:02:39 PM Julian Elischer wrote:
> On 4/23/15 9:54 PM, John Baldwin wrote:
> > On Thursday, April 23, 2015 05:02:08 PM Julian Elischer wrote:
> >> On 4/23/15 11:20 AM, Julian Elischer wrote:
> >>> I'm debugging a problem being seen with samba 3.6.
> >>>
> >>> basically  telldir/seekdir/readdir don't seem to work as advertised..
> >> ok so it looks like readdir() (and friends) is totally broken in the face
> >> of deletes unless you read the entire directory at once or reset to the
> >> the first file before the deletes, or earlier.
> > I'm not sure that Samba isn't assuming non-portable behavior.  For example:
> >
> > >From
> >
> > If a file is removed from or added to the directory after the most recent call
> > to opendir() or rewinddir(), whether a subsequent call to readdir() returns an
> > entry for that file is unspecified.
> >
> > While this doesn't speak directly to your case, it does note that you will
> > get inconsistencies if you scan a directory concurrent with add and remove.
> >
> > UFS might kind of work actually since deletes do not compact the backing
> > directory, but I suspect NFS and ZFS would not work.  In addition, our
> > current NFS support for seekdir is pretty flaky and can't be fixed without
> > changes to return the seek offset for each directory entry (I believe that
> > the projects/ino64 patches include this since they are breaking the ABI of
> > the relevant structures already).  The ABI breakage makes this a very
> > non-trivial task.  However, even if you have that per-item cookie, it is
> > likely meaningless in the face of filesystems that use any sort of more
> > advanced structure than an array (such as trees, etc.) to store directory
> > entries.  POSIX specifically mentions this in the rationale for seekdir:
> >
> >
> > One of the perceived problems of implementation is that returning to a given point in a directory is quite difficult to describe formally, in spite of its intuitive appeal, when systems that use B-trees, hashing functions, or other similar mechanisms to order their directories are considered. The definition of seekdir() and telldir() does not specify whether, when using these interfaces, a given directory entry will be seen at all, or more than once.
> >
> > In fact, given that quote, I would argue that what Samba is doing is
> > non-portable.  This would seem to indicate that a conforming seekdir could
> > just change readdir to immediately return EOF until you call rewinddir.
> how does readdir know that the directory has been changed? fstat?

Undefined.  FreeBSD's libc doesn't cache the entire directory (unless you
are using a union mount), instead it just caches the most recent call to
getdirentries().  It then serves up entries from that block until you hit
the end.  When you hit the end it reads the next block, etc.  When you
call telldir(), the kernel saves the seek offset from the start of the
current block and the offset within that block and returns a cookie to
you.  seekdir() looks up the cookie to find the (seek offset, block offset)
tuple.  If the location matches the directory's current location it doesn't
do anything, otherwise it seeks to the seek offset and reads a new block via
getdirentries().  There is no check for seeing if a directory is
changed.  Instead, you can only be stale by one "block".  When you read
a new block it is relative to the directory's state at that time.

Rick's suggestion of reusing the block for a seek within a block would be
fairly easy to implement, I think it would just be:

Index: head/lib/libc/gen/telldir.c
--- head/lib/libc/gen/telldir.c (revision 281929)
+++ head/lib/libc/gen/telldir.c (working copy)
_at__at_ -101,8 +101,10 _at__at_
        if (lp->loc_loc == dirp->dd_loc && lp->loc_seek == dirp->dd_seek)
-       (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek, SEEK_SET);
-       dirp->dd_seek = lp->loc_seek;
+       if (lp->loc_seek != dirp->dd_seek) {
+               (void) lseek(dirp->dd_fd, (off_t)lp->loc_seek, SEEK_SET);
+               dirp->dd_seek = lp->loc_seek;
+       }
        dirp->dd_loc = 0;
        while (dirp->dd_loc < lp->loc_loc) {
                dp = _readdir_unlocked(dirp, 0);

(You still want to re-parse the block I think since the telldir offset
might be from an earlier version of the block and not point to a valid
directory entry since they are variable sized.)

John Baldwin
Received on Fri Apr 24 2015 - 13:03:36 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:57 UTC