Re: readdir/telldir/seekdir problem (i think)

From: Rick Macklem <>
Date: Fri, 24 Apr 2015 21:39:31 -0400 (EDT)
Jilles Tjoelker wrote:
> On Fri, Apr 24, 2015 at 04:28:12PM -0400, John Baldwin wrote:
> > Yes, this isn't at all safe.  There's no guarantee whatsoever that
> > the offset on the directory fd that isn't something returned by
> > getdirentries has any meaning.  In particular, the size of the
> > directory entry in a random filesystem might be a different size
> > than the structure returned by getdirentries (since it converts
> > things into a FS-independent format).
> > This might work for UFS by accident, but this is probably why ZFS
> > doesn't work.
> > However, this might be properly fixed by the thing that ino64 is
> > doing where each directory entry returned by getdirentries gives
> > you a seek offset that you _can_ directly seek to (as opposed to
> > seeking to the start of the block and then walking forward N
> > entries until you get an inter-block entry that is the same).
> The ino64 branch only reserves space for d_off and does not use it in
> any way. This is appropriate since actually using d_off is a major
> feature addition.
Well, at some point ino64 will need to define a new getdirentries(2)
syscall and I believe this new syscall can have different/additional

I'd suggest that the new gtedirentries(2) syscall should return a
flag to indicate that the underlying file system is filling in d_off.
Then the libc functions can use d_off if it it available.
(They will still need to "work" at least as well as they do now if
 the file system doesn't support d_off. The old getdirentries(2) syscall
 will be returning the old/current "struct dirent" which doesn't have
 the field anyhow.)

Another bit of fun is that the argument for seekdir()/telldir() is a
long and ends up 32bits for some arches. d_off is 64bits, since that
is what some file systems require.
Maybe the library code can only use d_off if it is a 64bit arch and
the file system is filling it in. (Or maybe the library can keep track
of 32<->64bit mappings for the offsets. I haven't looked at the libc
functions for a while, so I can't remember what they keep track of.)


> A proper d_off would still be useful even if UFS's readdir keeps
> masking
> off the offset so a directory read always starts at the beginning of
> a
> 512-byte directory block, since this allows more distinct offset
> values
> than safely using getdirentries()'s *basep. With d_off, one outer
> loop
> must read at least one directory block to avoid spinning
> indefinitely,
> while using getdirentries()'s *basep requires reading the whole
> getdirentries() buffer.
> Some Linux filesystems go further and provide a unique d_off for each
> entry.
> Another idea would be to store the last d_ino instead of dd_loc into
> the
> struct ddloc. On seekdir(), this would seek to loc_seek as before and
> skip entries until that d_ino is found, or to the start of the buffer
> if
> not found (and possibly return some entries again that should not be
> returned, but Samba copes with that).
> --
> Jilles Tjoelker
> _______________________________________________
> mailing list
> To unsubscribe, send any mail to
> ""
Received on Fri Apr 24 2015 - 23:39:39 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:57 UTC