Re: files disappearing from ls on NFS

From: Rick Macklem <rmacklem_at_uoguelph.ca>
Date: Mon, 6 May 2013 08:53:24 -0400 (EDT)
Hartmut Brandt wrote:
> Hi Rick,
> 
> the patch doesn't help. So how can I help to fix that? Of course, I
> can use the work-around with oldnfs, but ...
> 
Well, I plan on going through the readdir code and seeing if I can spot
a case that would break for small RPC replies. If I can find something,
I'll email you a patch for testing. (I can't seem to reproduce the problem
here.)
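
For anyone else trying to reproduce this, here is a minimal sketch of a
checker that counts the entries readdir(3) returns for a directory, so the
count on the NFS mount can be compared against the count seen on the server.
The helper name count_dir_entries() is made up for illustration and is not
part of the NFS client or libc; the sketch only assumes the standard
opendir(3)/readdir(3)/closedir(3) calls.

```c
#include <dirent.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not from the NFS client code): count the entries
 * that readdir(3) returns for "path", skipping "." and "..".
 * Returns the count, or -1 if the directory cannot be opened or read.
 */
long
count_dir_entries(const char *path)
{
	DIR *dirp;
	struct dirent *dp;
	long count = 0;
	int saved_errno;

	dirp = opendir(path);
	if (dirp == NULL)
		return (-1);

	for (;;) {
		/* readdir() returns NULL for both EOF and error; errno tells them apart. */
		errno = 0;
		dp = readdir(dirp);
		if (dp == NULL)
			break;
		if (strcmp(dp->d_name, ".") == 0 ||
		    strcmp(dp->d_name, "..") == 0)
			continue;
		count++;
	}
	saved_errno = errno;
	(void)closedir(dirp);

	return (saved_errno != 0 ? -1 : count);
}
```

Calling it from a small main() on the affected directory, right after
modifying the directory and again a minute later, should show whether the
count drops when the listing goes short.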

The mysterious part for me is why it has shown up recently, because there
hasn't been any recent change committed that seems like it could cause this.
(Maybe it is just a coincidence that it showed up recently and the bug has
 been there all along?)

I'll admit my worst fear is that it is somehow caused by the switch to clang for
certain arches. If that is the case, it could take a long time to isolate.

rick
> harti
> 
> -----Original Message-----
> From: Rick Macklem [mailto:rmacklem_at_uoguelph.ca]
> Sent: Saturday, May 04, 2013 11:33 PM
> To: Brandt, Hartmut
> Cc: current_at_freebsd.org; Andrzej Tobola
> Subject: Re: files disappearing from ls on NFS
> 
> Hartmut Brandt wrote:
> > On Fri, 3 May 2013, Rick Macklem wrote:
> >
> > RM>Ok, if you succeed in isolating the commit, that would be great.
> >
> > Hmm. I'm somewhat stuck. clang from yesterday can't compile clang from
> > a month ago...
> >
> > harti
> >
> Oh well. You could try this patch (which is the one to fix readdir for
> union mounts), since I can see that VOP_VPTOCNP() will also be broken
> without it. (I can't see how that would break "ls", but it breaks
> __getcwd() and friends, so maybe it can affect "ls" somehow?)
> 
> It's a cut/paste under windows, so I'm afraid the whitespace will be
> messed up, but it's pretty simple to apply by hand.
> 
> Index: nfs_clvnops.c
> ===================================================================
> --- nfs_clvnops.c (revision 249568)
> +++ nfs_clvnops.c (working copy)
> @@ -2221,6 +2221,7 @@
> !NFS_TIMESPEC_COMPARE(&np->n_mtime, &vattr.va_mtime)) {
> mtx_unlock(&np->n_mtx);
> NFSINCRGLOBAL(newnfsstats.direofcache_hits);
> + *ap->a_eofflag = 1;
> return (0);
> } else
> mtx_unlock(&np->n_mtx);
> @@ -2233,8 +2234,10 @@
> tresid = uio->uio_resid;
> error = ncl_bioread(vp, uio, 0, ap->a_cred);
> 
> - if (!error && uio->uio_resid == tresid)
> + if (!error && uio->uio_resid == tresid) {
> NFSINCRGLOBAL(newnfsstats.direofcache_misses);
> + *ap->a_eofflag = 1;
> + }
> return (error);
> }
> 
> I haven't yet succeeded in reproducing the problem, but will be poking
> at it some more, rick
> 
> > RM>
> > RM>rick
> > RM>
> > RM>> harti
> > RM>>
> > RM>> On Fri, 3 May 2013, Rick Macklem wrote:
> > RM>>
> > RM>> RM>Hartmut Brandt wrote:
> > RM>> RM>> Hi,
> > RM>> RM>>
> > RM>> RM>> I've updated one of my -current machines this week (previous
> > RM>> RM>> update was in February). Now I see a strange effect (it seems
> > RM>> RM>> only on NFS mounts): ls or even echo * will list only some
> > RM>> RM>> files (strangely enough, the first files from the normal,
> > RM>> RM>> alphabetically ordered list). If I change something in the
> > RM>> RM>> directory (delete a file or create a new one), the complete
> > RM>> RM>> listing will appear for some time, but after some time (seconds
> > RM>> RM>> to a minute or so) again only some of the files are listed.
> > RM>> RM>>
> > RM>> RM>> A ktrace on ls /usr/src/lib/libc/gen shows that getdirentries
> > RM>> RM>> is called only once (returning 4096). For a full listing,
> > RM>> RM>> getdirentries is called 5 times, with the last returning 0.
> > RM>> RM>>
> > RM>> RM>> I can still open files that are not listed if I know their
> > RM>> RM>> name, though.
> > RM>> RM>>
> > RM>> RM>> The NFS server is a Windows 2008 server with an OpenText NFS
> > RM>> RM>> Server, which works without problems with all the other FreeBSD
> > RM>> RM>> machines.
> > RM>> RM>>
> > RM>> RM>> So what could that be?
> > RM>> RM>>
> > RM>> RM>Someone else reported missing files returned via "ls" recently,
> > RM>> RM>when they used a small readdirsize (below 8K). I haven't yet had
> > RM>> RM>a chance to try and reproduce it or do any snooping around.
> > RM>> RM>
> > RM>> RM>There haven't been any recent changes to readdir in the NFS client,
> > RM>> RM>except a trivial one that adds a check for vnode type being VDIR,
> > RM>> RM>so I don't see that it can be a recent NFS change.
> > RM>> RM>
> > RM>> RM>If you can increase the readdirsize, try that to see if it avoids
> > RM>> RM>the problem. "nfsstat -m" shows you what the mount options end up
> > RM>> RM>being after doing the mount. The server might be limiting the
> > RM>> RM>readdirsize to 4K, so you should check, even if you specify a large
> > RM>> RM>value for the mount.
> > RM>> RM>
> > RM>> RM>rick
> > RM>> RM>
> > RM>> RM>> Regards,
> > RM>> RM>> harti
> > RM>> RM>> _______________________________________________
> > RM>> RM>> freebsd-current_at_freebsd.org mailing list
> > RM>> RM>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> > RM>> RM>> To unsubscribe, send any mail to
> > RM>> RM>> "freebsd-current-unsubscribe_at_freebsd.org"
> > RM>> RM>
> > RM>
Received on Mon May 06 2013 - 10:53:48 UTC