Re: files disappearing from ls on NFS

From: Hartmut Brandt <hartmut.brandt_at_dlr.de>
Date: Tue, 14 May 2013 09:14:41 +0200
On Mon, 13 May 2013, Rick Macklem wrote:

RM>Hartmut Brandt wrote:
RM>> On Sun, 12 May 2013, Rick Macklem wrote:
RM>> 
RM>> RM>Hartmut Brandt wrote:
RM>> RM>> Hi,
RM>> RM>>
RM>> RM>> I've updated one of my -current machines this week (the previous
RM>> RM>> update was in February). Now I see a strange effect (it seems to
RM>> RM>> occur only on NFS mounts): ls or even echo * lists only some of the
RM>> RM>> files (strangely enough, the first files of the normal,
RM>> RM>> alphabetically ordered list). If I change something in the directory
RM>> RM>> (delete a file or create a new one), the complete listing appears
RM>> RM>> for a while, but after some time (seconds to a minute or so) only
RM>> RM>> part of the files is listed again.
RM>> RM>>
RM>> RM>> A ktrace on ls /usr/src/lib/libc/gen shows that getdirentries is
RM>> RM>> called only once (returning 4096). For a full listing, getdirentries
RM>> RM>> is called 5 times, with the last call returning 0.
RM>> RM>>
RM>> RM>> I can still open files that are not listed if I know their names,
RM>> RM>> though.
RM>> RM>>
RM>> RM>> The NFS server is a Windows 2008 server with an OpenText NFS Server,
RM>> RM>> which works without problems with all the other FreeBSD machines.
RM>> RM>>
RM>> RM>> So what could that be?
RM>> RM>>
RM>> RM>I've attached a patch that might be worth trying. It is a "shot in the
RM>> RM>dark", but brings the new NFS client's readdir closer to the old one
RM>> RM>(which you mentioned still works ok).
RM>> RM>
RM>> RM>Please let me know how it goes, if you have a chance to test it,
RM>> RM>rick
RM>> 
RM>> Hi Rick,
RM>> 
RM>> The patch doesn't help.
RM>> 
RM>> I wrote a small test program that opens a directory, calls getdents(2)
RM>> in a loop and dumps the results. I found that the return of the system
RM>> call depends on the buffer size I pass to it. The directory has a block
RM>> size of 4k according to fstat(2). If I use that, I get some 300 of the
RM>> almost 500 directory entries. If I use 8k, I get just around 200, and if
RM>> I use 16k I get a handful. If I dump the buffer in this case, I see 0x200
RM>> bytes filled with directory entries, then a lot of zeros, and starting
RM>> from 0x1000 data again. This data is of course ignored because of the
RM>> zeros before it.
RM>> 
RM>And for this case getdents(2) returned 16K? It is normal for getdents(2)
RM>to return less than requested and when end of dir occurs, it should return 0.
RM>
RM>But if it returns 16K, there shouldn't be zeroed space in the middle of it.
RM>
RM>And does this always occur, or only after you wait a while? (You noted in
RM>the above description that it would be ok for a little while after a
RM>directory change and then would break, which suggests some kind of caching
RM>problem.)
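The test program mentioned in the quoted message is not included in the
post. A minimal sketch of that kind of program might look like the
following. It reads a directory with a fixed buffer size, follows d_reclen
through each returned chunk, and counts the read calls, including the
final one that returns 0 at end of directory. The helper names
count_entries and read_dirents and the 64 KiB buffer cap are hypothetical.
On FreeBSD it uses getdents(2) and takes the d_reclen offset from the
system's struct dirent; the Linux branch (raw SYS_getdents64 syscall, with
d_reclen assumed at byte offset 16) is an assumption added only so the
sketch also builds outside FreeBSD.

```c
#define _DEFAULT_SOURCE   /* expose syscall() and O_DIRECTORY on glibc */
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#if defined(__FreeBSD__)
#include <dirent.h>
/* FreeBSD: getdents(2) fills buf with struct dirent records. */
#define RECLEN_OFF offsetof(struct dirent, d_reclen)
static long read_dirents(int fd, char *buf, size_t n)
{
    return (long)getdents(fd, buf, n);
}
#elif defined(__linux__)
#include <sys/syscall.h>
/* Assumed Linux fallback: getdents64 records have d_reclen at offset 16. */
#define RECLEN_OFF ((size_t)16)
static long read_dirents(int fd, char *buf, size_t n)
{
    return syscall(SYS_getdents64, fd, buf, n);
}
#else
#error "read_dirents() is only sketched for FreeBSD and Linux"
#endif

/*
 * Hypothetical helper: read a whole directory using a fixed buffer size
 * and count the records. Returns the count, or -1 on error. *calls gets
 * the number of read calls, including the final one that returns 0.
 */
static long count_entries(const char *path, size_t bufsize, int *calls)
{
    static char buf[65536];
    long total = 0;
    int fd;

    if (bufsize > sizeof buf)
        return -1;
    fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;
    *calls = 0;
    for (;;) {
        long n = read_dirents(fd, buf, bufsize);
        long off = 0;

        (*calls)++;
        if (n <= 0) {            /* 0 = end of directory, < 0 = error */
            close(fd);
            return n < 0 ? -1 : total;
        }
        /* Follow d_reclen through the n bytes this call returned. */
        while (off + (long)(RECLEN_OFF + sizeof(uint16_t)) <= n) {
            uint16_t reclen;

            memcpy(&reclen, buf + off + RECLEN_OFF, sizeof reclen);
            if (reclen == 0)     /* a zero length would loop forever */
                break;
            total++;
            off += reclen;
        }
    }
}
```

On a healthy file system, count_entries(dir, 4096, &calls) and
count_entries(dir, 16384, &calls) should return the same count; in the
failure described in this thread the count shrank as the buffer grew.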

This morning everything was fine. After waiting 5 minutes, I again got
only partial directory listings. When I do a read with an 8k buffer,
getdents(2) returns 8k, but from 0x200 up to 0x1000 the buffer is filled
with zeros. The entry just before the zeros ends exactly at 0x200 (which
would be the first byte of the next entry), and at 0x1000 a new entry
starts. The rest of the buffer is fine. The next read returns only 4k and
seems to be fine, although it contains some non-zero junk bytes in the
padding.
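The buffer layout described above can be checked mechanically. The sketch
below walks the bytes that one getdents(2) call returned by following
d_reclen, as a reader of the buffer would, and reports how many records it
reached, where the walk stopped, and how far the following run of zero
bytes extends. It assumes the pre-FreeBSD-12 struct dirent layout (32-bit
d_fileno, then a 16-bit d_reclen at byte offset 4, then d_type and
d_namlen), which is what 2013-era -current used; check_dirent_buffer and
struct dirent_check are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumed record layout (pre-FreeBSD-12 struct dirent): 32-bit d_fileno,
 * 16-bit d_reclen at offset 4, 8-bit d_type, 8-bit d_namlen, then the name.
 */
#define RECLEN_OFF 4u
#define HDR_SIZE   8u

struct dirent_check {
    size_t records;    /* records walked before the walk stopped */
    size_t walked;     /* bytes covered by those records */
    size_t gap_end;    /* first nonzero byte at or after 'walked', or len */
};

/*
 * Walk the 'len' bytes one getdents(2) call returned, following d_reclen.
 * Returns true when the records tile the buffer exactly. A d_reclen that
 * is zero, smaller than the header, or runs past the end stops the walk;
 * nothing after that point is reachable, and gap_end shows how far the
 * following run of zero bytes extends.
 */
static bool check_dirent_buffer(const unsigned char *buf, size_t len,
                                struct dirent_check *chk)
{
    size_t off = 0, records = 0;

    while (off + HDR_SIZE <= len) {
        uint16_t reclen;

        memcpy(&reclen, buf + off + RECLEN_OFF, sizeof reclen);
        if (reclen < HDR_SIZE || reclen > len - off)
            break;
        records++;
        off += reclen;
    }
    chk->records = records;
    chk->walked = off;
    chk->gap_end = off;
    while (chk->gap_end < len && buf[chk->gap_end] == 0)
        chk->gap_end++;
    return off == len;
}
```

For the 8k dump described above, this check would report a walk ending at
0x200 with the zero run ending at 0x1000, while the following 4k read
would pass.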

Ten minutes later everything is fine again. I tried to spy on the NFS
communication with tcpdump, but it seems unwilling to display anything
useful about NFS. Is it able to decode the readdir traffic?

harti
Received on Tue May 14 2013 - 05:15:01 UTC
