Hi Rick,

sorry for top-posting - this is Outlook :-(

Attached is the system configuration. I have been using it more or less
unchanged for years. The machine is an 8-core AMD64 with 144 GByte of memory.

The nfsstat -m output for the two file systems I'm testing with is:

knopfs01:/OP_UserUnix on /home
nfsv3,tcp,resvport,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=6126856,timeout=120,retrans=2
knopfs01:/op_software on /software
nfsv3,tcp,resvport,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=6126856,timeout=120,retrans=2

I did the tcpdump/wireshark thing and I'm puzzled that I see no readdir
requests. I see a lookup, followed by getattr, access and fsstat for the
directory, and that's it. It looks as if, even after hours, the data returned
by getdirentries(2) comes from the cache. I assume that the NFS client uses
getattr to check whether the directory has changed?

If I knew what happens when calling getdirentries() I could add some
debugging printf()s here and there to figure it out...

harti

-----Original Message-----
From: Rick Macklem [mailto:rmacklem_at_uoguelph.ca]
Sent: Tuesday, May 14, 2013 2:50 PM
To: Brandt, Hartmut
Cc: current_at_freebsd.org
Subject: Re: files disappearing from ls on NFS

Hartmut Brandt wrote:
> On Mon, 13 May 2013, Rick Macklem wrote:
>
> RM>Hartmut Brandt wrote:
> RM>> On Sun, 12 May 2013, Rick Macklem wrote:
> RM>>
> RM>> RM>Hartmut Brandt wrote:
> RM>> RM>> Hi,
> RM>> RM>>
> RM>> RM>> I've updated one of my -current machines this week (previous
> RM>> RM>> update was in February).
> RM>> RM>> Now I see a strange effect (it seems only on NFS mounts): ls or
> RM>> RM>> even echo * will list only some files (strangely enough, the
> RM>> RM>> first files from the normal, alphabetically ordered list). If I
> RM>> RM>> change something in the directory (delete a file or create a new
> RM>> RM>> one), the complete listing will appear for some time, but after
> RM>> RM>> some time (seconds to a minute or so) again only part of the
> RM>> RM>> files is listed.
> RM>> RM>>
> RM>> RM>> A ktrace on ls /usr/src/lib/libc/gen shows that getdirentries is
> RM>> RM>> called only once (returning 4096). For a full listing
> RM>> RM>> getdirentries is called 5 times with the last returning 0.
> RM>> RM>>
> RM>> RM>> I can still open files that are not listed if I know their name,
> RM>> RM>> though.
> RM>> RM>>
> RM>> RM>> The NFS server is a Windows 2008 server with an OpenText NFS
> RM>> RM>> Server which works without problems to all the other FreeBSD
> RM>> RM>> machines.
> RM>> RM>>
> RM>> RM>> So what could that be?
> RM>> RM>>
> RM>> RM>I've attached a patch that might be worth trying. It is a "shot in
> RM>> RM>the dark", but brings the new NFS client's readdir closer to the
> RM>> RM>old one (which you mentioned still works ok).
> RM>> RM>
> RM>> RM>Please let me know how it goes, if you have a chance to test it,
> RM>> RM>rick
> RM>>
> RM>> Hi Rick,
> RM>>
> RM>> the patch doesn't help.
> RM>>
> RM>> I wrote a small test program, which opens a directory, calls
> RM>> getdents(2) in a loop and dumps that. I figured out that the return
> RM>> of the system call depends on the buffer size I pass to it. The
> RM>> directory has a block size of 4k according to fstat(2). If I use
> RM>> that, I get some 300 of the almost 500 directory entries.
> RM>> If I use 8k, I get just around 200, and if I use 16k I get a handful.
> RM>> If I dump the buffer in this case I see 0x200 bytes filled with
> RM>> directory entries, then a lot of zeros, and starting from 0x1000
> RM>> again data. This is of course ignored because of the zeros before.
> RM>>
> RM>And for this case getdents(2) returned 16K? It is normal for
> RM>getdents(2) to return less than requested and when end of dir occurs,
> RM>it should return 0.
> RM>
> RM>But if it returns 16K, there shouldn't be zeroed space in the middle
> RM>of it.
> RM>
> RM>And this always occurs or only after you wait a while? (You noted in
> RM>the above description that it would be ok for a little while after a
> RM>directory change and then would break, which suggests some kind of
> RM>caching problem.)
>
> Today in the morning everything was fine. After waiting 5 minutes,
> again only partial directories. When I do a read with 8k buffer size,
> getdents(2) returns 8k, but starting from 0x200 until 0x1000 the
> buffer is filled with zeros. The entry just before the zeros ends
> exactly at 0x200 (that would be the first byte of the next entry) and
> at 0x1000 a new entry starts. The rest of the buffer is fine. The next
> read returns only 4k and seems to be fine - although it contains some
> junk non-zero bytes in the padding.
>
Directory entries should never cross DIRBLKSIZ boundaries (512 or 0x200),
so it makes sense that one ends at 0x200 and one starts at 0x1000. What
doesn't make sense are the 0 bytes in between.

One difference between the old and new NFS clients, which the patch I sent
you changed to the way the old one does it, is filling in the last block.
The old NFS client just leaves the block short and depends on
n_direofoffset to recognize it is the last block, with b_resid indicating
where it ends. For the new client (unless you've applied the patch I
emailed you), it fills the rest of the last block in with "empty
directories".
This was in the OpenBSD code when I did the original NFSv4 stuff and port.
I left it in, because I thought it might avoid problems if n_direofoffset
was ever bogus. That is why there might be "different junk" at the end of
the directory, but it shouldn't matter.

It almost sounds like something else is bzero()ing out part of the buffer
cache block. Unless the directory has changed, the getdents() after 5
minutes would just return the same buffer cache block that was read in 5
minutes earlier (unless the buffer fell out of the cache and had to be
re-read from the server, which would only happen if there was a lot of
other file I/O going on during those 5 minutes).

A couple of comments:
- You can run "nfsstat -m" as root to see what the mount is actually
  configured to use. This might be worth looking at, to see if any of the
  values are "weird".
- One other difference between the old and new NFS clients is the value
  of NFS_DIRBLKSIZ. For the new one, it is 8K instead of 4K. You could
  change this in fs/nfs/nfsport.h, where it is defined, and then rebuild
  the sources to see if it has any effect. I can't see why it should
  matter, but??
- Maybe you could post your system configuration. Someone might spot
  something that changed in Feb.->Mar. related to your hardware/setup?

> Ten minutes later again everything is fine. I tried to spy on the NFS
> communication with tcpdump, but it seems unwilling to display
> something useful about the NFS. Is it able to decode the readdir
> stuff?
>
To look at NFS packets you need wireshark. You can capture the packets
with tcpdump using the -w option. Something like:
# tcpdump -s 0 -w file.pcap host server
- Then look at file.pcap in wireshark. (Often more convenient than
  installing wireshark on a particular machine.)
If you'd like, you can email me the file.pcap and I can look at it.

rick

> harti
>
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to
> "freebsd-current-unsubscribe_at_freebsd.org"
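
For anyone repeating the buffer dumps described in this thread, the check for a zero-filled hole can be automated instead of read off a hex dump. The following is only a minimal sketch, not code from the FreeBSD tree or from the test program mentioned above: the names SCAN_BLKSIZ, struct zero_gap, block_is_zero() and find_zero_gap() are made up for illustration, and the 512-byte granularity is taken from the DIRBLKSIZ value quoted in the discussion. It relies on the assumption that a block holding valid directory entries is never entirely zero.

```c
#include <assert.h>
#include <stddef.h>

/* Scan granularity; 512 (0x200) is the DIRBLKSIZ value quoted in the
 * thread. Hypothetical name, used only in this sketch. */
#define SCAN_BLKSIZ 512

/* Hypothetical result type: byte range [start, end) of the zero run. */
struct zero_gap {
	size_t start;
	size_t end;
};

/* Return 1 if the blk bytes starting at p are all zero, else 0. */
static int
block_is_zero(const unsigned char *p, size_t blk)
{
	for (size_t i = 0; i < blk; i++)
		if (p[i] != 0)
			return 0;
	return 1;
}

/*
 * Find the first run of all-zero blocks inside buf[0..len).
 * Only whole blocks are examined; a trailing partial block is ignored.
 * Returns 1 and fills *gap if such a run exists, 0 otherwise.
 */
int
find_zero_gap(const unsigned char *buf, size_t len, size_t blk,
    struct zero_gap *gap)
{
	size_t off = 0;

	assert(blk > 0);

	/* Skip leading blocks that contain data. */
	while (off + blk <= len && !block_is_zero(buf + off, blk))
		off += blk;
	if (off + blk > len)
		return 0;
	gap->start = off;

	/* Extend the run over consecutive zero blocks. */
	while (off + blk <= len && block_is_zero(buf + off, blk))
		off += blk;
	gap->end = off;
	return 1;
}
```

The intended use is to pass the buffer handed to getdents(2) or getdirentries(2) together with the byte count the call returned as len. A reported gap with start 0x200 and end 0x1000 would correspond to the 8k case described above.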