Re: files disappearing from ls on NFS

From: Hartmut Brandt <hartmut.brandt_at_dlr.de>
Date: Wed, 15 May 2013 10:38:38 +0200
On Wed, 15 May 2013, Rick Macklem wrote:

RM>Well, getdents() basically just calls kern_getdirentries() and it calls
RM>VOP_READDIR() { which is called nfs_readdir() in the NFS clients }. 
RM>nfs_readdir() calls ncl_bioread() to do the real work of finding the
RM>buffer cache blocks and copying the data out of them.
RM>One thing you might check via printf()s is whether the buffer cache
RM>has the zero'd data in it before it copies it to userland.

I now dump the data just before the call to vn_io_fault_iomove in 
ncl_bioread(). So what I do:

1. reboot
2. login
3. ls
   -> I see that it is moving 4 blocks of 4k each to the user, and they
      look fine
4. wait half an hour
5. ls
   -> now there is only one block, which contains zeros starting from 
      0x200.

Note that I don't do anything else on that machine during that time.
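
A minimal sketch of how such a dumped block could be checked offline for
zeroed regions, assuming the block has been captured as raw bytes. The
function name zeroed_chunks() and the synthetic sample are made up for
illustration; this is not the code used in the kernel.

```python
DIRBLKSIZ = 0x200  # 512 bytes, the chunk size used for the scan


def zeroed_chunks(block: bytes, chunk: int = DIRBLKSIZ) -> list:
    """Return the offsets of all chunk-sized pieces that contain only zeros."""
    offsets = []
    for off in range(0, len(block), chunk):
        piece = block[off:off + chunk]
        # any() is False when every byte in the piece is zero
        if piece and not any(piece):
            offsets.append(off)
    return offsets


# Synthetic 4k block (an assumption, not real captured data):
# non-zero data in the first 0x200 bytes, zeros after that.
sample = b"\x01" * 0x200 + bytes(0x1000 - 0x200)
print([hex(o) for o in zeroed_chunks(sample)])
```

Scanning in DIRBLKSIZ steps matches the observation that the zeros start
exactly at 0x200.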

RM>Since you get valid data sometimes and partially zero'd out data others,
RM>I haven't a clue what is going on. One other person reported a problem
RM>when they used a small readdirsize, but it is hard to say they saw the
RM>same thing and no one else seems to be seeing this, so I have no idea
RM>what it might be.
RM>
RM>I remember you started seeing this after an upgrade of current. Do you
RM>happen to have dates (or rNNNNNN) for the old working version vs the one
RM>that broke this?
RM>(All I can think to do is scan the commits that seem to somehow relate
RM> to the buffer cache or copying to userland or ???)

It looks like I had copied the old kernel before installing the new one,
and it is from February 5th. There is no SVN revision in it; it looks like
that feature was added only recently.

harti

RM>
RM>rick
RM>
RM>> harti
RM>> 
RM>> -----Original Message-----
RM>> From: Rick Macklem [mailto:rmacklem_at_uoguelph.ca]
RM>> Sent: Tuesday, May 14, 2013 2:50 PM
RM>> To: Brandt, Hartmut
RM>> Cc: current_at_freebsd.org
RM>> Subject: Re: files disappearing from ls on NFS
RM>> 
RM>> Hartmut Brandt wrote:
RM>> > On Mon, 13 May 2013, Rick Macklem wrote:
RM>> >
RM>> > RM>Hartmut Brandt wrote:
RM>> > RM>> On Sun, 12 May 2013, Rick Macklem wrote:
RM>> > RM>>
RM>> > RM>> RM>Hartmut Brandt wrote:
RM>> > RM>> RM>> Hi,
RM>> > RM>> RM>>
RM>> > RM>> RM>> I've updated one of my -current machines this week (previous
RM>> > RM>> RM>> update was in February). Now I see a strange effect (it seems
RM>> > RM>> RM>> only on NFS mounts): ls or even echo * will list only some
RM>> > RM>> RM>> files (strangely enough, the first files from the normal,
RM>> > RM>> RM>> alphabetically ordered list). If I change something in the
RM>> > RM>> RM>> directory (delete a file or create a new one), the complete
RM>> > RM>> RM>> listing will appear for some time, but after some time
RM>> > RM>> RM>> (seconds to a minute or so) again only part of the files is
RM>> > RM>> RM>> listed.
RM>> > RM>> RM>>
RM>> > RM>> RM>> A ktrace on ls /usr/src/lib/libc/gen shows that getdirentries
RM>> > RM>> RM>> is called only once (returning 4096). For a full listing
RM>> > RM>> RM>> getdirentries is called 5 times with the last returning 0.
RM>> > RM>> RM>>
RM>> > RM>> RM>> I can still open files that are not listed if I know their
RM>> > RM>> RM>> name, though.
RM>> > RM>> RM>>
RM>> > RM>> RM>> The NFS server is a Windows 2008 server with an OpenText NFS
RM>> > RM>> RM>> Server which works without problems to all the other FreeBSD
RM>> > RM>> RM>> machines.
RM>> > RM>> RM>>
RM>> > RM>> RM>> So what could that be?
RM>> > RM>> RM>>
RM>> > RM>> RM>I've attached a patch that might be worth trying. It is a "shot
RM>> > RM>> RM>in the dark", but brings the new NFS client's readdir closer to
RM>> > RM>> RM>the old one (which you mentioned still works ok).
RM>> > RM>> RM>
RM>> > RM>> RM>Please let me know how it goes, if you have a chance to test it,
RM>> > RM>> RM>rick
RM>> > RM>>
RM>> > RM>> Hi Rick,
RM>> > RM>>
RM>> > RM>> the patch doesn't help.
RM>> > RM>>
RM>> > RM>> I wrote a small test program, which opens a directory, calls
RM>> > RM>> getdents(2) in a loop and dumps that. I figured out that the
RM>> > RM>> return of the system call depends on the buffer size I pass to it.
RM>> > RM>> The directory has a block size of 4k according to fstat(2). If I
RM>> > RM>> use that, I get some 300 of the almost 500 directory entries. If I
RM>> > RM>> use 8k, I get just around 200 and if I use 16k I get a handful. If
RM>> > RM>> I dump the buffer in this case I see 0x200 bytes filled with
RM>> > RM>> directory entries, then a lot of zeros and starting from 0x1000
RM>> > RM>> again data. This is of course ignored because of the zeros before.
RM>> > RM>>
RM>> > RM>And for this case getdents(2) returned 16K? It is normal for
RM>> > RM>getdents(2) to return less than requested and when end of dir
RM>> > RM>occurs, it should return 0.
RM>> > RM>
RM>> > RM>But if it returns 16K, there shouldn't be zeroed space in the middle
RM>> > RM>of it.
RM>> > RM>
RM>> > RM>And this always occurs or only after you wait a while? (You noted in
RM>> > RM>the above description that it would be ok for a little while after a
RM>> > RM>directory change and then would break, which suggests some kind of
RM>> > RM>caching problem.)
RM>> >
RM>> > Today in the morning everything was fine. After waiting 5 minutes,
RM>> > again only partial directories. When I do a read with 8k buffer size,
RM>> > getdents(2) returns 8k, but starting from 0x200 until 0x1000 the
RM>> > buffer is filled with zeros. The entry just before the zeros ends
RM>> > exactly at 0x200 (that would be the first byte of the next entry) and
RM>> > at 0x1000 a new entry starts. The rest of the buffer is fine. The next
RM>> > read returns only 4k and seems to be fine, although it contains some
RM>> > junk non-zero bytes in the padding.
RM>> >
RM>> Directory entries should never cross DIRBLKSIZ boundaries (512 or
RM>> 0x200), so it makes sense that one ends at 0x200 and one starts at
RM>> 0x1000. What doesn't make sense are the 0 bytes in between.
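
To illustrate why the entries after the zeroed region never show up in ls,
here is a rough sketch of a record walk over a synthetic buffer. It assumes
the 2013-era FreeBSD struct dirent header (32-bit d_fileno, 16-bit d_reclen,
8-bit d_type, 8-bit d_namlen, then the name) in little-endian byte order;
make_entry() and walk() are made-up helpers, not libc code.

```python
import struct

# Assumed record header: d_fileno (u32), d_reclen (u16), d_type (u8),
# d_namlen (u8), followed by the NUL-terminated name.
HEADER = struct.Struct("<IHBB")


def make_entry(fileno: int, name: bytes, reclen: int) -> bytes:
    """Build one synthetic record, zero-padded to reclen bytes."""
    raw = HEADER.pack(fileno, reclen, 8, len(name)) + name + b"\0"
    return raw.ljust(reclen, b"\0")


def walk(buf: bytes) -> list:
    """Collect names until the buffer ends or a record length is invalid."""
    names, off = [], 0
    while off + HEADER.size <= len(buf):
        fileno, reclen, _type, namlen = HEADER.unpack_from(buf, off)
        if reclen < HEADER.size or off + reclen > len(buf):
            break  # e.g. a zeroed header: nothing after this point is seen
        if fileno != 0:
            start = off + HEADER.size
            names.append(buf[start:start + namlen].decode())
        off += reclen
    return names


# Entry "a" fills 0x000-0x1ff, zeros follow up to 0x1000, and entry "b"
# starts at 0x1000, mirroring the layout described above.
buf = make_entry(1, b"a", 0x200) + bytes(0x1000 - 0x200) + make_entry(2, b"b", 0x200)
print(walk(buf))
```

The walk stops at offset 0x200 because the record length there is zero, so
the valid entry at 0x1000 is never reached.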
RM>> 
RM>> One difference between the old and new NFS clients, which the patch I
RM>> sent you changed to the way the old one does it, is filling in the
RM>> last block.
RM>> The old NFS client just leaves the block short and depends on
RM>> n_direofoffset to recognize it is the last block with b_resid
RM>> indicating where it ends.
RM>> For the new client (unless you've applied the patch I emailed you), it
RM>> fills the rest of the last block in with "empty directories". This was
RM>> in the OpenBSD code when I did the original NFSv4 stuff and port. I
RM>> left it in, because I thought it might avoid problems if
RM>> n_direofoffset was ever bogus. That is why there might be "different
RM>> junk" at the end of the directory, but it shouldn't matter.
RM>> 
RM>> It almost sounds like something else is bzero()ing out part of the
RM>> buffer cache block. Unless the directory has changed, the getdents()
RM>> after 5 minutes would just return the same buffer cache block that was
RM>> read in 5 minutes earlier (unless the buffer fell out of the cache and
RM>> had to be re-read from the server, which would only happen if there
RM>> was a lot of other file I/O going on during that 5 minutes).
RM>> 
RM>> A couple of comments:
RM>> - You can run "nfsstat -m" as root to see what the mount is actually
RM>> configured to use. This might be worth looking at, to see if any
RM>> of the values are "weird".
RM>> - One other difference between the old and new NFS clients is the
RM>> value of NFS_DIRBLKSIZ. For the new one, it is 8K instead of 4K.
RM>> You could change this in fs/nfs/nfsport.h, where it is defined
RM>> and then rebuild the sources to see if it has any effect. I can't
RM>> see why it should matter, but??
RM>> - Maybe you could post your system configuration. Someone might spot
RM>> something that changed in Feb.->Mar. related to your hardware/setup?
RM>> 
RM>> > Ten minutes later again everything is fine. I tried to spy on the
RM>> > NFS communication with tcpdump, but it seems unwilling to display
RM>> > something useful about the NFS. Is it able to decode the readdir
RM>> > stuff?
RM>> >
RM>> To look at NFS packets you need wireshark. You can capture the packets
RM>> with tcpdump using the -w option. Something like:
RM>> # tcpdump -s 0 -w file.pcap host server
RM>> - Then look at file.pcap in wireshark. (Often more convenient than
RM>> installing wireshark on a particular machine.) If you'd like, you can
RM>> email me the file.pcap and I can look at it.
RM>> 
RM>> rick
RM>> 
RM>> > harti
RM>> >
RM>> > _______________________________________________
RM>> > freebsd-current_at_freebsd.org mailing list
RM>> > http://lists.freebsd.org/mailman/listinfo/freebsd-current
RM>> > To unsubscribe, send any mail to
RM>> > "freebsd-current-unsubscribe_at_freebsd.org"
RM>
Received on Wed May 15 2013 - 06:38:50 UTC