Re: files disappearing from ls on NFS

From: Rick Macklem <rmacklem_at_uoguelph.ca>
Date: Tue, 28 May 2013 19:39:38 -0400 (EDT)

Hartmut Brandt wrote:
> On Wed, 15 May 2013, Rick Macklem wrote:
> 
> RM>Well, getdents() basically just calls kern_getdirentries() and it calls
> RM>VOP_READDIR() { which is called nfs_readdir() in the NFS clients }.
> RM>nfs_readdir() calls ncl_bioread() to do the real work of finding the
> RM>buffer cache blocks and copying the data out of them.
> RM>One thing you might check via printf()s is whether the buffer cache
> RM>has the zero'd data in it before it copies it to userland.
> 
> I now dump the data just before the call to vn_io_fault_iomove in
> ncl_bioread(). So what I do:
> 
> 1. reboot
> 2. login
> 3. ls
> -> I see that it is moving 4 blocks of 4k each to the user and they look
> fine
> 4. wait half an hour
> 5. ls
> -> now there is only one block, which contains zeros starting from 0x200.
> 
> Note that I don't do anything else on that machine during that time.
> 
> RM>Since you get valid data sometimes and partially zero'd out data others,
> RM>I haven't a clue what is going on. One other person reported a problem
> RM>when they used a small readdirsize, but it is hard to say they saw the
> RM>same thing and no one else seems to be seeing this, so I have no idea
> RM>what it might be.
> RM>
> RM>I remember you started seeing this after an upgrade of current. Do you
> RM>happen to have dates (or rNNNNNN) for the old working version vs the
> RM>one that broke this?
> RM>(All I can think to do is scan the commits that seem to somehow relate
> RM> to the buffer cache or copying to userland or ???)
> 
> It looks like I had copied the old kernel before installing the new one
> and it is from February 5th. There is no SVN revision in it - looks like
> that feature was added only recently.
> 
> harti
> 
Thanks to Hartmut's testing, a patch to fix this problem has just
been committed to head as r251079. The problem was caused by
vnode_pager_setsize() being called for directories (where the
size reported by the server can be smaller than the size of the
ufs-like directory created in the client from the RPC's XDR).

r251079 will be MFC'd to stable/9 in 1 week if things go smoothly.

You might see this problem for head kernels from r248567 through r251078
and stable/9 kernels from r249078 (Apr. 4) until a week from now.

Sorry for any inconvenience and thanks go to Hartmut for his help
isolating this, rick

> RM>
> RM>rick
> RM>
> RM>> harti
> RM>>
> RM>> -----Original Message-----
> RM>> From: Rick Macklem [mailto:rmacklem_at_uoguelph.ca]
> RM>> Sent: Tuesday, May 14, 2013 2:50 PM
> RM>> To: Brandt, Hartmut
> RM>> Cc: current_at_freebsd.org
> RM>> Subject: Re: files disappearing from ls on NFS
> RM>>
> RM>> Hartmut Brandt wrote:
> RM>> > On Mon, 13 May 2013, Rick Macklem wrote:
> RM>> >
> RM>> > RM>Hartmut Brandt wrote:
> RM>> > RM>> On Sun, 12 May 2013, Rick Macklem wrote:
> RM>> > RM>>
> RM>> > RM>> RM>Hartmut Brandt wrote:
> RM>> > RM>> RM>> Hi,
> RM>> > RM>> RM>>
> RM>> > RM>> RM>> I've updated one of my -current machines this week (previous
> RM>> > RM>> RM>> update was in February). Now I see a strange effect (it seems
> RM>> > RM>> RM>> only on NFS mounts): ls or even echo * will list only some
> RM>> > RM>> RM>> files (strangely enough, the first files from the normal,
> RM>> > RM>> RM>> alphabetically ordered list). If I change something in the
> RM>> > RM>> RM>> directory (delete a file or create a new one), for some time
> RM>> > RM>> RM>> the complete listing will appear, but after some time (seconds
> RM>> > RM>> RM>> to a minute or so) again only part of the files is listed.
> RM>> > RM>> RM>>
> RM>> > RM>> RM>> A ktrace on ls /usr/src/lib/libc/gen shows that getdirentries
> RM>> > RM>> RM>> is called only once (returning 4096). For a full listing
> RM>> > RM>> RM>> getdirentries is called 5 times with the last returning 0.
> RM>> > RM>> RM>>
> RM>> > RM>> RM>> I can still open files that are not listed if I know their
> RM>> > RM>> RM>> name, though.
> RM>> > RM>> RM>>
> RM>> > RM>> RM>> The NFS server is a Windows 2008 server with an OpenText NFS
> RM>> > RM>> RM>> Server which works without problems to all the other FreeBSD
> RM>> > RM>> RM>> machines.
> RM>> > RM>> RM>>
> RM>> > RM>> RM>> So what could that be?
> RM>> > RM>> RM>>
> RM>> > RM>> RM>I've attached a patch that might be worth trying. It is a
> RM>> > RM>> RM>"shot in the dark", but brings the new NFS client's readdir
> RM>> > RM>> RM>closer to the old one (which you mentioned still works ok).
> RM>> > RM>> RM>
> RM>> > RM>> RM>Please let me know how it goes, if you have a chance to test
> RM>> > RM>> RM>it, rick
> RM>> > RM>>
> RM>> > RM>> Hi Rick,
> RM>> > RM>>
> RM>> > RM>> the patch doesn't help.
> RM>> > RM>>
> RM>> > RM>> I wrote a small test program, which opens a directory, calls
> RM>> > RM>> getdents(2) in a loop and dumps that. I figured out that the
> RM>> > RM>> return of the system call depends on the buffer size I pass to
> RM>> > RM>> it. The directory has a block size of 4k according to fstat(2).
> RM>> > RM>> If I use that, I get some 300 of the almost 500 directory
> RM>> > RM>> entries. If I use 8k, I get just around 200 and if I use 16k I
> RM>> > RM>> get a handful. If I dump the buffer in this case I see 0x200
> RM>> > RM>> bytes filled with directory entries, then a lot of zeros and
> RM>> > RM>> starting from 0x1000 again data. This is of course ignored
> RM>> > RM>> because of the zeros before.
> RM>> > RM>>
> RM>> > RM>And for this case getdents(2) returned 16K? It is normal for
> RM>> > RM>getdents(2) to return less than requested and when end of dir
> RM>> > RM>occurs, it should return 0.
> RM>> > RM>
> RM>> > RM>But if it returns 16K, there shouldn't be zeroed space in the
> RM>> > RM>middle of it.
> RM>> > RM>
> RM>> > RM>And this always occurs or only after you wait a while? (You noted
> RM>> > RM>in the above description that it would be ok for a little while
> RM>> > RM>after a directory change and then would break, which suggests some
> RM>> > RM>kind of caching problem.)
> RM>> >
> RM>> > Today in the morning everything was fine. After waiting 5 minutes,
> RM>> > again only partial directories. When I do a read with 8k buffer
> RM>> > size, getdents(2) returns 8k, but starting from 0x200 until 0x1000
> RM>> > the buffer is filled with zeros. The entry just before the zeros
> RM>> > ends exactly at 0x200 (that would be the first byte of the next
> RM>> > entry) and at 0x1000 a new entry starts. The rest of the buffer is
> RM>> > fine. The next read returns only 4k and seems to be fine - although
> RM>> > it contains some junk non-zero bytes in the padding.
> RM>> >
> RM>> Directory entries should never cross DIRBLKSIZ boundaries (512 or
> RM>> 0x200), so it makes sense that one ends at 0x200 and one starts at
> RM>> 0x1000. What doesn't make sense are the 0 bytes in between.
> RM>>
> RM>> One difference between the old and new NFS clients, which the patch
> RM>> I sent you changed to the way the old one does it, is filling in the
> RM>> last block.
> RM>> The old NFS client just leaves the block short and depends on
> RM>> n_direofoffset to recognize it is the last block with b_resid
> RM>> indicating where it ends.
> RM>> For the new client (unless you've applied the patch I emailed you),
> RM>> it fills the rest of the last block in with "empty directories". This
> RM>> was in the OpenBSD code when I did the original NFSv4 stuff and port.
> RM>> I left it in, because I thought it might avoid problems if
> RM>> n_direofoffset was ever bogus. That is why there might be "different
> RM>> junk" at the end of the directory, but it shouldn't matter.
> RM>>
> RM>> It almost sounds like something else is bzero()ing out part of the
> RM>> buffer cache block. Unless the directory has changed, the getdents()
> RM>> after 5 minutes would just return the same buffer cache block that
> RM>> was read in 5 minutes earlier (unless the buffer fell out of the
> RM>> cache and had to be re-read from the server, which would only happen
> RM>> if there was a lot of other file I/O going on during that 5 minutes).
> RM>>
> RM>> A couple of comments:
> RM>> - You can run "nfsstat -m" as root to see what the mount is actually
> RM>>   configured to use. This might be worth looking at, to see if any
> RM>>   of the values are "weird".
> RM>> - One other difference between the old and new NFS clients is the
> RM>>   value of NFS_DIRBLKSIZ. For the new one, it is 8K instead of 4K.
> RM>>   You could change this in fs/nfs/nfsport.h, where it is defined
> RM>>   and then rebuild the sources to see if it has any effect. I can't
> RM>>   see why it should matter, but??
> RM>> - Maybe you could post your system configuration. Someone might spot
> RM>>   something that changed in Feb.->Mar. related to your hardware/setup?
> RM>>
> RM>> > Ten minutes later again everything is fine. I tried to spy on the
> RM>> > NFS communication with tcpdump, but it seems unwilling to display
> RM>> > something useful about the NFS. Is it able to decode the readdir
> RM>> > stuff?
> RM>> >
> RM>> To look at NFS packets you need wireshark. You can capture the
> RM>> packets with tcpdump using the -w option. Something like:
> RM>> # tcpdump -s 0 -w file.pcap host server
> RM>> - Then look at file.pcap in wireshark. (Often more convenient than
> RM>> installing wireshark on a particular machine.) If you'd like, you
> RM>> can email me the file.pcap and I can look at it.
> RM>>
> RM>> rick
> RM>>
> RM>> > harti
> RM>> >
> RM>> > _______________________________________________
> RM>> > freebsd-current_at_freebsd.org mailing list
> RM>> > http://lists.freebsd.org/mailman/listinfo/freebsd-current
> RM>> > To unsubscribe, send any mail to
> RM>> > "freebsd-current-unsubscribe_at_freebsd.org"
> RM>>
> RM>
Received on Tue May 28 2013 - 21:39:46 UTC
