On Tue, Mar 21, 2017 at 02:30:38AM +0000, Rick Macklem wrote:
> Konstantin Belousov wrote:
> [stuff snipped]
> > Yes, I have to somewhat retract my claims, but then I have another set
> > of surprises.
> Righto.
>
> > I realized (remembered) that nfs has its own VOP_PUTPAGES() method.
> > Implementation seems to directly initiate write RPC request using the
> > pages as the source buffer.  I do not see anything in the code which
> > would mark the buffers, which possibly contain the pages, as clean,
> > or mark the buffer range as undirty.
> The only place I know of that the code does this is in the "struct buf's"
> hanging off of v_bufobj.bo_dirty.
> I imagine there would be a race between the write-back to the NFS server
> and further changes to the page by the process.  For the most part, the
> VOP_PUTPAGES() is likely to happen after the process is done modifying
> the pages (often exited).  For cases where it happens sooner, I would
> expect the page(s) to be written multiple times, but the last write
> should bring the file "up to date" on the server.
>
> > At very least, this might cause unnecessary double-write of the same
> > data.  I am not sure if it could cause coherency issues between data
> > written using mappings and write(2).  Also, both vm_object_page_clean()
> > and vfs_busy_pages() only ensure the shared-busy state on the pages,
> > so write(2) can occur while pageout sends data to server, causing
> > 'impossible' content transmitted over the wire.
> I'm not sure what you mean by impossible content, but for NFS the only
> time the file on the NFS server should be "up to date" will be after a
> file doing write(2) writing has closed the fd (and only then if options
> like "nocto" have not been used) or after an fsync(2) done by the process
> doing the writing.
By 'impossible' I mean some arbitrary combination of bytes which were
written by many means to the file at arbitrary moments.  In other words,
the file content, or even the content of a single page/block, is not
atomic WRT the client updates.

> For mmap'd writing, I think msync(2) is about the only thing the process
> can do to ensure the data is written back to the server.
> (There was a patch to the NFS VOP_CLOSE() that does a vm_object_page_clean()
> but without the OBJPC_SYNC flag, which tries to make sure the pages get
> written shortly after the file is closed.  Of course, an mmap'd file can
> still be modified by the process after close(2), so it is "just an attempt
> to make the common case work".  I don't recall, but I don't think I was
> the author of this patch.)
>
> I also wouldn't be surprised that multiple writes of the same page(s)
> occur under certain situations.  (NFS has no rules w.r.t. write ordering.
> Each RPC is separate and simply writes N bytes at file offset S.)  It
> definitely happens when there are multiple write(2)s of partial buffers,
> depending on when a sync() happens.
>
> > Could you, please, explain the reasons for such implementation of
> > ncl_putpage() ?
> Well, I wasn't the author (it was just cribbed from the old NFS client
> and I don't know who wrote it), so I'm afraid I don't know.  (It's code
> I avoid changing because I don't really understand it.)
>
> I suspect that the author assumed that processes would either mmap the
> file or use write(2) and wouldn't ever try and mix them together.
Or, what seems more likely to me, the code was written on a system where
buffer cache and page queues are not coherent.
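
A minimal userland sketch of the access pattern being discussed (purely
illustrative, and the mount point and file name below are made up): the
same page of an NFS-backed file is dirtied both through a MAP_SHARED
mapping and with write(2), and msync(MS_SYNC) is the one call the process
has to force the mmap'ed stores back to the server before the pager gets
around to VOP_PUTPAGES():

/*
 * Illustrative only: mix mmap stores and write(2) on one page of an
 * NFS-backed file, then force the mapped stores out with msync(MS_SYNC).
 * The path is hypothetical.
 */
#include <sys/mman.h>

#include <err.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	const char *path = "/mnt/nfs/scratch";	/* hypothetical NFS mount */
	const size_t len = 4096;		/* one page */
	char *p;
	int fd;

	if ((fd = open(path, O_RDWR | O_CREAT, 0644)) == -1)
		err(1, "open");
	if (ftruncate(fd, (off_t)len) == -1)
		err(1, "ftruncate");

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		err(1, "mmap");

	/* Dirty the page through the mapping ... */
	memset(p, 'A', len);

	/* ... and overlap it with a write(2) into the same page. */
	if (pwrite(fd, "BBBB", 4, 128) != 4)
		err(1, "pwrite");

	/*
	 * Without this, the mmap'ed stores reach the server only whenever
	 * VOP_PUTPAGES() happens to run (pageout, munmap, ...), which is
	 * the window discussed above.
	 */
	if (msync(p, len, MS_SYNC) == -1)
		err(1, "msync");

	munmap(p, len);
	close(fd);
	return (0);
}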
Anyway, my position is that nfs VOP_PUTPAGES() should do the write through
the buffer cache, not issue the direct RPC call with the pages as the
source.  Then your patch would need an update with the mentioned call to
ncl_flush().
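
A very rough sketch of what that direction could look like, assuming it
would sit next to ncl_putpages() in sys/fs/nfsclient/nfs_clbio.c (so the
usual headers are in scope).  The helper name, the ncl_flush() argument
list, and the assumption that the caller has already busied the pages and
mapped them at a kernel address (as the current pbuf-based code does) are
all guesses, and locking, error handling and the pager result bookkeeping
are left out:

/*
 * Hypothetical sketch, not the real ncl_putpages(): push the pageout
 * range back through VOP_WRITE() (i.e. the buffer cache, ending up in
 * ncl_write()/bdwrite()) instead of issuing the write RPC directly from
 * the pages, then flush the now-dirty buffers for a synchronous pageout.
 */
static int
ncl_putpages_via_bcache(struct vnode *vp, vm_offset_t kva, int bytecount,
    off_t offset, int pager_flags, struct thread *td)
{
	struct uio uio;
	struct iovec iov;
	int error;

	/* Describe the already-mapped page range as a sysspace uio. */
	iov.iov_base = (void *)kva;
	iov.iov_len = bytecount;
	uio.uio_iov = &iov;
	uio.uio_iovcnt = 1;
	uio.uio_offset = offset;
	uio.uio_resid = bytecount;
	uio.uio_segflg = UIO_SYSSPACE;
	uio.uio_rw = UIO_WRITE;
	uio.uio_td = td;

	/* Let the data go through the buffer cache, not a raw write RPC. */
	error = VOP_WRITE(vp, &uio, 0, td->td_ucred);

	/*
	 * For a synchronous pageout, also push the dirty buffers to the
	 * server; the ncl_flush() argument list here is assumed from its
	 * existing callers.
	 */
	if (error == 0 && (pager_flags & VM_PAGER_PUT_SYNC) != 0)
		error = ncl_flush(vp, MNT_WAIT, td, 1, 0);
	return (error);
}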