I managed to find some time to take a closer look at vnode locking in the NFS server code and found that the situation was worse than I initially thought. I've put together a patch that seems to fix all the bugs that I found. With this patch, the code passes the simple tests that I wrote, as well as NFS mounting a local directory on /usr/obj and running "make -j10 buildworld" (after I cranked up vfs.hirunningspace and vfs.lorunningspace by 50x to avoid the wdrain bio deadlock I mentioned yesterday), all with the DEBUG_VFS_LOCKS kernel option enabled.

The NFS server code was already in bad shape from being hacked on too many times before I touched it. It looks like it has accumulated some historical baggage, and my changes certainly don't help. I attempted to match the existing style and control flow, since I wanted to minimize the changes to avoid introducing new bugs, but this meant that I had to duplicate some code in a number of places.

I saw two possible ways of getting the initial dirp attributes. One was to set LOCKPARENT on the first lookup() call in nfs_namei() and call VOP_GETATTR() at that point. I chose the other possible implementation, which was to temporarily lock the dirp and call VOP_GETATTR() before the loop, because this change was simpler.

The NFS server code badly needs a rewrite by someone who understands it well. I'm hoping to get enough review and testing of this patch so that I can get re approval to fix vnode locking in the NFS server code for 5.1.

Received on Wed May 07 2003 - 10:36:34 UTC
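For readers who want to see the shape of the second approach (temporarily lock the dirp, call VOP_GETATTR(), unlock, then enter the lookup loop), here is a rough userland sketch. It is not the patch and not kernel code: every type and function in it (mock_vnode, mock_vattr, mock_vn_lock, mock_vop_unlock, mock_vop_getattr, dirp_getattr_before_loop) is a hypothetical stand-in invented for illustration. The only property it tries to model is that the attribute fetch is refused unless the vnode is locked, loosely in the spirit of a lock assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userland stand-ins for illustration only. These are NOT the kernel's
 * struct vnode / struct vattr, and the mock_* functions are NOT the real
 * vn_lock() / VOP_UNLOCK() / VOP_GETATTR() interfaces.
 */
struct mock_vattr {
	long va_size;
};

struct mock_vnode {
	bool locked;	/* true while the (single) lock is held */
	long size;	/* pretend on-disk attribute */
};

/* Take the vnode lock; fail instead of recursing on a held lock. */
static int
mock_vn_lock(struct mock_vnode *vp)
{
	if (vp->locked)
		return (-1);
	vp->locked = true;
	return (0);
}

/* Drop the vnode lock; the caller must currently hold it. */
static void
mock_vop_unlock(struct mock_vnode *vp)
{
	assert(vp->locked);
	vp->locked = false;
}

/* Fetch attributes; refuses to run on an unlocked vnode. */
static int
mock_vop_getattr(struct mock_vnode *vp, struct mock_vattr *vap)
{
	if (!vp->locked)
		return (-1);
	vap->va_size = vp->size;
	return (0);
}

/*
 * Get the initial dirp attributes before the lookup loop starts.
 * dirp is unlocked on entry and is unlocked again on return, so the
 * loop that follows sees the same lock state as before.
 */
static int
dirp_getattr_before_loop(struct mock_vnode *dirp, struct mock_vattr *vap)
{
	int error;

	error = mock_vn_lock(dirp);
	if (error != 0)
		return (error);
	error = mock_vop_getattr(dirp, vap);
	mock_vop_unlock(dirp);
	return (error);
}
```

Calling mock_vop_getattr() directly on an unlocked mock vnode fails, while dirp_getattr_before_loop() succeeds and leaves the vnode unlocked. The LOCKPARENT alternative would instead change the lock state that the first lookup() returns with, which is why the temporary-lock version is the smaller change.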