Re: newnfs pkgng database corruption?

From: Baptiste Daroussin <bapt_at_FreeBSD.org>
Date: Fri, 12 Apr 2013 15:10:37 +0200
On Fri, Apr 12, 2013 at 12:56:10PM +0000, Eggert, Lars wrote:
> Hi,
> 
> On Apr 12, 2013, at 1:10, Rick Macklem <rmacklem_at_uoguelph.ca> wrote:
> > Well, I have no idea why an NFS server would reply errno 70 if the file
> > still exists, unless the client has somehow sent a bogus file handle
> > to the server. (I am not aware of any client bug that might do that. I
> > am almost suspicious that there might be a memory problem or something
> > that corrupts bits in the network layer. Do you have TSO enabled for your
> > network interface by any chance? If so, I'd try disabling that on the
> > network interface. Same goes for checksum offload.)
> > 
> > rick
> > ps: If you can capture packets between the client and server at the
> >    time this error occurs, looking at them in wireshark might be
> >    useful?
> 
> I will try all of those things.
> 
> But first, a question that someone who understands pkgng will be able to answer: Is this "fake-pkg" process even running on the NFS mount? The WRKDIR is /tmp, which is an mfs mount.

fake-pkg is run in WRKDIR, but it calls pkgng, which opens
/var/db/pkg/local.sqlite, and that file is on the NFS mount.

The Error 70 is EX_SOFTWARE returned by pkgng.
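
For reference, 70 is the value that sysexits(3) assigns to EX_SOFTWARE. A
minimal Python sketch for checking an exit status against that constant
(the helper name describe_exit_status is made up for illustration and is not
part of pkgng):

```python
import os


def describe_exit_status(status: int) -> str:
    """Return a sysexits-style name for a process exit status, if known."""
    # os.EX_OK and os.EX_SOFTWARE mirror <sysexits.h>; they exist on Unix only.
    known = {os.EX_OK: "EX_OK", os.EX_SOFTWARE: "EX_SOFTWARE"}
    return known.get(status, "unknown")


if __name__ == "__main__":
    print(describe_exit_status(70))
```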

Can you try the following patch:
http://people.freebsd.org/~bapt/patch-libpkg__pkgdb.c

Just add that file to /usr/ports/ports-mgmt/pkg/files/

If that works for you, that means the POSIX advisory locks are somehow failing
on NFSv4 files.
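
To probe this independently of pkgng, one could try taking a POSIX advisory
lock on a scratch file in the same directory as the database. The sketch
below is only an illustration, not what the patch does: the function name
posix_lock_works is hypothetical, and a successful probe from one client does
not prove that locking is coherent across several NFS clients.

```python
import fcntl
import os
import tempfile


def posix_lock_works(directory: str) -> bool:
    """Try to take and release a POSIX advisory lock on a temp file."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        try:
            # lockf() is implemented with fcntl(2) record locks,
            # i.e. POSIX advisory locks.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
        os.unlink(path)
```

Running posix_lock_works("/var/db/pkg") on the affected client would show
whether the lock request itself is rejected there.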

Given it is already known to be failing on NFSv3 (because people often
misconfigure it), I'll probably make unix-dotfile the default locking system
when local.sqlite is stored on a network filesystem.
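
As an illustration of what selecting that locking style looks like at the
SQLite level, the sketch below opens a database with the "unix-dotfile" VFS
through a URI filename. It assumes an SQLite build for Unix, where that VFS
is registered; the function name open_with_dotfile_locking is hypothetical,
and this is not a reproduction of the pkgng patch.

```python
import pathlib
import sqlite3


def open_with_dotfile_locking(db_path: str) -> sqlite3.Connection:
    """Open an SQLite database using dot-file locking instead of fcntl locks."""
    # Build a file: URI and select the VFS with the standard "vfs" parameter.
    uri = pathlib.Path(db_path).resolve().as_uri() + "?vfs=unix-dotfile"
    return sqlite3.connect(uri, uri=True)
```

With this VFS, SQLite takes its lock by creating a lock entry next to the
database file rather than asking the kernel (and thus the NFS lock service)
for an advisory lock, so every process using the database has to use the
same VFS.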

regards,
Bapt

Received on Fri Apr 12 2013 - 11:10:42 UTC