Re: simplifying linux_emul_convpath()

From: Robert Watson <rwatson_at_freebsd.org>
Date: Wed, 14 Jan 2004 11:55:58 -0500 (EST)

On Wed, 14 Jan 2004, Harti Brandt wrote:

> On Wed, 14 Jan 2004, Robert Watson wrote:
> 
> RW>On Wed, 14 Jan 2004, Don Lewis wrote:
> RW>
> RW>> I just stumbled across a vnode locking violation in
> RW>> linux_emul_convpath().  Rather than locking and unlocking each vnode for
> RW>> the VOP_GETATTR() calls, is there any reason that this code should not
> RW>> be simplified to just compare the vnode pointers rather than fetching
> RW>> the vnode attributes and comparing the attributes for equality?
> RW>
> RW>For some time, I've been thinking of adding samefile() and fsamefile()
> RW>system calls to FreeBSD, which would allow userspace applications to
> RW>determine if two names or file handles refer to the same object without
> RW>playing games with inode numbers, device ids, etc.  The reason to do this
> RW>would be that 32-bit inode numbers are subject to collision on large file
> RW>systems.  My initial implementation simply compared vnode pointers, but
> 
> This is a serious violation of POSIX and may make applications like
> tar, mkisofs and friends do the wrong things. Are 32-bit inodes a UFS1
> restriction? 

That inode numbers are subject to collision is a practical reality given
the existence of globally scalable distributed file systems.  Many file
formats, APIs, and ABIs assume a 32-bit inode number; however, distributed
systems like AFS support hundreds of thousands, if not millions, of
concurrent users and computer systems.  Expecting each user/computer to
have no more than about 4000 unique files (2^32 inode numbers spread over
a million clients, and assuming you could manage storage identification
numbers that efficiently) would be (and is) bogus.  So the reality is that
the very large scalable file systems hash larger identifiers into 32-bit
values for local representation.  A further reality is that those hashes
collide, and that it's not easy (and probably not practical) to keep
track of the paperwork necessary to prevent those collisions.  Because
inode numbers are exposed to userspace, it's not feasible to simply track
the working set; you have to track the reuse of inode numbers for a long
time if you really want to provide those guarantees.
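
As a rough illustration of the arithmetic and of why such hashes collide,
here is a minimal sketch in Python.  The XOR-fold in fold64() is a
hypothetical hash picked only because it can be checked by hand; it is not
what AFS or any other real file system is known to use, and the birthday
estimate assumes identifiers hash uniformly.

```python
import math

INODE_BITS = 32  # width of the inode number exposed to userspace


def fold64(file_id: int) -> int:
    """Fold a 64-bit identifier into 32 bits by XORing its two halves.

    Hypothetical hash, chosen only because it is easy to follow by hand.
    """
    return (file_id ^ (file_id >> 32)) & 0xFFFFFFFF


def collision_probability(n_files: int, bits: int = INODE_BITS) -> float:
    """Birthday-bound estimate that at least two of n_files uniformly
    hashed identifiers share the same value in a space of 2**bits."""
    slots = 2 ** bits
    return 1.0 - math.exp(-n_files * (n_files - 1) / (2.0 * slots))


# Two distinct 64-bit identifiers that fold to the same 32-bit value.
a = 0x0000000100000002
b = 0x0000000200000001
assert a != b and fold64(a) == fold64(b)

# Inode numbers available per client if 2**32 values were split evenly
# across one million clients.
per_client = 2 ** INODE_BITS // 1_000_000

print(per_client)
print(round(collision_probability(100_000), 2))
```

The per-client figure comes out a little above 4000, and under the uniform
hashing assumption a collision is already more likely than not with only
100,000 files in the whole system.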

ino_t probably does need to get bumped to 64-bit on FreeBSD eventually,
because at some point we will have a local file system that can usefully
represent more than 2 billion files.  I assume we didn't do the bump with
UFS2 because of the potential disruption for applications, etc.  It's
worth noting that Linux used to panic on an inode number collision, since
it used inode numbers internally in the VFS quite a bit.  These days, they
use a much more pointer-centric model, similar to the Sun/BSD VFS, where
objects are identified to VFS (and above VFS) using unique pointers for
the duration of a valid reference. 
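
For reference, the userspace identity check that depends on inode numbers
being unique looks roughly like the sketch below (Python for brevity;
same_object() is a hypothetical helper name, not an existing or proposed
FreeBSD interface).  Python's os.path.samefile() performs the same
device/inode comparison.  The answer is only as trustworthy as st_ino is
unique on the file system in question, which is exactly the problem
described above.

```python
import os


def same_object(path_a: str, path_b: str) -> bool:
    """Return True if both paths resolve to the same (device, inode) pair.

    This is the conventional userspace test; it can report a false match
    if the file system hands out colliding inode numbers.
    """
    st_a = os.stat(path_a)
    st_b = os.stat(path_b)
    return (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino)
```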

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Senior Research Scientist, McAfee Research