On Tue, Nov 13, 2018 at 3:51 PM Warner Losh <imp_at_bsdimp.com> wrote:

> On Tue, Nov 13, 2018 at 3:10 PM Alan Somers <asomers_at_freebsd.org> wrote:
>
>> Hole-punching has been discussed on these lists before[1].  It basically
>> means to turn a dense file into a sparse file by deallocating storage for
>> some of the blocks in the middle.  There's no standard API for it.  Linux
>> uses fallocate(2); Solaris and OSX add a new opcode to fcntl(2).
>>
>> A related concept is telling a block device that some blocks are no longer
>> used.  SATA calls this "TRIM", SCSI calls it "UNMAP", NVMe calls it
>> "Deallocate", ZBC and ZAC call it "Reset Write Pointer".  They all do
>> basically the same thing, and it's analogous to hole-punching for regular
>> files.  They are also all inaccessible from FreeBSD's userland except by
>> using pass(4), which is inconvenient and protocol-specific.
>>
>> Linux has a BLKDISCARD ioctl for issuing TRIM-like commands from userland,
>> but it's totally undocumented and doesn't work on regular files.
>>
>> I propose adding support for all of these things using the fcntl(2) API.
>> Using the same syntax that Solaris defined, you would be able to punch a
>> hole in a regular file or TRIM blocks from an SSD.  ZFS already supports it
>> (though FreeBSD's port never did, and the code was deleted in r303763).
>> Here's what I would do:
>>
>> 1) Add the F_FREESP command to fcntl(2).
>> 2) Add a .fo_space field for struct fileops
>> 3) Add a devfs_space method that implements .fo_space
>> 4) Add a .d_space field to struct cdevsw
>> 5) Add a g_dev_space method for GEOM that implements .d_space using
>>    BIO_DELETE.
>> 6) Add a VOP_SPACE vop
>> 7) Implement VOP_SPACE for tmpfs
>> 8) Add aio_freesp(2), an asynchronous version of fcntl(F_FREESP).
>>
>> The greatest beneficiaries of this work would be type 2 hypervisors like
>> QEMU and VirtualBox with guests that use TRIM, and userland filesystems
>> such as fusefs-ext2 and fusefs-exfat.  High-performance storage systems
>> using SPDK would also benefit.  The last item, aio_freesp(2), may seem
>> unnecessary but it would really benefit my application.
>>
>> Questions, objections, flames?
>
> So the fcntl would deallocate blocks from a filesystem only.  The
> filesystem may issue BIO_DELETE as a result, but that's up to the
> filesystem, correct?

Correct.

> On a raw device it would be translated into a BIO_DELETE command directly,
> correct?

Correct, modulo edge cases.

> Warner

Received on Tue Nov 13 2018 - 21:52:59 UTC
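
For readers unfamiliar with the Solaris syntax mentioned above, here is a minimal userland sketch of a hole-punching call. It assumes the Solaris convention, in which fcntl(fd, F_FREESP, &fl) frees the byte range described by a struct flock, and falls back to Linux's fallocate(2) with FALLOC_FL_PUNCH_HOLE where F_FREESP is not defined. The helper name punch_hole is hypothetical, and nothing here reflects how the proposed FreeBSD implementation would actually look; on a system with neither interface the sketch simply fails with ENOTSUP.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* exposes fallocate(2) and FALLOC_FL_* on Linux */
#endif
#include <sys/types.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * punch_hole (hypothetical helper): ask the kernel to deallocate the
 * byte range [off, off + len) of fd without changing the file size.
 * Returns 0 on success, or -1 with errno set on failure.
 */
static int
punch_hole(int fd, off_t off, off_t len)
{
#if defined(F_FREESP)
	/* Solaris-style: the range to free is passed in a struct flock. */
	struct flock fl;

	memset(&fl, 0, sizeof(fl));
	fl.l_whence = SEEK_SET;
	fl.l_start = off;
	fl.l_len = len;
	return (fcntl(fd, F_FREESP, &fl));
#elif defined(__linux__) && defined(FALLOC_FL_PUNCH_HOLE)
	/* Linux: PUNCH_HOLE must be combined with KEEP_SIZE. */
	return (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
	    off, len));
#else
	/* No hole-punching interface known to this sketch. */
	(void)fd;
	(void)off;
	(void)len;
	errno = ENOTSUP;
	return (-1);
#endif
}
```

A caller would check the return value and treat EOPNOTSUPP/ENOTSUP as "the underlying filesystem or device cannot punch holes", since support varies by filesystem even where the system call exists.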