Re: should a copy_file_range(2) syscall be interrupted via a signal

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Fri, 5 Jul 2019 20:48:48 +0300

On Fri, Jul 05, 2019 at 07:30:54PM +0200, Jilles Tjoelker wrote:
> On Fri, Jul 05, 2019 at 12:28:51AM +0000, Rick Macklem wrote:
> > I have been working on a Linux compatible copy_file_range(2) syscall
> > (the current code can be found at https://reviews.freebsd.org/D20584).
> 
> > One outstanding issue is how it should deal with signals. Right now, I
> > have vn_start_write() without PCATCH, so that it won't be interrupted
> > by a signal, but I notice that vn_write() (i.e. the write syscall) does
> > have PCATCH on vn_start_write() and so does vn_rdwr() when it is
> > called without IO_NODELOCKED.
> 
> A regular write() is only interruptible when writing to a terminal,
> pseudo-terminal master, pipe, socket, or, under certain conditions, a
> file on an NFS intr mount. Therefore, applications may not have the code
> to resume interrupted writes to regular files gracefully.
> 
> > I am thinking that copy_file_range(2) should do this also.
> > However, if it returns an error, it is impossible for the caller to
> > know how much of the data range got copied.
> 
> A regular write() returns partial success if interrupted by a signal
> when it has already written something. Therefore, the application can
> resume the operation by adjusting pointers and counts.
> 
> Something similar applies to "deterministic" errors like [EFBIG] where
> the first call will write as far as possible (if this is not nothing)
> successfully and the next attempt will return the error.
> 
> > What do you think the copy_file_range(2) code should do?
> 
> I'm not sure it should actually be done, but the need for adjusting
> pointers and counts could be avoided with a little extra kernel and libc
> code. The system call would receive an additional argument pointing to
> an off_t that indicates how many bytes previous calls have already
> written. A libc wrapper would initialize this to 0. With this, the
> system call can be restarted automatically after a signal.
> 
> In any case, [EINTR] and the internal ERESTART must not be returned
> unless it is safe to repeat the call with the same (direct) arguments.
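
To make the progress-counter idea above concrete, here is a rough userland
sketch. It only models the behaviour with pread(2)/pwrite(2); the function
name copy_resumable() and its `done` argument are hypothetical, not taken
from D20584 or from any libc. The point is that the direct arguments never
change, so repeating the call after an interruption continues where the
previous one stopped.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical model of a restartable copy: copy `len` bytes from
 * infd at inoff to outfd at outoff.  `*done` counts the bytes already
 * copied by earlier calls; the caller (a libc wrapper in the proposal)
 * initializes it to 0.  Returns 0 on completion or EOF, -1 with errno
 * set on error.  On error *done still reflects the bytes copied so far,
 * so the call can be repeated with the same arguments.
 */
static int
copy_resumable(int infd, off_t inoff, int outfd, off_t outoff, size_t len,
    off_t *done)
{
	char buf[4096];

	while ((size_t)*done < len) {
		size_t want = len - (size_t)*done;
		if (want > sizeof(buf))
			want = sizeof(buf);

		ssize_t nr = pread(infd, buf, want, inoff + *done);
		if (nr < 0)
			return (-1);	/* e.g. EINTR; progress is kept */
		if (nr == 0)
			break;		/* EOF on the input file */

		ssize_t nw = pwrite(outfd, buf, (size_t)nr, outoff + *done);
		if (nw < 0)
			return (-1);
		/* A short write advances by nw; the rest is re-read. */
		*done += nw;
	}
	return (0);
}
```

A second call with the same arguments and a `done` that already equals
`len` returns 0 without touching either file, which is the property that
would make an automatic restart harmless in this model.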

BTW, if the syscall is made interruptible, should it also be made cancellable?

I think that the PCATCH commonly used with vn_start_write(9) is not the best
decision.  It is safe in the sense explained by Jilles, since the interruption
can only happen at the very beginning of the syscall, but it contradicts the
tradition of write(2) to a local fs not being interruptible.

I suggest not making the syscall interruptible by default, and perhaps
only allowing it with a flag.  Then you would need to document that the
syscall is only interruptible between VOPs, and that it is up to the fs to
decide whether VOP_READ/VOP_WRITE itself is interruptible (e.g. devfs and nfs).
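
As a rough illustration of what "interruptible only between VOPs, and only
with a flag" could look like, here is a userland model, not kernel code.
Each pread(2)/pwrite(2) pair stands in for one VOP_READ/VOP_WRITE pair. The
names copy_chunked(), COPY_F_INTR and sig_pending are made up for this
sketch; no such flag exists in the patch under review.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical opt-in flag; not part of any real copy_file_range(2). */
#define COPY_F_INTR 0x1u

/* Set by the handler; stands in for "a signal is pending". */
static volatile sig_atomic_t sig_pending;

static void
on_signal(int signo)
{
	(void)signo;
	sig_pending = 1;
}

/*
 * Copy up to `len` bytes in 4 KB chunks.  A pending signal is noticed
 * only between chunks and only when COPY_F_INTR is set.  Like write(2),
 * return the partial byte count if anything was copied, otherwise -1
 * with errno set.
 */
static ssize_t
copy_chunked(int infd, off_t inoff, int outfd, off_t outoff, size_t len,
    unsigned int flags)
{
	char buf[4096];
	size_t copied = 0;

	while (copied < len) {
		if ((flags & COPY_F_INTR) != 0 && sig_pending) {
			sig_pending = 0;	/* models signal delivery */
			if (copied > 0)
				return ((ssize_t)copied);
			errno = EINTR;
			return (-1);
		}

		size_t want = len - copied;
		if (want > sizeof(buf))
			want = sizeof(buf);

		ssize_t nr = pread(infd, buf, want, inoff + (off_t)copied);
		if (nr < 0)
			return (copied > 0 ? (ssize_t)copied : -1);
		if (nr == 0)
			break;			/* EOF on the input file */

		ssize_t nw = pwrite(outfd, buf, (size_t)nr,
		    outoff + (off_t)copied);
		if (nw < 0)
			return (copied > 0 ? (ssize_t)copied : -1);
		copied += (size_t)nw;
	}
	return ((ssize_t)copied);
}
```

With flags == 0 the loop never looks at sig_pending, which matches the
non-interruptible default suggested above. Whether the individual read or
write can itself return early is left to the underlying descriptor, in the
same way that it would be left to the fs in the kernel.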
Received on Fri Jul 05 2019 - 15:48:58 UTC
