On Wed, Jun 08, 2016 at 11:17:44PM +0200, Jilles Tjoelker wrote:
> On Wed, Jun 08, 2016 at 04:56:35PM +0300, Konstantin Belousov wrote:
> > On Wed, Jun 08, 2016 at 06:35:08AM -0700, Mark Johnston wrote:
> > > On Wed, Jun 08, 2016 at 07:30:55AM +0300, Konstantin Belousov wrote:
> > > > On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote:
> > > > > I also wonder whether we may be overengineering things here.
> > > > > Perhaps the advlock sleep can simply turn off TDF_SBDRY.
> > > >
> > > > Well, this was the very first patch suggested.  I would be fine
> > > > with that, but again, out-of-tree code seems to be not quite fine
> > > > with that local solution.
> > >
> > > In our particular case, we could possibly use a similar approach.
> > > In general, it seems incorrect to clear TDF_SBDRY if the thread
> > > calling sx_sleep() has any locks held.  It is easy to verify that
> > > all callers of lf_advlock() are safe in this respect, but this kind
> > > of auditing is generally hard.  In fact, I believe the sx_sleep that
> > > led to the problem described in D2612 is the same as the one in my
> > > case.  That is, the sleeping thread may or may not hold a vnode lock
> > > depending on context.
> >
> > I do not think that in-tree code sleeps with a vnode lock held in
> > lf_advlock().  Otherwise, the system would hang in a lock cascade on
> > an attempt to obtain an advisory lock.  I think we can even assert
> > this with witness.
> >
> > There is another sleep, which Jilles mentioned, in lf_purgelocks(),
> > called from vgone().  This sleep indeed occurs under the vnode lock,
> > and as such must be non-suspendable.  The sleep waits until other
> > threads leave lf_advlock() for the reclaimed vnode, and they should
> > leave in deterministic time due to the issued wakeups.  So this sleep
> > is exempt from the considerations, and TDF_SBDRY there is correct.
> >
> > I am fine with either the braces around sx_sleep() in lf_advlock() to
> > clear TDF_SBDRY (sigdeferstop()), or with the latest patch I sent,
> > which adds a temporary override of TDF_SBDRY with TDF_SRESTART.  My
> > understanding is that you prefer the latter.  If I do not misrepresent
> > your position, I understand why you prefer that.
>
> The TDF_SRESTART change does fix some more problems such as umount -f
> getting stuck in lf_purgelocks().
>
> However, it introduces some subtle issues that may not necessarily be a
> sufficient objection.

I did not see an objection in the notes below; rather, I read them as an
argument for returning EINTR from the stop attempts in lf_advlock().

> Firstly, adding this closes the door on fixing signal handling for
> fcntl(F_SETLKW).  Per POSIX, any caught signal interrupts
> fcntl(F_SETLKW), even if SA_RESTART is set for the signal, and the
> Linux man page documents the same.  Our man page has documented that
> SA_RESTART behaves normally with fcntl(F_SETLKW) since at least
> FreeBSD 2.0.  This could normally be fixed via
> if (error == ERESTART) error = EINTR; but that is no longer possible
> if there are [ERESTART] errors that should still restart.

I added the translation to the fcntl handler (a minimal userland sketch
of the expected semantics follows the patch below).

> Secondly, fcntl(F_SETLKW) restarting after a stop may actually be
> observable, contrary to what I wrote before.  This is due to the fair
> queuing.  Suppose thread A has locked byte 1 a while ago and thread B
> is trying to lock bytes 1 and 2 right now.  Then thread C will be able
> to lock byte 2 iff thread B has not blocked yet.
> If thread C is not allowed to lock byte 2 and blocks on it, the
> TDF_SRESTART change will cause it to be awakened if thread B is
> stopped.  When thread B resumes, the region to be locked will be
> recomputed.  This scenario unambiguously violates the POSIX
> requirement but I don't know how bad it is.

Indeed.

> Note that all these threads must be in separate processes because of
> fcntl locks' strange semantics.

So below is the next level of over-engineering the stuff.  I unified it
on sigdeferstop() to avoid blowing up the KPI.  I am not sure which modes
are needed by onefs, so I added SIGDEFERSTOP_ERESTART even though it is
not used by the kernel (a sketch of how such an out-of-tree consumer
might use it also follows the patch).  lf_advlock() is now not stoppable
at all, returning EINTR instead.  In the previous patches, the function
modified the behaviour only if the caller of lf_advlock() had already set
TDF_SBDRY.

I considered adding a td_sbdry member to struct thread for managing sbdry
stops, but did not do so for now.  If one more flag appears to be needed,
I will change that.

diff --git a/sys/fs/fifofs/fifo_vnops.c b/sys/fs/fifofs/fifo_vnops.c
index 716faa3..fb3eb90 100644
--- a/sys/fs/fifofs/fifo_vnops.c
+++ b/sys/fs/fifofs/fifo_vnops.c
@@ -194,11 +194,10 @@ fifo_open(ap)
 	if ((ap->a_mode & FREAD) && fip->fi_writers == 0) {
 		gen = fip->fi_wgen;
 		VOP_UNLOCK(vp, 0);
-		stops_deferred = sigallowstop();
+		stops_deferred = sigdeferstop(SIGDEFERSTOP_OFF);
 		error = msleep(&fip->fi_readers, PIPE_MTX(fpipe),
 		    PDROP | PCATCH | PSOCK, "fifoor", 0);
-		if (stops_deferred)
-			sigdeferstop();
+		sigallowstop(stops_deferred);
 		vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
 		if (error != 0 && gen == fip->fi_wgen) {
 			fip->fi_readers--;
@@ -222,11 +221,10 @@ fifo_open(ap)
 	if ((ap->a_mode & FWRITE) && fip->fi_readers == 0) {
 		gen = fip->fi_rgen;
 		VOP_UNLOCK(vp, 0);
-		stops_deferred = sigallowstop();
+		stops_deferred = sigdeferstop(SIGDEFERSTOP_OFF);
 		error = msleep(&fip->fi_writers, PIPE_MTX(fpipe),
 		    PDROP | PCATCH | PSOCK, "fifoow", 0);
-		if (stops_deferred)
-			sigdeferstop();
+		sigallowstop(stops_deferred);
 		vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
 		if (error != 0 && gen == fip->fi_rgen) {
 			fip->fi_writers--;
diff --git a/sys/kern/kern_descrip.c b/sys/kern/kern_descrip.c
index 7f0ef8d..58a37f9 100644
--- a/sys/kern/kern_descrip.c
+++ b/sys/kern/kern_descrip.c
@@ -648,6 +648,16 @@ kern_fcntl(struct thread *td, int fd, int cmd, intptr_t arg)
 			PROC_UNLOCK(p->p_leader);
 			error = VOP_ADVLOCK(vp, (caddr_t)p->p_leader, F_SETLK,
 			    flp, flg);
+			/*
+			 * Do not automatically restart the lock
+			 * acquire, for two reasons.  First, POSIX
+			 * requires signal delivery to return EINTR.
+			 * Second, fairness of the lock acquisition,
+			 * ensured by queueing in lf_advlock(), would
+			 * be defeated by the retry.
+			 */
+			if (cmd == F_SETLKW && error == ERESTART)
+				error = EINTR;
 			break;
 		case F_WRLCK:
 			if ((fp->f_flag & FWRITE) == 0) {
@@ -659,6 +669,8 @@ kern_fcntl(struct thread *td, int fd, int cmd, intptr_t arg)
 			PROC_UNLOCK(p->p_leader);
 			error = VOP_ADVLOCK(vp, (caddr_t)p->p_leader, F_SETLK,
 			    flp, flg);
+			if (cmd == F_SETLKW && error == ERESTART)
+				error = EINTR;
 			break;
 		case F_UNLCK:
 			error = VOP_ADVLOCK(vp, (caddr_t)p->p_leader, F_UNLCK,
diff --git a/sys/kern/kern_lockf.c b/sys/kern/kern_lockf.c
index a0a3789..970a38d 100644
--- a/sys/kern/kern_lockf.c
+++ b/sys/kern/kern_lockf.c
@@ -1378,7 +1378,7 @@ lf_setlock(struct lockf *state, struct lockf_entry *lock, struct vnode *vp,
     void **cookiep)
 {
 	static char lockstr[] = "lockf";
-	int priority, error;
+	int error, priority, stops_deferred;
 
 #ifdef LOCKF_DEBUG
 	if (lockf_debug & 1)
@@ -1466,7 +1466,9 @@ lf_setlock(struct lockf *state, struct lockf_entry *lock, struct vnode *vp,
 		}
 
 		lock->lf_refs++;
+		stops_deferred = sigdeferstop(SIGDEFERSTOP_EINTR);
 		error = sx_sleep(lock, &state->ls_lock, priority, lockstr, 0);
+		sigallowstop(stops_deferred);
 		if (lf_free_lock(lock)) {
 			error = EDOOFUS;
 			goto out;
diff --git a/sys/kern/kern_sig.c b/sys/kern/kern_sig.c
index 75a1259..ee5d253 100644
--- a/sys/kern/kern_sig.c
+++ b/sys/kern/kern_sig.c
@@ -2596,41 +2596,81 @@ tdsigcleanup(struct thread *td)
 }
 
+static int
+sigdeferstop_curr_flags(int cflags)
+{
+
+	MPASS((cflags & (TDF_SEINTR | TDF_SERESTART)) == 0 ||
+	    (cflags & TDF_SBDRY) != 0);
+	return (cflags & (TDF_SBDRY | TDF_SEINTR | TDF_SERESTART));
+}
+
 /*
- * Defer the delivery of SIGSTOP for the current thread.  Returns true
- * if stops were deferred and false if they were already deferred.
+ * Defer the delivery of SIGSTOP for the current thread, according to
+ * the requested mode.  Returns previous flags, which must be restored
+ * by sigallowstop().
+ *
+ * TDF_SBDRY, TDF_SEINTR, and TDF_SERESTART flags are only set and
+ * cleared by the current thread, which allow the lock-less read-only
+ * accesses below.
  */
 int
-sigdeferstop(void)
+sigdeferstop(int mode)
 {
 	struct thread *td;
+	int cflags, nflags;
 
 	td = curthread;
-	if (td->td_flags & TDF_SBDRY)
-		return (0);
-	thread_lock(td);
-	td->td_flags |= TDF_SBDRY;
-	thread_unlock(td);
-	return (1);
+	cflags = sigdeferstop_curr_flags(td->td_flags);
+	switch (mode) {
+	case SIGDEFERSTOP_NOP:
+		nflags = cflags;
+		break;
+	case SIGDEFERSTOP_OFF:
+		nflags = 0;
+		break;
+	case SIGDEFERSTOP_SILENT:
+		nflags = (cflags | TDF_SBDRY) & ~(TDF_SEINTR | TDF_SERESTART);
+		break;
+	case SIGDEFERSTOP_EINTR:
+		nflags = (cflags | TDF_SBDRY | TDF_SEINTR) & ~TDF_SERESTART;
+		break;
+	case SIGDEFERSTOP_ERESTART:
+		nflags = (cflags | TDF_SBDRY | TDF_SERESTART) & ~TDF_SEINTR;
+		break;
+	default:
+		panic("sigdeferstop: invalid mode %x", mode);
+		break;
+	}
+	if (cflags != nflags) {
+		thread_lock(td);
+		td->td_flags = (td->td_flags & ~cflags) | nflags;
+		thread_unlock(td);
+	}
+	return (cflags);
 }
 
 /*
- * Permit the delivery of SIGSTOP for the current thread.  This does
- * not immediately suspend if a stop was posted.  Instead, the thread
- * will suspend either via ast() or a subsequent interruptible sleep.
+ * Restores the STOP handling mode, typically permitting the delivery
+ * of SIGSTOP for the current thread.  This does not immediately
+ * suspend if a stop was posted.  Instead, the thread will suspend
+ * either via ast() or a subsequent interruptible sleep.
  */
-int
-sigallowstop(void)
+void
+sigallowstop(int prev)
 {
 	struct thread *td;
-	int prev;
+	int cflags;
 
+	KASSERT((prev & ~(TDF_SBDRY | TDF_SEINTR | TDF_SERESTART)) == 0,
+	    ("sigallowstop: incorrect previous mode %x", prev));
 	td = curthread;
-	thread_lock(td);
-	prev = (td->td_flags & TDF_SBDRY) != 0;
-	td->td_flags &= ~TDF_SBDRY;
-	thread_unlock(td);
-	return (prev);
+	cflags = sigdeferstop_curr_flags(td->td_flags);
+	if (cflags != prev) {
+		thread_lock(td);
+		td->td_flags = (td->td_flags & ~cflags) | prev;
+		thread_unlock(td);
+	}
 }
 
 /*
diff --git a/sys/kern/kern_thread.c b/sys/kern/kern_thread.c
index 9af377e..c85b153 100644
--- a/sys/kern/kern_thread.c
+++ b/sys/kern/kern_thread.c
@@ -899,7 +899,7 @@ thread_suspend_check(int return_instead)
 {
 	struct thread *td;
 	struct proc *p;
-	int wakeup_swapper;
+	int wakeup_swapper, r;
 
 	td = curthread;
 	p = td->td_proc;
@@ -932,7 +932,21 @@ thread_suspend_check(int return_instead)
 		if ((td->td_flags & TDF_SBDRY) != 0) {
 			KASSERT(return_instead,
 			    ("TDF_SBDRY set for unsafe thread_suspend_check"));
-			return (0);
+			switch (td->td_flags & (TDF_SEINTR | TDF_SERESTART)) {
+			case 0:
+				r = 0;
+				break;
+			case TDF_SEINTR:
+				r = EINTR;
+				break;
+			case TDF_SERESTART:
+				r = ERESTART;
+				break;
+			default:
+				panic("both TDF_SEINTR and TDF_SERESTART");
+				break;
+			}
+			return (r);
 		}
 
 		/*
diff --git a/sys/kern/subr_trap.c b/sys/kern/subr_trap.c
index 6d1ac70..eb44087 100644
--- a/sys/kern/subr_trap.c
+++ b/sys/kern/subr_trap.c
@@ -160,7 +160,7 @@ userret(struct thread *td, struct trapframe *frame)
 	    ("userret: Returning with with pinned thread"));
 	KASSERT(td->td_vp_reserv == 0,
 	    ("userret: Returning while holding vnode reservation"));
-	KASSERT((td->td_flags & TDF_SBDRY) == 0,
+	KASSERT((td->td_flags & (TDF_SBDRY | TDF_SEINTR | TDF_SERESTART)) == 0,
 	    ("userret: Returning with stop signals deferred"));
 	KASSERT(td->td_su == NULL,
 	    ("userret: Returning with SU cleanup request not handled"));
diff --git a/sys/sys/mount.h b/sys/sys/mount.h
index f11f8f5..5438140 100644
--- a/sys/sys/mount.h
+++ b/sys/sys/mount.h
@@ -653,15 +653,15 @@ vfs_statfs_t	__vfs_statfs;
 
 #define	VFS_PROLOGUE(MP)	do {					\
 	struct mount *mp__;						\
-	int _enable_stops;						\
+	int _prev_stops;						\
 									\
 	mp__ = (MP);							\
-	_enable_stops = (mp__ != NULL &&				\
-	    (mp__->mnt_vfc->vfc_flags & VFCF_SBDRY) && sigdeferstop())
+	_prev_stops = sigdeferstop((mp__ != NULL &&			\
+	    (mp__->mnt_vfc->vfc_flags & VFCF_SBDRY) != 0) ?		\
+	    SIGDEFERSTOP_SILENT : SIGDEFERSTOP_NOP);
 
 #define	VFS_EPILOGUE(MP)						\
-	if (_enable_stops)						\
-		sigallowstop();						\
+	sigallowstop(_prev_stops);					\
 } while (0)
 
 #define	VFS_MOUNT(MP) ({						\
diff --git a/sys/sys/proc.h b/sys/sys/proc.h
index 629f7e8..1619d34 100644
--- a/sys/sys/proc.h
+++ b/sys/sys/proc.h
@@ -395,9 +395,9 @@ do {									\
 #define	TDF_NEEDRESCHED	0x00010000 /* Thread needs to yield. */
 #define	TDF_NEEDSIGCHK	0x00020000 /* Thread may need signal delivery. */
 #define	TDF_NOLOAD	0x00040000 /* Ignore during load avg calculations. */
-#define	TDF_UNUSED19	0x00080000 /* --available-- */
+#define	TDF_SERESTART	0x00080000 /* ERESTART on stop attempts. */
 #define	TDF_THRWAKEUP	0x00100000 /* Libthr thread must not suspend itself. */
-#define	TDF_UNUSED21	0x00200000 /* --available-- */
+#define	TDF_SEINTR	0x00200000 /* EINTR on stop attempts. */
 #define	TDF_SWAPINREQ	0x00400000 /* Swapin request due to wakeup. */
 #define	TDF_UNUSED23	0x00800000 /* --available-- */
 #define	TDF_SCHED0	0x01000000 /* Reserved for scheduler private use */
diff --git a/sys/sys/signalvar.h b/sys/sys/signalvar.h
index e574ec3..176bb2a 100644
--- a/sys/sys/signalvar.h
+++ b/sys/sys/signalvar.h
@@ -325,9 +325,21 @@ extern struct mtx	sigio_lock;
 #define	SIGPROCMASK_PROC_LOCKED	0x0002
 #define	SIGPROCMASK_PS_LOCKED	0x0004
 
+/*
+ * Modes for sigdeferstop().  Manages behaviour of
+ * thread_suspend_check() in the region delimited by
+ * sigdeferstop()/sigallowstop().  Must be restored to
+ * SIGDEFERSTOP_OFF before returning to userspace.
+ */
+#define	SIGDEFERSTOP_NOP	0 /* continue doing whatever is done now */
+#define	SIGDEFERSTOP_OFF	1 /* stop ignoring STOPs */
+#define	SIGDEFERSTOP_SILENT	2 /* silently ignore STOPs */
+#define	SIGDEFERSTOP_EINTR	3 /* ignore STOPs, return EINTR */
+#define	SIGDEFERSTOP_ERESTART	4 /* ignore STOPs, return ERESTART */
+
 int	cursig(struct thread *td);
-int	sigdeferstop(void);
-int	sigallowstop(void);
+int	sigdeferstop(int mode);
+void	sigallowstop(int prev);
 void	execsigs(struct proc *p);
 void	gsignal(int pgid, int sig, ksiginfo_t *ksi);
 void	killproc(struct proc *p, char *why);
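
For reference, here is a minimal userland sketch (not part of the patch) of
the POSIX fcntl(F_SETLKW) semantics discussed above: a caught signal must
interrupt the wait with EINTR even when its handler is installed with
SA_RESTART.  The lock file path and the use of alarm() are illustrative
assumptions, and the demonstration presumes another process already holds a
conflicting lock on the first byte, so that F_SETLKW actually blocks.

#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void
on_alarm(int sig)
{

	(void)sig;	/* The handler only needs to exist to catch SIGALRM. */
}

int
main(void)
{
	struct sigaction sa;
	struct flock fl;
	int fd;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sa.sa_flags = SA_RESTART;	/* Must not suppress the EINTR. */
	if (sigaction(SIGALRM, &sa, NULL) == -1)
		err(1, "sigaction");

	fd = open("/tmp/lockfile", O_RDWR | O_CREAT, 0644);
	if (fd == -1)
		err(1, "open");

	/* Request a write lock on byte 0; another process should hold it. */
	memset(&fl, 0, sizeof(fl));
	fl.fl_type = F_WRLCK;
	fl.fl_whence = SEEK_SET;
	fl.fl_start = 0;
	fl.fl_len = 1;

	alarm(1);
	if (fcntl(fd, F_SETLKW, &fl) == -1 && errno == EINTR)
		printf("F_SETLKW returned EINTR despite SA_RESTART\n");
	return (0);
}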
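
And, purely as an illustration of the new KPI (again, not part of the patch),
this is how a hypothetical out-of-tree consumer might use the otherwise
unused SIGDEFERSTOP_ERESTART mode; the function name, wait channel, and wait
message are made up:

/*
 * Defer stop attempts across an interruptible sleep and report ERESTART
 * if a stop was attempted, so the enclosing syscall is transparently
 * restarted once the process continues.
 */
static int
example_wait(struct mtx *m, void *chan)
{
	int error, stops_deferred;

	stops_deferred = sigdeferstop(SIGDEFERSTOP_ERESTART);
	error = msleep(chan, m, PCATCH, "examplewt", 0);
	sigallowstop(stops_deferred);
	return (error);
}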