On 27/05/13 12:23, Konstantin Belousov wrote:
> On Mon, May 27, 2013 at 10:19:51AM +0200, Roger Pau Monné wrote:
>> On 27/05/13 08:07, Konstantin Belousov wrote:
>>> On Mon, May 27, 2013 at 12:22:54AM +0200, Jilles Tjoelker wrote:
>>>> On Sun, May 26, 2013 at 10:52:07PM +0200, Roger Pau Monné wrote:
>>>>> On 26/05/13 22:20, Jilles Tjoelker wrote:
>>>>>> Instead of a pause() that may be too short or too long, how about
>>>>>> waiting for the necessary lock? In other words, replace the
>>>>>> kern_yield() call with VI_LOCK(vp); VI_UNLOCK(vp);. This is also
>>>>>> the usual approach to acquire two locks without imposing an order
>>>>>> between them.
>>>>
>>>>> Since there might be more than one locked vnode, waiting on a
>>>>> specific locked vnode seemed rather arbitrary, but I agree that the
>>>>> pause is also rather arbitrary.
>>>>
>>>>> Also, can we be sure that the v_interlock mutex will not be
>>>>> destroyed while the syncer process is waiting for it to be unlocked?
>>>>
>>>> I think this is a major problem. My idea was too easy and will not
>>>> work.
>>>>
>>>> That said, the code in mnt_vnode_next_active() appears to implement
>>>> some sort of adaptive spinning for SMP. It tries VI_TRYLOCK for 200ms
>>>> (the default value of hogticks) and then yields. This is far too long
>>>> for a mutex lock, and if it takes that long it means that either the
>>>> thread owning the lock is blocked by us somehow, or someone is
>>>> abusing a mutex to work like a sleepable lock, such as by spinning
>>>> or DELAY.
>>>>
>>>> Given that it has been spinning for 200ms, it is not so bad to pause
>>>> for one additional microsecond.
>>>>
>>>> The adaptive spinning was added fairly recently, so apparently it
>>>> happens fairly frequently that VI_TRYLOCK fails transiently.
>>>> Unfortunately, the real adaptive spinning code cannot be used because
>>>> it will spin forever as long as the thread owning v_interlock is
>>>> running, including when that is because it is spinning for
>>>> vnode_free_list_mtx. Perhaps we can try VI_TRYLOCK a certain number
>>>> of times. It is also possible to check the contested bit of
>>>> vnode_free_list_mtx (sys/netgraph/netflow/netflow.c does something
>>>> similar) and stop spinning in that case.
>>>>
>>>> A cpu_spinwait() invocation should also be added to the spin loop.
>>>
>>> There are two 'proper' solutions for this issue:
>>>
>>> One is to change the handling of the vnode lifecycle to allow safe
>>> blocking for the vnode interlock acquisition. In particular, the
>>> change would add some stability to the vnode memory when the vnode is
>>> put on the free list. For example, the vnode zone could be marked as
>>> type-stable again, and then the vnode interlock could be obtained with
>>> the free list lock dropped. Arguably, marking the zone as non-freeable
>>> would be a regression, esp. for the zone which accounts for the
>>> largest allocation of kernel memory.
>>>
>>> The other is to somehow ensure that priority is properly propagated
>>> from the spinning thread to the vnode interlock owner. I think it is
>>> desirable to donate some amount of priority from the spinning thread.
>>> Unfortunately, I was unable to come up with an elegant solution for
>>> this that would also be contained and would not require a revamp of
>>> the mutex interfaces.
>>>
>>> BTW, if anybody comes up with an idea for restructuring the free list
>>> handling to avoid the free list/vnode interlock LOR altogether, that
>>> would be the best.
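
[For reference, a rough sketch of the bounded trylock spin Jilles describes
above: a fixed number of VI_TRYLOCK attempts, a cpu_spinwait() back-off,
and an early exit when vnode_free_list_mtx is contested. The try count,
the helper name, and the direct peek at the MTX_CONTESTED bit of the lock
word are illustrative assumptions, not code from the tree, and the
function is assumed to live in sys/kern/vfs_subr.c, where the static
vnode_free_list_mtx is visible.]

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/vnode.h>
#include <machine/cpu.h>

#define	VI_SPIN_TRIES	1000	/* arbitrary bound, for illustration only */

/*
 * Try to take the vnode interlock a bounded number of times while
 * holding vnode_free_list_mtx.  Returns 1 with the interlock held on
 * success, 0 if the caller should drop the free-list lock and back off.
 */
static int
vi_trylock_bounded(struct vnode *vp)
{
	int tries;

	mtx_assert(&vnode_free_list_mtx, MA_OWNED);
	for (tries = 0; tries < VI_SPIN_TRIES; tries++) {
		if (VI_TRYLOCK(vp))
			return (1);
		/*
		 * Stop spinning as soon as somebody else is waiting for
		 * the free-list mutex.  This reads the lock word directly
		 * (an assumption; there is no public mtx_contested() API).
		 */
		if ((vnode_free_list_mtx.mtx_lock & MTX_CONTESTED) != 0)
			break;
		cpu_spinwait();
	}
	return (0);
}
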
>>>
>>> I do not have objections against the pause() addition, but I would
>>> argue that should_yield() should be removed then, switching the code
>>> to unconditionally pause when the collision is detected.
>>
>> Taking the idea from Jilles, what about replacing should_yield with a
>> check to see if the vnode_free_list_mtx mutex is contested?
>>
>> That would prevent us from doing unnecessary pauses, only releasing
>> the vnode_free_list_mtx mutex when there's someone else that actually
>> needs it.
>
> This would still be racy, and possibly allows a lock convoy in the same
> manner as the current code. Also, AFAIR, the real problem was that when
> two iterators start synchronized, it usually ended in a live-lock.
>
> If you are willing, try this, of course, but I tend to agree with just
> an addition of pause() for now.

OK, I've tested replacing kern_yield with a pause over the whole night
and it seems to be working as expected: I no longer saw any lockups,
whereas a whole-night run would usually show at least 3 or 4 lockups.

If you (and others) are happy with the replacement, you can commit it
and we can replace the should_yield call later if needed.

Thanks, Roger.
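
[A minimal sketch of the pause()-based back-off being discussed, assuming
it replaces the kern_yield() call in the mnt_vnode_next_active() retry
path and builds in the same file, with the same headers, as the previous
sketch. The helper name and wmesg string are made up for the example;
this is not the committed diff.]

/*
 * Back off when VI_TRYLOCK() on a vnode keeps failing while iterating
 * the active list: unconditionally drop vnode_free_list_mtx and sleep
 * for at least one tick via pause() instead of calling kern_yield(),
 * so that a (possibly lower-priority) v_interlock owner can run, then
 * retake the free-list lock and let the caller restart the iteration.
 */
static void
vnode_list_backoff(void)
{

	mtx_assert(&vnode_free_list_mtx, MA_OWNED);
	mtx_unlock(&vnode_free_list_mtx);
	pause("vnlock", 1);		/* wmesg is arbitrary here */
	mtx_lock(&vnode_free_list_mtx);
}
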