Re: FreeBSD-HEAD gets stuck on vnode operations

From: Roger Pau Monné <roger.pau_at_citrix.com> Date: Mon, 27 May 2013 10:19:51 +0200 · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:38 UTC

On 27/05/13 08:07, Konstantin Belousov wrote:
> On Mon, May 27, 2013 at 12:22:54AM +0200, Jilles Tjoelker wrote:
>> On Sun, May 26, 2013 at 10:52:07PM +0200, Roger Pau Monn? wrote:
>>> On 26/05/13 22:20, Jilles Tjoelker wrote:
>>>> Instead of a pause() that may be too short or too long, how about
>>>> waiting for the necessary lock? In other words, replace the kern_yield()
>>>> call with VI_LOCK(vp); VI_UNLOCK(vp);. This is also the usual approach
>>>> to acquire two locks without imposing an order between them.
>>
>>> Since there might be more than one locked vnode, waiting on a specific
>>> locked vnode seemed rather arbitrary, but I agree that the pause is also
>>> rather arbitrary.
>>
>>> Also, can we be sure that the v_interlock mutex will not be destroyed
>>> while the syncer process is waiting for it to be unlocked?
>>
>> I think this is a major problem. My idea was too easy and will not work.
>>
>> That said, the code in mnt_vnode_next_active() appears to implement some
>> sort of adaptive spinning for SMP. It tries VI_TRYLOCK for 200ms
>> (default value of hogticks) and then yields. This is far too long for a
>> mutex lock and if it takes that long it means that either the thread
>> owning the lock is blocked by us somehow or someone is abusing a mutex
>> to work like a sleepable lock such as by spinning or DELAY.
>>
>> Given that it has been spinning for 200ms, it is not so bad to pause for
>> one additional microsecond.
>>
>> The adaptive spinning was added fairly recently, so apparently it
>> happens fairly frequently that VI_TRYLOCK fails transiently.
>> Unfortunately, the real adaptive spinning code cannot be used because it
>> will spin forever as long as the thread owning v_interlock is running,
>> including when that is because it is spinning for vnode_free_list_mtx.
>> Perhaps we can try to VI_TRYLOCK a certain number of times. It is also
>> possible to check the contested bit of vnode_free_list_mtx
>> (sys/netgraph/netflow/netflow.c does something similar) and stop
>> spinning in that case.
>>
>> A cpu_spinwait() invocation should also be added to the spin loop.
> 
> There are two 'proper' solutions for this issue:
> 
> One is to change the handling of the vnode lifecycle to allow the
> safe block for the vnode interlock acquisition. In particular, the
> change would add some stability of the vnode memory when vnode is
> put on the free list. As example, the vnode zone could be marked as
> type-stable again, and then the vnode interlock can be obtained with
> dropped free list lock. Arguably, marking the zone as non-freeable would
> be a regression, esp. for the zone which accounts for largest allocation
> on the kernel memory.
> 
> Another one is to somehow ensure that the priority is properly
> propagated from the spinning thread to the vnode interlock owner.
> I think that it is desirable to donate some amount of priority
> from the spinning thread.  Unfortunately, I was unable to come
> up with elegant solution for this which would be also contained
> and did not require rewamp of the mutex interfaces.
> 
> BTW, if anybody come up with the idea of the restructuring the free list
> handling to avoid the free list/vnode interlock LOR altogether, it would
> be the best.
> 
> I do not have objections against the pause() addition, but I would
> argue that should_yield() should be removed then, switching the code to
> unconditionally pause when the collision detected.

Taking the idea from Jilles, what about replacing should_yield with a
check to see if the vnode_free_list_mtx mutex is contented?

That would prevent us from doing unnecessary pauses, and only releasing
the vnode_free_list_mtx mutex when there's someone else that actually
needs it.