Re: process stuck in stat/../cache_lookup: ktorrent, zfs

From: Attilio Rao <attilio_at_freebsd.org>
Date: Sun, 6 Dec 2009 20:04:08 +0100

2009/12/6 Andriy Gapon <avg_at_icyb.net.ua>:
> on 06/12/2009 13:31 Andriy Gapon said the following:
>> System is recent 9-current, amd64.
>> I see that sometimes ktorrent gets stuck during heavy download (multiple files
>> in parallel, high speed).  It is completely unresponsive and not killable even
>> with SIGKILL.
> [snip]
>> #0  sched_switch (td=0xffffff012a6c5700, newtd=0xffffff0001533380,
>> flags=Variable "flags" is not available.
>> ) at /usr/src/sys/kern/sched_ule.c:1865
>> #1  0xffffffff80374baf in mi_switch (flags=260, newtd=0x0) at
>> /usr/src/sys/kern/kern_synch.c:449
>> #2  0xffffffff803a795b in sleepq_switch (wchan=Variable "wchan" is not available.
>> ) at /usr/src/sys/kern/subr_sleepqueue.c:509
>> #3  0xffffffff803a8645 in sleepq_wait (wchan=0xffffff0105b457f8, pri=80) at
>> /usr/src/sys/kern/subr_sleepqueue.c:588
>> #4  0xffffffff80351184 in __lockmgr_args (lk=0xffffff0105b457f8, flags=2097408,
>> ilk=0xffffff0105b45820, wmesg=Variable "wmesg" is not available.
>> ) at /usr/src/sys/kern/kern_lock.c:216
>
> So some more data:
> (kgdb) fr 4
>
> #4  0xffffffff80351184 in __lockmgr_args (lk=0xffffff0105b457f8, flags=2097408,
> ilk=0xffffff0105b45820, wmesg=Variable "wmesg" is not available.
> ) at /usr/src/sys/kern/kern_lock.c:216
> 216                     sleepq_wait(&lk->lock_object, pri);
> (kgdb) p *lk
> $8 = {lock_object = {lo_name = 0xffffffff80ad55b6 "zfs", lo_flags = 91947008,
> lo_data = 0, lo_witness = 0x0}, lk_lock = 3, lk_timo = 51, lk_pri = 80}
> (kgdb) p/x flags
> $9 = 0x200100
> (kgdb) p/x lk->lock_object.lo_flags
> $12 = 0x57b0000
>
> Apparently sleeplk is inlined into __lockmgr_args.
>
> So it looks like this is a LK_SHARED|LK_INTERLOCK lockmgr call which has not
> taken any easy path and ended up in sleepq_wait, but wakeup never comes for it,
> perhaps missed?

I think that a 'missed wakeup' is too hasty (and wrong) a conclusion.
Here the problem is that the lock is held in shared mode (lk->lk_lock
= 3), so you would need to know what happened to the owners once they
got the lock.
For shared acquisitions, though, the only way to find that out is with
WITNESS, so you should try to reproduce it with WITNESS on.
Once you have that data we can dig further.

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
Received on Sun Dec 06 2009 - 18:04:10 UTC