Re: [BUG] I think sleepqueue need to be protected in sleepq_broadcast

From: kevin <kevinxlinuz_at_163.com>
Date: Sat, 23 Aug 2008 15:14:02 +0800 (CST)
>On Friday 22 August 2008 01:33:28 pm kevinxlinuz wrote:
>> Hi,
>>   I'm looking into the problem (amd64/124200: kernel panic on mutex sleepq
>> chain). It has troubled me for a long time. I added a KASSERT in
>> sleepq_broadcast() to check the sleepqueue's wait channel. It turns out
>> that the sleepqueue's wait channel was changed before
>> sleepq_resume_thread(). In sleepq_lookup(), we can easily find
>> sq->sq_wchan == wchan. But a short time later, sq->sq_wchan is no longer
>> equal to wchan, so I think it was changed by another thread.
>
>The sleepq chain lock is already held for all of sleepq_broadcast() by the 
>caller (see wakeup() and cv_broadcastpri()).  That said, I don't have any 
>other good ideas for the panic you are seeing.  Do you have a crash dump?  It 
>might be interesting to see what other thread is using that sleep queue.
>
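To check that I understand the locking you describe, here is a minimal user-space sketch of it. It is not the kernel code: every model_* name is hypothetical, the chain lock is reduced to a flag, and a single queue stands in for the whole chain. It only shows the discipline: the wakeup()-style caller takes the chain lock, and the broadcast routine runs entirely under it, so the wait channel check inside the loop should never fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one sleep queue chain and its lock. */
struct model_chain {
    bool  locked;    /* true while the chain lock is held */
    void *sq_wchan;  /* wait channel of the single queue on this chain */
    int   nwaiters;  /* threads still blocked on the queue */
};

void
model_chain_lock(struct model_chain *sc)
{
    assert(!sc->locked);    /* the model lock is not recursive */
    sc->locked = true;
}

void
model_chain_unlock(struct model_chain *sc)
{
    assert(sc->locked);
    sc->locked = false;
}

/* Block one more waiter on wchan; the caller must hold the chain lock. */
void
model_add_waiter(struct model_chain *sc, void *wchan)
{
    assert(sc->locked);
    if (sc->nwaiters == 0)
        sc->sq_wchan = wchan;
    else
        assert(sc->sq_wchan == wchan);
    sc->nwaiters++;
}

/*
 * Modeled on sleepq_broadcast(): the caller must already hold the chain
 * lock. Returns the number of waiters woken, or -1 if the queue's wait
 * channel changed while waiters were being resumed.
 */
int
model_broadcast(struct model_chain *sc, void *wchan)
{
    int woken = 0;

    assert(sc->locked);
    if (sc->nwaiters == 0 || sc->sq_wchan != wchan)
        return (0);             /* lookup miss: nothing to wake */
    while (sc->nwaiters > 0) {
        if (sc->sq_wchan != wchan)
            return (-1);        /* the condition my KASSERT tests */
        sc->nwaiters--;
        woken++;
    }
    return (woken);
}

/* Modeled on wakeup(): take the chain lock around the whole broadcast. */
int
model_wakeup(struct model_chain *sc, void *wchan)
{
    int woken;

    model_chain_lock(sc);
    woken = model_broadcast(sc, wchan);
    model_chain_unlock(sc);
    return (woken);
}
```

In this model the -1 path cannot be reached as long as every writer of sq_wchan also holds the chain lock, which is why the mismatch I see in the real kernel surprises me.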
Sorry, crash dumps do not work well for me. My system has 4G of memory but only 1.6G of swap. When I try to get a coredump, the system eventually freezes.
I can easily reproduce the panic.
Here is some of my panic info. Without the KASSERT in sleepq_broadcast(), it panics in sleepq_resume_thread().
db>show thread 100069 
Thread 100069 at 0xffffff0004c73000:
proc (pid 153):0xffffff0004c6a860
name: txg_thread_enter
stack: 0xfffffffea603c000-0xfffffffea603efff
flags:0x4 pflags:0x200000
state:RUNNING (CPU 1)
priority:120
container lock:sched lock 1(0xffffffff809a7300)
db>show lock 0xffffffff809a7300
class:spin mutex
name:sched lock 0
flags: {SPIN,RECURSE}
state:{UNOWNED}

db>show thread 100082 (thread on another cpu)
Thread 100082 at 0xffffff0004c76700:
proc (pid 152):0xffffff0004c89430
name:txg_thread_enter
stack:0xfffffffea5f9c000-0xfffffffea5f9ffff
flags:0x4 pflags:0x200000
state:RUNNING (CPU 0)
wmesg:tx-tx_sync_lock wchan:0xffffff0004e095b8
priority: 160
container lock:sched lock 0 (0xffffffff809a6700)
db>show lock 0xffffff0004e095b8
class: sx
name:tx-tx_sync_lock
state:XLOCK:0xffffff0004c73000(tid 100069,pid 153,"txg_thread_enter")
waiters:exclusive
db>bt 100069
Tracing pid 153 tid 100069 td 0xffffff0004c73000
kdb_enter() at kdb_enter+0x3d
panic() at panic+0x16c
assert_mtx() at assert_mtx
sleepq_resume_thread() at sleepq_resume_thread+0x96
sleepq_broadcast() at sleepq_broadcast+0x85
cv_broadcastpri() at cv_broadcastpri+0x3f
txg_sync_thread() at txg_sync_thread+0x4b4
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0,rip=0,rsp =0xfffffffea603ed30,rbp=0 ---
db>bt 100082
..
_thread_lock_flags() at _thread_lock_flags+0xc9
sleepq_wait() at sleepq_wait+0x3b
_sx_xlock_hard() at _sx_xlock_hard+0x1a2
_sx_xlock() at _sx_xlock+0xa0
_cv_wait() at _cv_wait+0x1de
txg_thread_wait() at txg_thread_wait+0x7d
txg_quiesce_thread() at txg_quiesce_thread+0xb5
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0,rip=0,rsp = 0xfffffffea5f9fd30,rbp=0 ---

If I increase the swap size to 4G, will the coredump work correctly?
>>  sleepq_broadcast(void *wchan, int flags, int pri, int queue)
>> {
>>         struct sleepqueue *sq;
>>         struct thread *td;
>>         int wakeup_swapper;
>>
>>         CTR2(KTR_PROC, "sleepq_broadcast(%p, %d)", wchan, flags);
>>         KASSERT(wchan != NULL, ("%s: invalid NULL wait channel",
>>             __func__));
>>         MPASS((queue >= 0) && (queue < NR_SLEEPQS));
>>         sq = sleepq_lookup(wchan);
>>         if (sq == NULL)
>>                 return (0);
>>         KASSERT(sq->sq_type == (flags & SLEEPQ_TYPE),
>>             ("%s: mismatch between sleep/wakeup and cv_*", __func__));
>>
>>         /* Resume all blocked threads on the sleep queue. */
>>         wakeup_swapper = 0;
>>         while (!TAILQ_EMPTY(&sq->sq_blocked[queue])) {
>>                 td = TAILQ_FIRST(&sq->sq_blocked[queue]);
>>                 thread_lock(td);
>>                 /* test */
>>                 KASSERT(sq->sq_wchan == wchan,
>>                     ("%s: mismatch between wchan and sq_wchan in sq",
>>                     __func__)); /* I find the panic here */
>>                 if (sleepq_resume_thread(sq, td, pri))
>>                         wakeup_swapper = 1;
>>                 thread_unlock(td);
>>         }
>>         return (wakeup_swapper);
>> }
>>
>> Thanks,
>> kevin  2008/08/23
>>
>> _______________________________________________
>> freebsd-current_at_freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-current
>> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
>
>
>
>-- 
>John Baldwin
--
Thanks,
kevin
Received on Sat Aug 23 2008 - 05:14:04 UTC
