Re: [BUG] I think sleepqueue need to be protected in sleepq_broadcast

From: Benjamin Close <Benjamin.Close_at_clearchain.com>
Date: Fri, 29 Aug 2008 00:26:01 +0930

Attilio Rao wrote:
> 2008/8/23, John Baldwin <jhb_at_freebsd.org>:
>   
>> On Friday 22 August 2008 01:33:28 pm kevinxlinuz wrote:
>>     
>>> Hi,
>>>   I'm looking into the problem (amd64/124200: kernel panic on mutex sleepq
>>> chain). It has troubled me for a long time. I added a KASSERT in
>>> sleepq_broadcast() to check the sleepqueue's wait channel. In the end it
>>> turned out that the sleepqueue's wait channel was changed before
>>> sleepq_resume_thread(). In sleepq_lookup(), we can easily confirm that
>>> sq->sq_wchan == wchan. But a short time later, sq->sq_wchan is no longer
>>> equal to wchan, so I think it was changed by other threads.
>>>       
>> The sleepq chain lock is already held for all of sleepq_broadcast() by the
>> caller (see wakeup() and cv_broadcastpri()).  That said, I don't have any
>> other good ideas for the panic you are seeing.  Do you have a crash dump?  It
>> might be interesting to see what other thread is using that sleep queue.
>>     
>
> Ben Close and I investigated this bug extensively and still haven't
> found the source.
> Facts we have so far:
> 1) When inspected with DDB, the lock is shown as held by another
> thread even though it should be held by curthread. It is as if the
> other thread overwrote the mutex cookie as though the lock were free.
> An extra drop (and subsequent acquire) is not very likely because of
> (2).
> 2) KTR traces don't show anything wrong. Accesses to the sleepqueue
> chain lock are paired (both via the mtx_* interface and via
> thread_lock). This is very strange because it rules out wrong locking
> semantics.
> 3) The problem is reproducible even on 4BSD, without PREEMPTION and
> even with the smp sysctl disabled (it just takes more time).
> 4) The bug seems to be triggered by sx + wait channel usage, as in
> sx_sleep() and the like.
>
> I think this may be some nasty, but sort of deterministic, race
> between accesses to the sx sleepqueue and the wait channel
> sleepqueue.
> I still have to think it through, but I'm pretty busy right now, so
> if you have good ideas please let me know.
>   
The other common factor, though not 100% verified, is that everyone
experiencing the race is running amd64.
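
For reference, the check kevinxlinuz describes (asserting that the queue
returned by the lookup is still bound to the same wait channel before each
wakeup) can be modelled in userspace roughly as below. This is only a sketch
under assumptions, not the kernel code: struct sleepqueue is a stripped-down
stand-in, SC_TABLESIZE, sq_hash(), sq_insert(), sq_lookup() and
sq_broadcast_checked() are hypothetical names, waiters are modelled as a plain
count, and assert() stands in for KASSERT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SC_TABLESIZE 8	/* hypothetical chain count for this model */

/* Stripped-down stand-in for the kernel's struct sleepqueue. */
struct sleepqueue {
	void *sq_wchan;			/* wait channel the queue is bound to */
	int sq_waiters;			/* sleeping threads, modelled as a count */
	struct sleepqueue *sq_next;	/* next queue on the same hash chain */
};

static struct sleepqueue *sq_chains[SC_TABLESIZE];

/* Hypothetical hash: map a wait channel address to a chain index. */
static size_t
sq_hash(void *wchan)
{
	return (((uintptr_t)wchan >> 4) % SC_TABLESIZE);
}

/* Link a queue at the head of the chain for its wait channel. */
static void
sq_insert(struct sleepqueue *sq)
{
	size_t i = sq_hash(sq->sq_wchan);

	sq->sq_next = sq_chains[i];
	sq_chains[i] = sq;
}

/* Return the queue bound to wchan, or NULL if there is none. */
static struct sleepqueue *
sq_lookup(void *wchan)
{
	struct sleepqueue *sq;

	for (sq = sq_chains[sq_hash(wchan)]; sq != NULL; sq = sq->sq_next)
		if (sq->sq_wchan == wchan)
			return (sq);
	return (NULL);
}

/* Wake every waiter on wchan; return how many were woken. */
static int
sq_broadcast_checked(void *wchan)
{
	struct sleepqueue *sq = sq_lookup(wchan);
	int woken = 0;

	if (sq == NULL)
		return (0);
	while (sq->sq_waiters > 0) {
		/* The queue must still belong to wchan before each wakeup. */
		assert(sq->sq_wchan == wchan);
		sq->sq_waiters--;
		woken++;
	}
	return (woken);
}
```

In a single-threaded model like this the assertion can never fire. The
equivalent comparison in the kernel should only fail if something else
rewrites sq_wchan while the chain lock is supposedly held, which is the
behaviour being reported.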

Cheers,
    Benjamin
Received on Thu Aug 28 2008 - 12:56:10 UTC
