Attilio Rao wrote:
> 2008/8/23, John Baldwin <jhb_at_freebsd.org>:
>> On Friday 22 August 2008 01:33:28 pm kevinxlinuz wrote:
>>> Hi,
>>> I'm looking into the problem (amd64/124200: kernel panic on mutex sleepq
>>> chain). It has troubled me for a long time. I added a KASSERT in
>>> sleepq_broadcast() to check the sleepqueue's wait channel. In the end it
>>> turned out that the sleepqueue's wait channel was changed before
>>> sleepq_resume_thread(). In sleepq_lookup(), we can easily find
>>> sq->sq_wchan == wchan. But after a short time, sq->sq_wchan no longer
>>> equals wchan, so I think it was changed by other threads.
>>
>> The sleepq chain lock is already held for all of sleepq_broadcast() by the
>> caller (see wakeup() and cv_broadcastpri()). That said, I don't have any
>> other good ideas for the panic you are seeing. Do you have a crash dump? It
>> might be interesting to see what other thread is using that sleep queue.
>
> Ben Close and I investigated this bug extensively and still didn't
> find the source.
> Factors we have now:
> 1) The lock, when inspected with DDB, is actually held by another
> thread even though it should be held by curthread. It is as if the
> mutex cookie gets overwritten by the other thread, as if the lock were
> free. An extra drop (and subsequent acquire) is not very likely because
> of (2).
> 2) KTR traces don't show anything wrong. Accesses to the sleepqueue
> chain lock are paired (both via the mtx_* interface and via
> thread_lock). This is very strange because it rules out incorrect
> locking semantics.
> 3) The problem is reproducible even on 4BSD, without PREEMPTION and
> even with the smp sysctl disabled (it just takes more time).
> 4) The bug seems to be triggered by sx + wait channel when used in
> sx_sleep() and the like.
>
> I'm thinking this can be some nasty, but sort of deterministic, race
> between sleepqueue accesses, between the sx sleepqueue and the
> wait channel sleepqueue.
> I still have to think more about it, but at the moment I'm pretty busy,
> so if you have good ideas please let me know.

The other common factor, though not 100% verified, is that everyone
experiencing the race is running amd64.

Cheers,
Benjamin

Received on Thu Aug 28 2008 - 12:56:10 UTC
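
Editor's sketch: the check kevinxlinuz describes (look up the sleepqueue for a wait channel, then assert later that it is still bound to that channel) can be modelled in userspace roughly as follows. This is not the FreeBSD kernel code. Apart from the sq_wchan field named in the thread, every name here (the model_* functions, the queues array, NQUEUES, sq_waiters) is a hypothetical stand-in, and no locking is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct sleepqueue.
 * Only sq_wchan comes from the thread; sq_waiters is illustrative. */
struct sleepqueue {
    const void *sq_wchan;   /* wait channel this queue is bound to */
    int         sq_waiters; /* illustrative count of sleeping threads */
};

#define NQUEUES 4
static struct sleepqueue queues[NQUEUES];

/* Model of a lookup by wait channel: return the queue whose sq_wchan
 * matches wchan, or NULL if no queue is bound to it. */
static struct sleepqueue *
model_sleepq_lookup(const void *wchan)
{
    for (size_t i = 0; i < NQUEUES; i++)
        if (queues[i].sq_wchan == wchan)
            return &queues[i];
    return NULL;
}

/* The condition an added KASSERT would test before resuming threads:
 * returns 1 if sq is still bound to wchan, 0 otherwise. */
static int
model_wchan_stable(const struct sleepqueue *sq, const void *wchan)
{
    return sq != NULL && sq->sq_wchan == wchan;
}
```

In this model, calling model_sleepq_lookup() and then model_wchan_stable() with the same channel succeeds unless something rewrites sq_wchan in between, which is the situation the reported panic suggests.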