Re: Page fault in uipc_usrreq.c:997

From: Peter Holm <peter_at_holm.cc>
Date: Sat, 9 Sep 2006 17:25:45 +0200
On Sat, Sep 09, 2006 at 01:33:33PM +0100, Robert Watson wrote:
> 
> On Fri, 8 Sep 2006, Peter Holm wrote:
> 
> >During boot of GENERIC HEAD from Sep 7 07:29 UTC I got this page
> >fault:
> >
> >Kernel page fault with the following non-sleepable locks held:
> >exclusive sleep mutex unp r = 0 (0xc0a5520c) locked @
> >kern/uipc_usrreq.c:987
> >KDB: stack backtrace:
> >kdb_backtrace(1,c410b000,c,c3f77a20,e43f7a28,...) at
> >kdb_backtrace+0x29
> >witness_warn(5,0,c0941302) at witness_warn+0x192
> >trap(8,28,c4190028,c413a7a8,c4195690,...) at trap+0x108
> >calltrap() at calltrap+0x5
> >--- trap 0xc, eip = 0xc06e01e6, esp = 0xe43f7a70, ebp = 0xe43f7bfc ---
> >unp_connect(c41ce000,c3f797e0,c3f77a20,c0a5520c,0,...) at
> >unp_connect+0x292
> >uipc_connect(c41ce000,c3f797e0,c3f77a20) at uipc_connect+0x3e
> >soconnect(c41ce000,c3f797e0,c3f77a20) at soconnect+0x4e
> >kern_connect(c3f77a20,3,c3f797e0,c3f797e0,0,...) at kern_connect+0x76
> >connect(c3f77a20,e43f7d04) at connect+0x30
> >syscall(3b,3b,3b,1,8270000,...) at syscall+0x256
> >
> >http://people.freebsd.org/~pho/stress/log/cons207.html.
> >
> >The core file is toast and I missed a back trace of pid 678 :-(
> 
> This is likely one of the remaining race conditions in UNIX domain sockets 
> having to do with simultaneous connect and close, which occur due to 
> dropping locks for either a blocking name lookup or a recursion via the 
> socket layer into the protocol a second time.  When the UNIX domain socket 
> global lock is dropped and re-acquired, the UNIX domain socket code needs 
> to re-evaluate its assumptions regarding any references it has to other 
> UNIX domain sockets, which may have "gone away" while the lock was 
> released.  Interestingly, many of these races also existed in 4.x and 
> before, but they are more exposed with greater kernel parallelism.  I 
> recently closed a spate of them, but it looks like a few remain.  In this 
> case, the listen socket has possibly been closed (although possibly not) 
> while sonewconn() is called.  It could be a reference needs to be added to 
> so2 before dropping the unp lock.  I saw John's follow-up, but if ups/he 
> don't have a fix in a few days once I get back to the UK, I can 
> investigate.  Send me a ping next week if I appear to forget :-).
> 
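
[Editorial sketch, not part of the original message.] The pattern Robert
describes (take a reference before dropping the lock, then re-check the peer
after re-acquiring it) might look roughly like the userspace C below. This is
only an illustration under stated assumptions: struct sock, global_lock,
sock_hold(), sock_rele(), sock_close() and connect_sketch() are hypothetical
names invented here, a pthread mutex stands in for the unp lock, and none of
this is the actual uipc_usrreq.c code or the eventual fix.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket; NOT the FreeBSD struct socket. */
struct sock {
    int  refs;    /* reference count, protected by global_lock */
    bool closed;  /* set once the socket has been closed */
};

/* Stand-in for the UNIX domain socket global lock. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Take a reference. Caller must hold global_lock. */
static void sock_hold(struct sock *so)
{
    so->refs++;
}

/* Drop a reference. Caller must hold global_lock.
 * Returns true when the last reference went away; real code would
 * free the object at that point. */
static bool sock_rele(struct sock *so)
{
    assert(so->refs > 0);
    return --so->refs == 0;
}

/* Close the socket: mark it closed and drop the owner's reference. */
static void sock_close(struct sock *so)
{
    pthread_mutex_lock(&global_lock);
    so->closed = true;
    (void)sock_rele(so);
    pthread_mutex_unlock(&global_lock);
}

/* Work done while the lock is dropped (name lookup, re-entry, ...). */
typedef void (*unlocked_op)(struct sock *);

/* Connect to 'listener', dropping the lock around 'op'.
 * Returns 0 on success or ECONNREFUSED if the listener was closed. */
static int connect_sketch(struct sock *listener, unlocked_op op)
{
    int error = 0;

    pthread_mutex_lock(&global_lock);
    if (listener->closed) {
        pthread_mutex_unlock(&global_lock);
        return ECONNREFUSED;
    }
    /* Pin the listener so it cannot be freed while we are unlocked. */
    sock_hold(listener);
    pthread_mutex_unlock(&global_lock);

    /* Lock dropped: anything may happen to the listener here. */
    if (op != NULL)
        op(listener);

    pthread_mutex_lock(&global_lock);
    /* Re-evaluate assumptions: the reference keeps the memory valid,
     * but the socket may still have been closed in the meantime. */
    if (listener->closed)
        error = ECONNREFUSED;
    (void)sock_rele(listener);
    pthread_mutex_unlock(&global_lock);
    return error;
}
```

Passing sock_close as the op argument simulates a close racing with the
connect: the reference taken before the unlock keeps the object alive, and
the re-check after re-locking turns the race into an ECONNREFUSED return
instead of a fault on a stale pointer.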

OK. I'll keep this panic on my list until it's fixed.

- Peter

> Robert N M Watson
> Computer Laboratory
> University of Cambridge
Received on Sat Sep 09 2006 - 13:25:50 UTC
