Re: Kernel page fault with non-sleepable locks held error with kernel r270837

From: 小野寛生 <hiroo.ono+freebsd_at_gmail.com>
Date: Sun, 31 Aug 2014 20:43:54 +0900
Thank you for looking into this.

2014-08-31 15:47 GMT+09:00 John-Mark Gurney <jmg_at_funkthat.com>:
> Hiroo Ono (小野寛生) wrote this message on Sun, Aug 31, 2014 at 14:01 +0900:
>> During upgrading world and kernel from r26939 to r270837, I got the
>> following problem.
>> a) the arch is i386
>> b) kernel is of r270837, userland is of r26939 (make kernel is done
>> and rebooted, make installworld not yet).
>> c) booting in single user mode is OK.
>> d) during startup of multi-user mode, when dhclient is run, the
>> following message appears, and the system freezes:
>>
>> Starting devd.
>> wlan0: link state changed to UP
>> Starting webcamd.
>> Attached to ugen4.2[0]
>> Starting webcready running for ugen4.2.0
>> /usr/local/etc/rc.d/webcamd: WARNING: failed to start webcamd
>> Starting dhclient.
>> DHCPREQUEST on wlan0 to 255.255.255.255 port 67
>> DHCPACK from 192.168.8.2
>> Kernel page fault with the following non-sleepable locks held:
>> exclusive sleep mutex so_rcv (so_rcv) r = 0 (0xc713f078) locked _at_
>> /usr/src/sys/kern/kern_event.c:2005
>
> I'm puzzled by this line number...  This line number doesn't do any
> locks, it is in the function knlist_remove_inevent...

Line 2005 is "mtx_lock((struct mtx *)arg);" in knlist_mtx_lock():
https://svnweb.freebsd.org/base/head/sys/kern/kern_event.c?revision=268843&view=markup#l2005

This function is assigned to (struct knlist *)->kl_lock in knlist_init():
https://svnweb.freebsd.org/base/head/sys/kern/kern_event.c?revision=268843&view=markup#l2058
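
To illustrate why the reported lock site is the wrapper's line rather than a
line in the caller, here is a minimal userland sketch in C. It assumes POSIX
threads in place of the kernel mutex, and every demo_* name is made up for this
example; it is not the kernel code, only the same function-pointer pattern.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical userland stand-in for the kernel's struct knlist. */
struct demo_knlist {
	void (*kl_lock)(void *);	/* lock callback */
	void (*kl_unlock)(void *);	/* unlock callback */
	void *kl_lockarg;		/* argument handed to the callbacks */
};

/* Last recorded lock site, loosely imitating what a lock checker remembers. */
static const char *demo_last_lock_site = "";

/* Every list set up with a mutex takes its lock through this one wrapper. */
static void
demo_mtx_lock(void *arg)
{
	demo_last_lock_site = __func__;	/* the site is the wrapper itself */
	pthread_mutex_lock((pthread_mutex_t *)arg);
}

static void
demo_mtx_unlock(void *arg)
{
	pthread_mutex_unlock((pthread_mutex_t *)arg);
}

/* Assumed analogue of knlist_init_mtx(): install the mutex wrappers. */
static void
demo_knlist_init_mtx(struct demo_knlist *kl, pthread_mutex_t *m)
{
	kl->kl_lock = demo_mtx_lock;
	kl->kl_unlock = demo_mtx_unlock;
	kl->kl_lockarg = m;
}

/* Lock and unlock through the callbacks, then report the recorded site. */
const char *
demo_lock_via_knlist(void)
{
	pthread_mutex_t so_rcv_mtx = PTHREAD_MUTEX_INITIALIZER;
	struct demo_knlist kl;

	demo_knlist_init_mtx(&kl, &so_rcv_mtx);
	assert(kl.kl_lock == demo_mtx_lock);
	kl.kl_lock(kl.kl_lockarg);
	kl.kl_unlock(kl.kl_lockarg);
	return (demo_last_lock_site);
}
```

Calling demo_lock_via_knlist() from a small main() and printing the result
shows the wrapper's name, whichever code path asked for the lock. That would be
consistent with the message pointing at kern_event.c:2005 even though the
backtrace goes through kqueue_register() and filt_soread().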

>> KDB stack backtrace:
>>  rapper+0x2d/frame 0xe8f42710
>> kdb_backtrace(c11aaf80,0,c713f078,c119a9e8,7d5,...) at 0xc0b4b160 =
>> kdb_backtrace+0x30/frame 0xe8f42778
>> witness_warn(5,0,c136b0a0,76fb000,c1833d58,...) at 0xc8b68a52 =
>> witness_warn+0x402/frame 0xe8f427c8
>> trap_pfault(18,3fd,c0dcc2d0,c1f64a80,c75fa000,...) at 0xc102f46b =
>> trap_pfault+0x5b/frame 0xe8f42840
>> trap(e8f42988) at 0xc102edcf = trap+0x6cf/frame 0xe8f4297c
>> calltrap() at 0xc1017c4c = calltrap+0x6/frame 0xe8f4297c
>> filt_soread(c75f7828,0,c119a9e8,48d,0,...) at 0xc0b9837d =
>> filt_soread+0x9d/frame 0xe8f429f0
>> kqueue_register(c6f59310,1,1,4f5,0,...) at 0xc0ad1457 =
>> kqueue_register+0x807/frame 0xe8f42a68
>> kern_kevent(c6f59310,7,12c217ce1 = Xint0x80), eip =

calltrap() seems to be triggered by
    SOCKBUF_LOCK_ASSERT(&so->so_rcv);
in filt_soread() in sys/kern/uipc_socket.c:
https://svnweb.freebsd.org/base/head/sys/kern/uipc_socket.c?revision=270664&view=markup#l3250

But I do not know where &so->so_rcv was previously locked.
knlist_init_mtx() (which then calls knlist_init()) is called with
so->so_rcv in sys/kern/uipc_socket.c at
line 517: socreate()
https://svnweb.freebsd.org/base/head/sys/kern/uipc_socket.c?revision=270664&view=markup#l517
and
line 606: sonewconn()
https://svnweb.freebsd.org/base/head/sys/kern/uipc_socket.c?revision=270664&view=markup#l606

So the problem may be around there, but I cannot track it any further.
The system freezes, so I cannot use ddb.

> But notice the knlist_remove_inevent doesn't appear in the back
> trace...
>
> Can you confirm that your kern_event.c is:
> __FBSDID("$FreeBSD: head/sys/kern/kern_event.c 268843 2014-07-18 14:27:04Z bapt
> $");

I confirmed that it is this revision.

>> instruction poi         = 0x28:0xe8f429f0     fff, type 0x1b
>> DHCPREQUEST on wlan0 to 255.255.255.255 port 67
>> DHCPACK from 192.168.8.2
>>
>> e) kernel configuration differs from GENERIC on the following point
>> options      VIMAGE
>> options      DDB_NUMSYM
>> nocpu        I486_CPU
>> nooptions  VESA
>>
Received on Sun Aug 31 2014 - 09:43:56 UTC
