Re: Instant panic while trying run ports-mgmt/poudriere

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Thu, 27 Aug 2015 00:15:51 -0700 (PDT)

On 27 Aug, Don Lewis wrote:
> On 27 Aug, Lawrence Stewart wrote:
>> On 08/27/15 09:36, John-Mark Gurney wrote:
>>> Andriy Gapon wrote this message on Sun, Aug 23, 2015 at 09:54 +0300:
>>>> On 12/08/2015 17:11, Lawrence Stewart wrote:
>>>>> On 08/07/15 07:33, Pawel Pekala wrote:
>>>>>> Hi K.,
>>>>>>
>>>>>> On 2015-08-06 12:33 -0700, "K. Macy" <kmacy_at_freebsd.org> wrote:
>>>>>>> Is this still happening?
>>>>>>
>>>>>> Still crashes:
>>>>>
>>>>> +1 for me running r286617
>>>>
>>>> Here is another +1 with r286922.
>>>> I can add a couple of bits of debugging data:
>>>>
>>>> (kgdb) fr 8
>>>> #8  0xffffffff80639d60 in knote (list=0xfffff8019a733ea0, hint=2147483648, lockflags=<value optimized out>) at /usr/src/sys/kern/kern_event.c:1964
>>>> 1964                    } else if ((lockflags & KNF_NOKQLOCK) != 0) {
>>>> (kgdb) p *list
>>>> $2 = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff8063a1e0
>>> 
>>> We should not, and cannot, get here w/ an empty list.  If we do, then there is
>>> something seriously wrong...  The current kn (which we must have as we
>>> are here) MUST be on the list, but as you just showed, there are no
>>> knotes on the list.
>>> 
>>> Can you get me a print of the knote?  That way I can see what flags
>>> are on it.
>> 
>> I quickly tried to get this info for you by building my kernel with -O0
>> and reproducing, but I get an insta-panic on boot with the new kernel:
>> 
>> Fatal double fault
>> rip = 0xffffffff8218c794
>> rsp = 0xfffffe044cdc9fe0
>> rbp = 0xfffffe044cdca110
>> cpuid = 2; apic id = 02
>> panic: double fault
>> cpuid = 2
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe03dcfffe30
>> vpanic() at vpanic+0x189/frame 0xfffffe03dcfffeb0
>> panic() at panic+0x43/frame 0xfffffe03dcffff10
>> dblfault_handler() at dblfault_handler+0xa2/frame 0xfffffe03dcffff30
>> Xdblfault() at Xdblfault+0xac/frame 0xfffffe03dcffff30
>> --- trap 0x17, rip = 0xffffffff8218c794, rsp = 0xfffffe044cdc9fe0, rbp = 0xfffffe044cdca110 ---
>> vdev_queue_aggregate() at vdev_queue_aggregate+0x34/frame 0xfffffe044cdca110
>> vdev_queue_io_to_issue() at vdev_queue_io_to_issue+0x1f5/frame 0xfffffe044cdca560
>> vdev_queue_io() at vdev_queue_io+0x19a/frame 0xfffffe044cdca5b0
>> zio_vdev_io_start() at zio_vdev_io_start+0x81f/frame 0xfffffe044cdca6e0
>> zio_execute() at zio_execute+0x23b/frame 0xfffffe044cdca730
>> zio_nowait() at zio_nowait+0xbe/frame 0xfffffe044cdca760
>> vdev_mirror_io_start() at vdev_mirror_io_start+0x140/frame 0xfffffe044cdca800
>> zio_vdev_io_start() at zio_vdev_io_start+0x12f/frame 0xfffffe044cdca930
>> zio_execute() at zio_execute+0x23b/frame 0xfffffe044cdca980
>> zio_nowait() at zio_nowait+0xbe/frame 0xfffffe044cdca9b0
>> spa_load_verify_cb() at spa_load_verify_cb+0x37d/frame 0xfffffe044cdcaa50
>> traverse_visitbp() at traverse_visitbp+0x5a5/frame 0xfffffe044cdcac60
>> traverse_dnode() at traverse_dnode+0x98/frame 0xfffffe044cdcacd0
>> traverse_visitbp() at traverse_visitbp+0xc66/frame 0xfffffe044cdcaee0
>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb0f0
>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb300
>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb510
>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb720
>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcb930
>> traverse_visitbp() at traverse_visitbp+0x930/frame 0xfffffe044cdcbb40
>> traverse_dnode() at traverse_dnode+0x98/frame 0xfffffe044cdcbbb0
>> traverse_visitbp() at traverse_visitbp+0xe59/frame 0xfffffe044cdcbdc0
>> traverse_impl() at traverse_impl+0x79d/frame 0xfffffe044cdcbfd0
>> traverse_dataset() at traverse_dataset+0x93/frame 0xfffffe044cdcc040
>> traverse_pool() at traverse_pool+0x1f2/frame 0xfffffe044cdcc140
>> spa_load_verify() at spa_load_verify+0xf3/frame 0xfffffe044cdcc1f0
>> spa_load_impl() at spa_load_impl+0x2069/frame 0xfffffe044cdcc610
>> spa_load() at spa_load+0x320/frame 0xfffffe044cdcc6d0
>> spa_load_impl() at spa_load_impl+0x150b/frame 0xfffffe044cdccaf0
>> spa_load() at spa_load+0x320/frame 0xfffffe044cdccbb0
>> spa_load_best() at spa_load_best+0xc6/frame 0xfffffe044cdccc50
>> spa_open_common() at spa_open_common+0x246/frame 0xfffffe044cdccd40
>> spa_open() at spa_open+0x35/frame 0xfffffe044cdccd70
>> dsl_pool_hold() at dsl_pool_hold+0x2d/frame 0xfffffe044cdccdb0
>> dmu_objset_own() at dmu_objset_own+0x2e/frame 0xfffffe044cdcce30
>> zfsvfs_create() at zfsvfs_create+0x100/frame 0xfffffe044cdcd050
>> zfs_domount() at zfs_domount+0xa7/frame 0xfffffe044cdcd0e0
>> zfs_mount() at zfs_mount+0x6c3/frame 0xfffffe044cdcd390
>> vfs_donmount() at vfs_donmount+0x1330/frame 0xfffffe044cdcd660
>> kernel_mount() at kernel_mount+0x62/frame 0xfffffe044cdcd6c0
>> parse_mount() at parse_mount+0x668/frame 0xfffffe044cdcd810
>> vfs_mountroot() at vfs_mountroot+0x85c/frame 0xfffffe044cdcd9d0
>> start_init() at start_init+0x62/frame 0xfffffe044cdcda70
>> fork_exit() at fork_exit+0x84/frame 0xfffffe044cdcdab0
>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe044cdcdab0
>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>> KDB: enter: panic
>> 
>> Didn't get a core because it panics before dumpdev is set.
>> 
>> Is anyone else able to run -O0 kernels or do I have something set to evil?
> 
> As I recall, double faults are commonly caused by overflowing the kernel
> stack.  If I subtract the values of the first and last frame pointers, I
> get 14752 bytes, which is getting pretty large, and rsp and rbp in the
> trap point to different 4K pages, so a stack overflow certainly looks
> possible.
> 
> Try bumping up KSTACK_PAGES in your kernel config.

Actually, that's not necessary anymore since it was made into a tunable
in -CURRENT fairly recently.  Just set kern.kstack_pages to something
larger in loader.conf.