Re: panic in range_tree_seg64_compare()

From: Matthew Macy <mmacy_at_freebsd.org>
Date: Fri, 28 Aug 2020 13:37:40 -0700

Try updating. I think this may have been fixed in
https://github.com/openzfs/zfs/pull/10823 which was MFVed this
morning.

On Fri, Aug 28, 2020 at 9:49 AM Matthew Macy <mmacy_at_freebsd.org> wrote:
>
> On Thu, Aug 27, 2020 at 10:37 PM Yuri Pankov <ypankov_at_xsmail.com> wrote:
> >
> > Matthew Macy wrote:
> > > On Thu, Aug 27, 2020 at 6:34 PM Yuri Pankov <ypankov_at_xsmail.com> wrote:
> > >>
> > >> Yet another issue I'm seeing after the last update (currently running
> > >> r364870); I hit it twice today:
> > >>
> > >> Fatal trap 12: page fault while in kernel mode
> > >> cpuid = 19; apic id = 0d
> > >> fault virtual address   = 0xfffff819e2ecdc40
> > >> fault code              = supervisor read data, page not present
> > >> instruction pointer     = 0x20:0xffffffff8277fa64
> > >> stack pointer           = 0x28:0xfffffe01f9ff2d90
> > >> frame pointer           = 0x28:0xfffffe01f9ff2d90
> > >> code segment            = base 0x0, limit 0xfffff, type 0x1b
> > >>                           = DPL 0, pres 1, long 1, def32 0, gran 1
> > >> processor eflags        = interrupt enabled, resume, IOPL = 0
> > >> current process         = 48792 (blk-3:0-0)
> > >> trap number             = 12
> > >> panic: page fault
> > >> cpuid = 19
> > >> time = 1598577675
> > >> KDB: stack backtrace:
> > >> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01f9ff2a40
> > >> vpanic() at vpanic+0x182/frame 0xfffffe01f9ff2a90
> > >> panic() at panic+0x43/frame 0xfffffe01f9ff2af0
> > >> trap_fatal() at trap_fatal+0x387/frame 0xfffffe01f9ff2b50
> > >> trap_pfault() at trap_pfault+0x97/frame 0xfffffe01f9ff2bb0
> > >> trap() at trap+0x2ab/frame 0xfffffe01f9ff2cc0
> > >> calltrap() at calltrap+0x8/frame 0xfffffe01f9ff2cc0
> > >> --- trap 0xc, rip = 0xffffffff8277fa64, rsp = 0xfffffe01f9ff2d90, rbp = 0xfffffe01f9ff2d90 ---
> > >> range_tree_seg64_compare() at range_tree_seg64_compare+0x4/frame 0xfffffe01f9ff2d90
> > >> zfs_btree_find() at zfs_btree_find+0x1bd/frame 0xfffffe01f9ff2df0
> > >> range_tree_find_impl() at range_tree_find_impl+0x6e/frame 0xfffffe01f9ff2e30
> > >> range_tree_find() at range_tree_find+0x1c/frame 0xfffffe01f9ff2e70
> > >> range_tree_contains() at range_tree_contains+0x9/frame 0xfffffe01f9ff2e80
> > >> dnode_block_freed() at dnode_block_freed+0x11d/frame 0xfffffe01f9ff2eb0
> > >> dbuf_read() at dbuf_read+0x70c/frame 0xfffffe01f9ff2fc0
> > >> dmu_buf_hold_array_by_dnode() at dmu_buf_hold_array_by_dnode+0x164/frame 0xfffffe01f9ff3030
> > >> dmu_read_impl() at dmu_read_impl+0xce/frame 0xfffffe01f9ff30c0
> > >> dmu_read() at dmu_read+0x45/frame 0xfffffe01f9ff3100
> > >> zvol_geom_bio_strategy() at zvol_geom_bio_strategy+0x2aa/frame 0xfffffe01f9ff3180
> > >> g_io_request() at g_io_request+0x2df/frame 0xfffffe01f9ff31b0
> > >> g_dev_strategy() at g_dev_strategy+0x155/frame 0xfffffe01f9ff31e0
> > >> physio() at physio+0x4f8/frame 0xfffffe01f9ff3270
> > >> devfs_read_f() at devfs_read_f+0xde/frame 0xfffffe01f9ff32d0
> > >> dofileread() at dofileread+0x81/frame 0xfffffe01f9ff3320
> > >> kern_preadv() at kern_preadv+0x62/frame 0xfffffe01f9ff3360
> > >> sys_preadv() at sys_preadv+0x39/frame 0xfffffe01f9ff3390
> > >> amd64_syscall() at amd64_syscall+0x140/frame 0xfffffe01f9ff34b0
> > >> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01f9ff34b0
> > >> --- syscall (289, FreeBSD ELF64, sys_preadv), rip = 0x8006fd89a, rsp = 0x7fffdfdfcf18, rbp = 0x7fffdfdfcfc0 ---
> > >> Uptime: 4h13m43s
> > >
> > >
> > >>
> > >> Guessing from zvol_geom_bio_strategy(), it's the volmode=dev zvol I'm
> > >> using for a bhyve VM.  Is anything known?
> > >
> > > Not really. A reproduction scenario would be very helpful. This was
> > > seen once by someone at iX - I committed some additional asserts to
> > > the truenas tree, but haven't heard further.
> > >
> > > +++ b/module/zfs/dbuf.c
> > > @@ -3192,7 +3192,7 @@ dbuf_dirty_leaf_with_existing_frontend(dbuf_dirty_state_t *dds)
> > >                           * scheduled its write with its buffer, we must
> > >                           * disassociate by replacing the frontend.
> > >                           */
> > > -                       ASSERT(db->db_state & (DB_READ|DB_PARTIAL));
> > > +                       ASSERT3U(db->db_state, &, (DB_READ|DB_PARTIAL));
> > >                          ASSERT3U(db->db_dirtycnt, ==, 1);
> > >                          dbuf_dirty_set_data(dds);
> > >                  } else {
> > > @@ -3238,18 +3238,24 @@ dbuf_dirty_record_create_leaf(dbuf_dirty_state_t *dds)
> > >
> > >          dr = dbuf_dirty_record_create(dds);
> > >
> > > +       /*
> > > +        * XXX - convert to ASSERT after dn_free_ranges fix
> > > +        */
> > > +       VERIFY(db->db_level == 0);
> > > +       VERIFY(db->db_blkid != DMU_BONUS_BLKID);
> >
> > I can't find context for either chunk; there are simply no such functions
> > in sys/contrib/openzfs/module/zfs/dbuf.c. Note that I'm running the
> > in-tree version.
>
> Sorry. I forgot that this patch was against the COW fault avoidance changes.
Received on Fri Aug 28 2020 - 18:37:54 UTC