Hi, I looked at the crash dump and the code more closely: #18 0xffffffff82be1c5f in i915_gem_fault (dummy=<optimized out>, vmf=<optimized out>) at /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/gem/i915_gem_mman.c:367 (kgdb) p area->vm_obj->lock $43 = {lock_object = {lo_name = 0xffffffff8112c767 "vm object", lo_flags = 627245056, lo_data = 0, lo_witness = 0xfffff8045f575800}, rw_lock = 18446741878623409920} So vm_obj is not NULL and has a rw_lock member Now at intel_freebsd.c:193 (frame #17) the driver calls vm_page_busy_acquire(m, VM_ALLOC_WAITFAIL). 'm' is the page grabbed from vm_obj of the calling frame. The panic occurs in kern_rwlock.c:270 in frame #15 when calling rw_wowner(rwlock2rw(c)) so something goes wrong either in rw_wowner or in rwlock2rw. Looking at rwlock2rw() : /* * Return the rwlock address when the lock cookie address is provided. * This functionality assumes that struct rwlock* have a member named rw_lock. */ #define rwlock2rw(c) (__containerof(c, struct rwlock, rw_lock)) I think this one is just extracting out the rw_lock member of the passed in struct. However I don't understand what the cookie address is about due to my lack of knowledge on kernel locking concepts. So maybe something is wrong with the cookie or the rw_lock value itself. Looking at rw_wowner() : /* * Return a pointer to the owning thread if the lock is write-locked or * NULL if the lock is unlocked or read-locked. */ #define lv_rw_wowner(v) \ ((v) & RW_LOCK_READ ? NULL : \ (struct thread *)RW_OWNER((v))) #define rw_wowner(rw) lv_rw_wowner(RW_READ_VALUE(rw)) I don't think that one could cause a panic but again I'm not experienced enough to be sure, it seems this either returns the thread that owns the lock or NULL if no thread owns it. The is also the fact that the driver calls vm_page_busy_acquire with the VM_ALLOC_WAITFAIL flag which is defined in vm_page.h as : #define VM_ALLOC_WAITFAIL 0x0010 /* (acf) Sleep and return error */ Could this be the reason of the panic as in we try to lock, then cannot and eventually just return an error without retrying ? There is the flag VM_ALLOC_WAITOK that says /* (acf) Sleep and retry */. Should I try to patch intel_freebsd.c to use this flag instead ? Thanks. Le sam. 15 août 2020 à 20:35, Alexandre Levy <a13xlevy_at_gmail.com> a écrit : > Hi, > > I could finally generate a crash dump even with a black screen, I had to > guess I was in the crash handler and I type "dump" and enter which worked. > The driver logs "[drm] Cannot find any crtc or sizes" which I guess is the > reason why I couldn't see anything on my screen. > > Back to the initial problem, I could start a kgdb session, loaded the > i915kms.ko symbols and here are the results : > > (kgdb) bt > #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 > #1 doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:394 > #2 0xffffffff8049c26a in db_dump (dummy=<optimized out>, > dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>) at > /usr/src/sys/ddb/db_command.c:575 > #3 0xffffffff8049c02c in db_command (last_cmdp=<optimized out>, > cmd_table=<optimized out>, dopager=1) at /usr/src/sys/ddb/db_command.c:482 > #4 0xffffffff8049bd9d in db_command_loop () at > /usr/src/sys/ddb/db_command.c:535 > #5 0xffffffff8049f048 in db_trap (type=<optimized out>, code=<optimized > out>) at /usr/src/sys/ddb/db_main.c:270 > #6 0xffffffff80c1b374 in kdb_trap (type=3, code=0, tf=<optimized out>) at > /usr/src/sys/kern/subr_kdb.c:699 > #7 0xffffffff8100ca98 in trap (frame=0xfffffe00d7567300) at > /usr/src/sys/amd64/amd64/trap.c:576 > #8 <signal handler called> > #9 kdb_enter (why=0xffffffff811d5de0 "panic", msg=<optimized out>) at > /usr/src/sys/kern/subr_kdb.c:486 > #10 0xffffffff80bd00be in vpanic (fmt=<optimized out>, ap=<optimized out>) > at /usr/src/sys/kern/kern_shutdown.c:902 > #11 0xffffffff80bcfe53 in panic (fmt=0xffffffff81c8c7c8 <cnputs_mtx> > "\b\214\031\201\377\377\377\377") at /usr/src/sys/kern/kern_shutdown.c:839 > #12 0xffffffff8100cee7 in trap_fatal (frame=0xfffffe00d7567600, eva=0) at > /usr/src/sys/amd64/amd64/trap.c:915 > #13 0xffffffff8100c360 in trap (frame=0xfffffe00d7567600) at > /usr/src/sys/amd64/amd64/trap.c:212 > #14 <signal handler called> > #15 _rw_wowned (c=0x2659c92217d5aa52) at > /usr/src/sys/kern/kern_rwlock.c:270 > #16 0xffffffff80ec23ed in vm_page_busy_acquire (m=0xfffffe00040ff9e8, > allocflags=16) at /usr/src/sys/vm/vm_page.c:884 > #17 0xffffffff82b4e980 in remap_io_mapping (vma=0xfffff80315148300, > addr=<optimized out>, pfn=<optimized out>, size=<optimized out>, > iomap=<optimized out>) > at > /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/intel_freebsd.c:193 > #18 0xffffffff82be1c5f in i915_gem_fault (dummy=<optimized out>, > vmf=<optimized out>) > at > /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/gem/i915_gem_mman.c:367 > #19 0xffffffff82cb5ddf in linux_cdev_pager_populate > (vm_obj=0xfffff80368501420, pidx=<optimized out>, fault_type=<optimized > out>, max_prot=<optimized out>, > first=0xfffffe00d7567868, last=0xfffffe00d7567888) at > /usr/src/sys/compat/linuxkpi/common/src/linux_compat.c:554 > #20 0xffffffff80ea9e8f in vm_pager_populate (object=0x2659c92217d5aa52, > pidx=18446741874754451944, fault_type=0, max_prot=0 '\000', > first=<optimized out>, last=<optimized out>) > at /usr/src/sys/vm/vm_pager.h:172 > #21 vm_fault_populate (fs=<optimized out>) at > /usr/src/sys/vm/vm_fault.c:444 > #22 vm_fault_allocate (fs=<optimized out>) at > /usr/src/sys/vm/vm_fault.c:1028 > #23 vm_fault (map=<optimized out>, vaddr=<optimized out>, > fault_type=<optimized out>, fault_flags=<optimized out>, m_hold=<optimized > out>) at /usr/src/sys/vm/vm_fault.c:1338 > #24 0xffffffff80ea98ee in vm_fault_trap (map=0xfffffe00c0f539e8, > vaddr=<optimized out>, fault_type=<optimized out>, fault_flags=0, > signo=0xfffffe00d7567ac4, > ucode=0xfffffe00d7567ac0) at /usr/src/sys/vm/vm_fault.c:585 > #25 0xffffffff8100d0de in trap_pfault (frame=0xfffffe00d7567b00, > usermode=<optimized out>, signo=<optimized out>, ucode=0xffffffff81d1de80 > <w_locklistdata+160624>) > at /usr/src/sys/amd64/amd64/trap.c:817 > #26 0xffffffff8100c72c in trap (frame=0xfffffe00d7567b00) at > /usr/src/sys/amd64/amd64/trap.c:340 > #27 <signal handler called> > #28 0x000000080296659a in ?? () > Backtrace stopped: Cannot access memory at address 0x7fffffffbf38 > (kgdb) list *0xffffffff82be1c5f > 0xffffffff82be1c5f is in i915_gem_fault > (/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/gem/i915_gem_mman.c:367). > 362 ret = i915_vma_pin_fence(vma); > 363 if (ret) > 364 goto err_unpin; > 365 > 366 /* Finally, remap it using the new GTT offset */ > 367 ret = remap_io_mapping(area, > 368 area->vm_start + > (vma->ggtt_view.partial.offset << PAGE_SHIFT), > 369 (ggtt->gmadr.start + > vma->node.start) >> PAGE_SHIFT, > 370 min_t(u64, vma->size, area->vm_end > - area->vm_start), > 371 &ggtt->iomap); > (kgdb) list *0xffffffff82b4e980 > 0xffffffff82b4e980 is in remap_io_mapping > (/usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.3_4/drivers/gpu/drm/i915/intel_freebsd.c:193). > 188 pidx++, pa += PAGE_SIZE) { > 189 retry: > 190 m = vm_page_grab(vm_obj, pidx, VM_ALLOC_NOCREAT); > 191 if (m == NULL) { > 192 m = PHYS_TO_VM_PAGE(pa); > 193 if (!vm_page_busy_acquire(m, > VM_ALLOC_WAITFAIL)) > 194 goto retry; > 195 if (vm_page_insert(m, vm_obj, pidx)) { > 196 vm_page_xunbusy(m); > 197 VM_OBJECT_WUNLOCK(vm_obj); > (kgdb) list *0xffffffff80ec23ed > 0xffffffff80ec23ed is in vm_page_busy_acquire > (/usr/src/sys/vm/vm_page.c:884). > 879 if (vm_page_tryacquire(m, allocflags)) > 880 return (true); > 881 if ((allocflags & VM_ALLOC_NOWAIT) != 0) > 882 return (false); > 883 if (obj != NULL) > 884 locked = VM_OBJECT_WOWNED(obj); > 885 else > 886 locked = false; > 887 MPASS(locked || vm_page_wired(m)); > 888 if (_vm_page_busy_sleep(obj, m, m->pindex, > "vmpba", allocflags, > > It seems like the problem occured when calling vm_page_busy_acquire(m, > VM_ALLOC_WAITFAIL) where m might be a NULL pointer ? I am very new to > kernel debugging so not sure where to go from there. > > Thanks. > > Le lun. 10 août 2020 à 12:04, Hans Petter Selasky <hps_at_selasky.org> a > écrit : > >> Hi, >> >> On 2020-08-10 12:59, Alexandre Levy wrote: >> > Looking at the code, the error happens during the call to >> VM_OBJECT_WLOCK >> > (memory page locking for write ?) in the intel_freebsd.c (see [1] >> below). >> > I'm out for a few days but I'll try to dig more into it when I'm back >> next >> > weekend although I have no experience in the drm-devel-kmod codebase. In >> > the meantime if you have any suggestions on debugging this further I'm >> > happy to follow them. >> >> The problem is likely that the vm_obj is NULL. >> >> I think I recall that this function is special and can only be called >> from a certain context, unlike in Linux. Will need the full backtrace >> with line numbers in order to debug this. >> >> --HPS >> >Received on Sun Aug 16 2020 - 13:28:46 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:24 UTC