Hello,

so I finally got my crash dump. I'll include some more history further down. First off:

   http://distfiles.scode.org/mlref/crashdump_20090722/core.txt.0
   http://distfiles.scode.org/mlref/crashdump_20090722/backtrace.txt

An inline version of the backtrace appears below[1] (after the background).

So this is a general protection fault in vm_page_remove(), called indirectly from sys_exit(). Worth noting is that at least once (the previous crash, without a dump) I got a "logic" panic rather than a memory error; I'm pretty sure the panic message was related to page *inserts*. Grepping the source indicates:

   vm_page.c: panic("vm_page_insert: page already inserted");
   vm_page.c: panic("vm_page_insert: offset already allocated");

However, I could not say for sure whether one of these was indeed the exact panic I got; I had no crash dump and was not able to see a stack trace at the time.

Some further background and speculation:

This system is root-on-ZFS, and I have been tracking CURRENT on it for several months. I updated every month or so, in part to test improvements to ZFS; specifically the fixes that have gone in for deadlock/hang issues.

My "test case" is to run a bulk build of all my ports (the port list is a semi-typical desktop; about 700 or so packages in total). It would very often hang (before) or crash (now) at least once during such a build; building firefox in particular was heavily over-represented, at least now that I see the crash symptom.

Going back to my tracking of CURRENT: at some point, I think roughly a couple of months ago by now, I stopped experiencing deadlocks/hangs (or at least have not seen one yet), but instead began seeing panics. No longer seeing hangs was expected because the reason I updated that particular time, if I recall correctly, was specifically that I believed that all the work-in-progress ZFS fixes had gone in. However, I am not 100% sure of the timing.
Since then I've updated a couple more times, most recently to BETA1, but am still seeing this crash.

Wannabe speculation based on insufficient understanding of the VM system:

vm_page_remove() requires, according to comments, that the object and page must be locked. The actual crash in this case happens when checking m->oflags:

        if (m->oflags & VPO_BUSY) {
                m->oflags &= ~VPO_BUSY;
                vm_page_flash(m);
        }

The "m->oflags & VPO_BUSY" evaluation is the culprit, if line numbers can be trusted.

If I recall correctly, at least one of the deadlock/hang fixes for ZFS did involve a change to locking, so I'm thinking the introduction of the crash may in fact be related to the ZFS fix itself. However, now that I think about it, perhaps the only locking changes were to vnodes rather than to vm objects/pages?

Also, interestingly, reading m->object right before that succeeds, and the lock assert on the object does too. Is it possible the vm page was NOT locked even though m->object was locked?

[1] Inline backtrace:

#0  doadump () at pcpu.h:223
#1  0xffffffff801d248c in db_fncall (dummy1=Variable "dummy1" is not available.
) at /usr/src/sys/ddb/db_command.c:548
#2  0xffffffff801d27c1 in db_command (last_cmdp=0xffffffff80b667a0, cmd_table=Variable "cmd_table" is not available.
) at /usr/src/sys/ddb/db_command.c:445
#3  0xffffffff801d2a10 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
#4  0xffffffff801d49a9 in db_trap (type=Variable "type" is not available.
) at /usr/src/sys/ddb/db_main.c:229
#5  0xffffffff805b5f25 in kdb_trap (type=9, code=0, tf=0xffffff805b9608d0) at /usr/src/sys/kern/subr_kdb.c:534
#6  0xffffffff80812efd in trap_fatal (frame=0xffffff805b9608d0, eva=Variable "eva" is not available.
) at /usr/src/sys/amd64/amd64/trap.c:847
#7  0xffffffff80813a1d in trap (frame=0xffffff805b9608d0) at /usr/src/sys/amd64/amd64/trap.c:639
#8  0xffffffff807f9793 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:223
#9  0xffffffff807d941f in vm_page_remove (m=0xffffff00bebe7f90) at /usr/src/sys/vm/vm_page.c:730
#10 0xffffffff807d957d in vm_page_free_toq (m=0xffffff00bebe7f90) at /usr/src/sys/vm/vm_page.c:1394
#11 0xffffffff807d7c6b in vm_object_terminate (object=0xffffff0066392948) at /usr/src/sys/vm/vm_object.c:694
#12 0xffffffff807d821c in vm_object_deallocate (object=0xffffff0066392948) at /usr/src/sys/vm/vm_object.c:592
#13 0xffffffff807cfad0 in _vm_map_unlock (map=0xffffff0004811310, file=Variable "file" is not available.
) at /usr/src/sys/vm/vm_map.c:480
#14 0xffffffff807cff8f in vm_map_remove (map=0xffffff0004811310, start=Variable "start" is not available.
) at /usr/src/sys/vm/vm_map.c:2765
#15 0xffffffff807d2e44 in vmspace_exit (td=0xffffff004eb78ab0) at /usr/src/sys/vm/vm_map.c:329
#16 0xffffffff8055a33e in exit1 (td=0xffffff004eb78ab0, rv=0) at /usr/src/sys/kern/kern_exit.c:299
#17 0xffffffff8055b43e in sys_exit (td=Variable "td" is not available.
) at /usr/src/sys/kern/kern_exit.c:110
#18 0xffffffff80813546 in syscall (frame=0xffffff805b960c90) at /usr/src/sys/amd64/amd64/trap.c:984
#19 0xffffffff807f9a20 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:364
#20 0x000000000047f63c in ?? ()
Previous frame inner to this frame (corrupt stack?)

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller_at_infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey_at_scode.org
E-Mail: peter.schuller_at_infidyne.com Web: http://www.scode.org