Re: panic: vm_page_free_toq: freeing mapped page

From: Peter Holm <pho@freebsd.org>
Date: Thu, 13 Aug 2009 16:27:23 +0200
On Thu, Aug 13, 2009 at 03:29:07PM +0200, Ulrich Spörlein wrote:
> On Wed, 12.08.2009 at 10:29:03 -0500, Alan Cox wrote:
> > Ulrich Spörlein wrote:
> > > On Mon, 13.07.2009 at 13:29:56 -0500, Alan Cox wrote:
> > >   
> > >> Ulrich Spörlein wrote:
> > >>     
> > >>> On Mon, 13.07.2009 at 19:15:03 +0200, Ulrich Spörlein wrote:
> > >>>
> > >>>> On Sun, 12.07.2009 at 14:22:23 -0700, Kip Macy wrote:
> > >>>>
> > >>>>> On Sun, Jul 12, 2009 at 1:31 PM, Ulrich Spörlein <uqs@spoerlein.net> wrote:
> > >>>>>
> > >>>>>> Hi,
> > >>>>>>
> > >>>>>> 8.0 BETA1 @ r195622 will panic reliably when running the clang static
> > >>>>>> analyzer on a buildworld with something like the following panic:
> > >>>>>>
> > >>>>>> panic: vm_page_free_toq: freeing mapped page 0xffffff00c9715b30
> > >>>>>> cpuid = 1
> > >>>>>> KDB: stack backtrace:
> > >>>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> > >>>>>> panic() at panic+0x182
> > >>>>>> vm_page_free_toq() at vm_page_free_toq+0x1f6
> > >>>>>> vm_object_terminate() at vm_object_terminate+0xb7
> > >>>>>> vm_object_deallocate() at vm_object_deallocate+0x17a
> > >>>>>> _vm_map_unlock() at _vm_map_unlock+0x70
> > >>>>>> vm_map_remove() at vm_map_remove+0x6f
> > >>>>>> vmspace_free() at vmspace_free+0x56
> > >>>>>> vmspace_exec() at vmspace_exec+0x56
> > >>>>>> exec_new_vmspace() at exec_new_vmspace+0x133
> > >>>>>> exec_elf32_imgact() at exec_elf32_imgact+0x2ee
> > >>>>>> kern_execve() at kern_execve+0x3b2
> > >>>>>> execve() at execve+0x3d
> > >>>>>> syscall() at syscall+0x1af
> > >>>>>> Xfast_syscall() at Xfast_syscall+0xe1
> > >>>>>> --- syscall (59, FreeBSD ELF64, execve), rip = 0x800c20d0c, rsp = 0x7fffffffd6f8, rbp = 0x7fffffffdbf0 ---
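
For reference, that panic string comes from the assertion near the top of
vm_page_free_toq() in sys/vm/vm_page.c, which insists that a managed page
has no remaining pmap mappings before it is freed. Roughly, from an
8.0-era tree (the exact surrounding code may differ):

    void
    vm_page_free_toq(vm_page_t m)
    {

            if ((m->flags & PG_UNMANAGED) == 0) {
                    /*
                     * A managed page must be unmapped from every pmap
                     * (no pv entries left) before it may be freed.
                     */
                    KASSERT(!pmap_page_is_mapped(m),
                        ("vm_page_free_toq: freeing mapped page %p", m));
            }
            ...
    }

In other words, vm_object_terminate(), while tearing down the old vmspace
during execve(), handed back a page that some pmap still mapped.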
> > >>>>>>
> > >>>>> Can you try the following change:
> > >>>>>
> > >>>>> http://svn.freebsd.org/viewvc/base/user/kmacy/releng_7_2_fcs/sys/vm/vm_object.c?r1=192842&r2=195297
> > >>>>>
> > >>>> Applied this to HEAD by hand and ran with it; it died 20-30 minutes into
> > >>>> the scan-build run. So no luck there. Next up is a test using the
> > >>>> GENERIC kernel.
> > >>>>         
> > >>> No improvement with a GENERIC kernel. Next up will be to run this with
> > >>> clean sysctl, loader.conf, etc. Then I'll try disabling SMP.
> > >>>
> > >>> Does the backtrace above point to any specific subsystem? I'm using UFS,
> > >>> ZFS and GELI on this machine and could try a few combinations...
> > >>>       
> > >> The interesting thing about the backtrace is that it shows a 32-bit i386 
> > >> executable being started on a 64-bit amd64 machine.  I've seen this 
> > >> backtrace once before, and you'll find it in the PR database.  In that 
> > >> case, the problem "went away" after the known-to-be-broken 
> > >> ZERO_COPY_SOCKETS option was removed from the reporter's kernel 
> > >> configuration.  However, I don't see that as the culprit here.
> > >>     
> > >
> > > Hi Alan, first the bad news
> > >
> > > I ran this test with a GENERIC kernel, SMP disabled, hw.physmem set to 2
> > > GB in single-user mode, so no other processes or daemons running,
> > > nothing special in loader.conf except for ZFS and GELI. It reliably
> > > panics, so nothing new here.
> > >
> > > Now the good news: you may be able to crash your own amd64 box in 3
> > > minutes by doing:
> > >
> > > mkdir /tmp/foo && cd /tmp/foo
> > > fetch -o- https://www.spoerlein.net/pub/llvm-clang.tar.gz | tar xf -
> > > while :; do for d in bin sbin usr.bin usr.sbin; do $PWD/scan-build -o /dev/null -k make -C /usr/src/$d clean obj depend all; done; done
> > >
> > > Please note that scan-build/ccc-analyzer won't actually do anything, as
> > > they cannot create output in /dev/null. So this is just running the
> > > Perl script and forking make/sh/awk/ccc-analyzer like mad. The system
> > > does not survive 3 minutes on my Core2 Duo 3.3 GHz.
> > >   
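
As an aside, the recipe boils down to rapid fork/exec churn. A minimal C
distillation of that churn might look like the sketch below -- purely
hypothetical and untested, and the real panic may well need the full
scan-build workload (and perl) to trigger:

    /*
     * Hypothetical fork/exec stress loop, distilling what the recipe
     * above does (forking make/sh/awk/ccc-analyzer over and over).
     * Not the original reproduction case.
     */
    #include <sys/types.h>
    #include <sys/wait.h>

    #include <err.h>
    #include <unistd.h>

    int
    main(void)
    {
            pid_t pid;

            for (;;) {
                    pid = fork();
                    if (pid == -1)
                            err(1, "fork");
                    if (pid == 0) {
                            /*
                             * Any short-lived program will do; note the
                             * trace above shows a 32-bit binary being
                             * exec'd on amd64.
                             */
                            execl("/usr/bin/true", "true", (char *)NULL);
                            _exit(127);
                    }
                    (void)waitpid(pid, NULL, 0);
            }
    }
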
> > 
> > Hi Ulrich,
> > 
> > I finally got a chance to try this workload.  I'm afraid that I can't 
> > reproduce the assertion failure on my amd64 test machine.  I left the 
> > test running overnight, and it was still going strong this morning.
> > 
> > I am using neither ZFS nor GELI.  Is it possible for you to repeat this 
> > test without ZFS and/or GELI?
> > 
> > I would also be curious if anyone else reading this message can 
> > reproduce the assertion failure with the above test.
> 
> Now isn't this great :/
> 
> I haven't tracked the bug for the last couple of weeks, but the system
> was updated to recent HEAD and had its ports rebuilt (several times).
> 
> I don't know which change "fixed" it, but I think it was the perl
> rebuild (I had some trouble with perl5.10 on 8.0 at first). Besides, the
> process doing the fork in the backtrace was always the perl binary,
> IIRC.
> 
> So right now I'm no longer able to reproduce it myself ...
> 
> Regards,
> Uli

Using your test scenario, I got the panic.

- Peter