On Wed, 12.08.2009 at 10:29:03 -0500, Alan Cox wrote:
> Ulrich Spörlein wrote:
> > On Mon, 13.07.2009 at 13:29:56 -0500, Alan Cox wrote:
> >
> >> Ulrich Spörlein wrote:
> >>
> >>> On Mon, 13.07.2009 at 19:15:03 +0200, Ulrich Spörlein wrote:
> >>>
> >>>> On Sun, 12.07.2009 at 14:22:23 -0700, Kip Macy wrote:
> >>>>
> >>>>> On Sun, Jul 12, 2009 at 1:31 PM, Ulrich Spörlein <uqs_at_spoerlein.net> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> 8.0 BETA1 _at_ r195622 will panic reliably when running the clang static
> >>>>>> analyzer on a buildworld with something like the following panic:
> >>>>>>
> >>>>>> panic: vm_page_free_toq: freeing mapped page 0xffffff00c9715b30
> >>>>>> cpuid = 1
> >>>>>> KDB: stack backtrace:
> >>>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> >>>>>> panic() at panic+0x182
> >>>>>> vm_page_free_toq() at vm_page_free_toq+0x1f6
> >>>>>> vm_object_terminate() at vm_object_terminate+0xb7
> >>>>>> vm_object_deallocate() at vm_object_deallocate+0x17a
> >>>>>> _vm_map_unlock() at _vm_map_unlock+0x70
> >>>>>> vm_map_remove() at vm_map_remove+0x6f
> >>>>>> vmspace_free() at vmspace_free+0x56
> >>>>>> vmspace_exec() at vmspace_exec+0x56
> >>>>>> exec_new_vmspace() at exec_new_vmspace+0x133
> >>>>>> exec_elf32_imgact() at exec_elf32_imgact+0x2ee
> >>>>>> kern_execve() at kern_execve+0x3b2
> >>>>>> execve() at execve+0x3d
> >>>>>> syscall() at syscall+0x1af
> >>>>>> Xfast_syscall() at Xfast_syscall+0xe1
> >>>>>> --- syscall (59, FreeBSD ELF64, execve), rip = 0x800c20d0c, rsp = 0x7fffffffd6f8, rbp = 0x7fffffffdbf0 ---
> >>>>>
> >>>>> Can you try the following change:
> >>>>>
> >>>>> http://svn.freebsd.org/viewvc/base/user/kmacy/releng_7_2_fcs/sys/vm/vm_object.c?r1=192842&r2=195297
> >>>>>
> >>>> Applied this to HEAD by hand and ran with it; it died 20-30 minutes into
> >>>> the scan-build run. So no luck there. Next up is a test using the
> >>>> GENERIC kernel.
> >>>>
> >>> No improvement with a GENERIC kernel. Next up will be to run this with
> >>> clean sysctl, loader.conf, etc. Then I'll try disabling SMP.
> >>>
> >>> Does the backtrace above point to any specific subsystem? I'm using UFS,
> >>> ZFS and GELI on this machine and could try a few combinations...
> >>>
> >> The interesting thing about the backtrace is that it shows a 32-bit i386
> >> executable being started on a 64-bit amd64 machine. I've seen this
> >> backtrace once before, and you'll find it in the PR database. In that
> >> case, the problem "went away" after the known-to-be-broken
> >> ZERO_COPY_SOCKETS option was removed from the reporter's kernel
> >> configuration. However, I don't see that as the culprit here.
> >>
> > Hi Alan, first the bad news:
> >
> > I ran this test with a GENERIC kernel, SMP disabled, hw.physmem set to
> > 2 GB, in single-user mode, so no other processes or daemons running,
> > and nothing special in loader.conf except for ZFS and GELI. It reliably
> > panics, so nothing new here.
> >
> > Now the good news: you may be able to crash your own amd64 box in 3
> > minutes by doing:
> >
> > mkdir /tmp/foo && cd /tmp/foo
> > fetch -o- https://www.spoerlein.net/pub/llvm-clang.tar.gz | tar xf -
> > while :; do for d in bin sbin usr.bin usr.sbin; do $PWD/scan-build -o /dev/null -k make -C /usr/src/$d clean obj depend all; done; done
> >
> > Please note that scan-build/ccc-analyzer won't actually do anything, as
> > they cannot create output in /dev/null. So this is just running the
> > Perl script and forking make/sh/awk/ccc-analyzer like mad.
> > It does not survive 3 minutes on my Core2 Duo 3.3 GHz.
> >
> Hi Ulrich,
>
> I finally got a chance to try this workload. I'm afraid that I can't
> reproduce the assertion failure on my amd64 test machine. I left the
> test running overnight, and it was still going strong this morning.
>
> I am using neither ZFS nor GELI. Is it possible for you to repeat this
> test without ZFS and/or GELI?
>
> I would also be curious if anyone else reading this message can
> reproduce the assertion failure with the above test.

Now isn't this great :/

I haven't tracked the bug for the last couple of weeks, but the system
was updated to recent HEAD and got its ports rebuilt (several times). I
don't know which change "fixed" it, but I think it was the perl rebuild
(I had some trouble with perl5.10 on 8.0 at first).

Besides, the process doing the fork in the backtrace was always the
perl binary, IIRC.

So right now I'm no longer able to reproduce it myself ...

Regards,
Uli
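For anyone who wants to retry the reproduction recipe quoted above, the same
loop written out as a small sh script is below. This is only a sketch of the
quoted commands, not a tested test case: it assumes a FreeBSD amd64 machine
with a populated /usr/src and that the llvm-clang tarball is still available
at the quoted URL.

#!/bin/sh
# Stress loop from the quoted reproduction recipe. scan-build produces no
# reports here (its output directory is /dev/null), so the effect is purely
# the heavy fork load from make/sh/awk/ccc-analyzer.
# Assumptions: /usr/src is checked out and the tarball URL is still live.

mkdir -p /tmp/foo && cd /tmp/foo || exit 1
fetch -o- https://www.spoerlein.net/pub/llvm-clang.tar.gz | tar xf -

while :; do
    for d in bin sbin usr.bin usr.sbin; do
        "$PWD/scan-build" -o /dev/null -k \
            make -C "/usr/src/$d" clean obj depend all
    done
done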