Re: Page fault in amd64 pmap_qremove from vm_thread_new()

From: Kostik Belousov <kostikbel_at_gmail.com>
Date: Tue, 13 Feb 2007 21:02:23 +0200

On Tue, Feb 13, 2007 at 01:53:12PM -0500, Kris Kennaway wrote:
> I get this frequently when running stress2 on an 8-core amd64 system:
> 
> Fatal trap 12: page fault while in kernel mode
> Fatal trap 12: page fault while in kernel mode
> 
> 
> cpuid = 2;
> 
> 
> apic id = 02
> 
> Fatal trap 12: page fault while in kernel mode
> 
> cpuid = 5; fault virtual address        = 0xffff807ffffff040
> Fatal trap 12: page fault while in kernel mode
> Fatal trap 12: page fault while in kernel mode
> 
> cpuid = 4; apic id = 05
> apic id = 04
> fault virtual address   = 0xffff807ffffff0e0
> fault virtual address   = 0xffff807ffffff0b8
> cpuid = 0; fault code           = supervisor write data, page not present
> 
> instruction pointer     = 0x8:0xffffffff803deedd
> cpuid = 3; stack pointer                = 0x10:0xffffffffc7647720
> fault code              = supervisor write data, page not present
> 
> instruction pointer     = 0x8:0xffffffff803deedd
> apic id = 00
> stack pointer           = 0x10:0xffffffffcfd7e720
> fault code              = supervisor write data, page not present
> frame pointer           = 0x10:0xffffffffc7647730
> frame pointer           = 0x10:0xffffffffcfd7e730
> Fatal trap 12: page fault while in kernel mode
> 
> cpuid = 6;
> instruction pointer     = 0x8:0xffffffff803deedd
> 
> stack pointer           = 0x10:0xffffffffb2b93720
> 
> frame pointer           = 0x10:0xffffffffb2b93730
> 
> code segment            = base 0x0, limit 0xfffff, type 0x1b
> 
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> 
> processor eflags        =
> interrupt enabled,
> resume, Fatal trap 12: page fault while in kernel mode
> apic id = 06
> cpuid = 7; fault virtual address        = 0xffff807ffffff108
> apic id = 07
> fault code              = supervisor write data, page not present
> code segment            = base 0x0, limit 0xfffff, type 0x1b
> apic id = 03
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> fault virtual address   = 0xffff807ffffff068
> IOPL = 0
> fault code              = supervisor write data, page not present
> fault virtual address   = 0xffff807ffffff018
> instruction pointer     = 0x8:0xffffffff803deedd
> instruction pointer     = 0x8:0xffffffff803deedd
> Fatal trap 12: page fault while in kernel mode
> stack pointer           = 0x10:0xffffffffbf901720
> cpuid = 4; stack pointer                = 0x10:0xffffffffb1c11720
> processor eflags        = frame pointer         = 0x10:0xffffffffb1c11730
> interrupt enabled, resume, fault code           = supervisor write data, page not present
> IOPL = 0
> instruction pointer     = 0x8:0xffffffff803deedd
> current process         = stack pointer         = 0x10:0xffffffffd5b25720
> frame pointer           = 0x10:0xffffffffbf901730
> frame pointer           = 0x10:0xffffffffd5b25730
> code segment            = base 0x0, limit 0xfffff, type 0x1b
> current process         =                       = DPL 0, pres 1, long 1, def32 0, gran 1
> code segment            = base 0x0, limit 0xfffff, type 0x1b
> code segment            = base 0x0, limit 0xfffff, type 0x1b
> 18747 (thr2)
> [thread pid 18747 tid 142909 ]
> Stopped at      pmap_qremove+0x2d:      movq    $0,(%rcx,%rax,8)
> db> wh
> Tracing pid 18747 tid 142909 td 0xffffff0095710cd0
> pmap_qremove() at pmap_qremove+0x2d
> vm_thread_new() at vm_thread_new+0x8d
> thread_init() at thread_init+0x16
> slab_zalloc() at slab_zalloc+0x282
> uma_zone_slab() at uma_zone_slab+0x1ae
> uma_zalloc_bucket() at uma_zalloc_bucket+0x19d
> uma_zalloc_arg() at uma_zalloc_arg+0x3a3
> thread_alloc() at thread_alloc+0x1f
> create_thread() at create_thread+0xc5
> kern_thr_new() at kern_thr_new+0x75
> thr_new() at thr_new+0x62
> syscall() at syscall+0x310
> Xfast_syscall() at Xfast_syscall+0xab
> --- syscall (455, FreeBSD ELF64, thr_new), rip = 0x8007a1cac, rsp = 0x7fffffffdef8, rbp = 0 ---
> db> show allpcpu
> Current CPU: 2
> 
> cpuid        = 0
> curthread    = 0xffffff00717e8290: pid 18944 "thr2"
> curpcb       = 0xffffffffe2e33d50
> fpcurthread  = none
> idlethread   = 0xffffff00b9aa6520: pid 17 "idle: cpu0"
> spin locks held:
> 
> cpuid        = 1
> curthread    = 0xffffff0015e9d7b0: pid 18736 "thr2"
> curpcb       = 0xffffffffbceefd50
> fpcurthread  = none
> idlethread   = 0xffffff00b9aa6290: pid 16 "idle: cpu1"
> spin locks held:
> exclusive spin mutex sio r = 0 (0xffffffff806bf3c0) locked @ dev/sio/sio.c:1390
> 
> cpuid        = 2
> curthread    = 0xffffff0095710cd0: pid 18747 "thr2"
> curpcb       = 0xffffffffcfd7ed50
> fpcurthread  = none
> idlethread   = 0xffffff00b9aa6000: pid 15 "idle: cpu2"
> spin locks held:
> 
> cpuid        = 3
> curthread    = 0xffffff00ad485290: pid 18743 "thr2"
> curpcb       = 0xffffffffd5b25d50
> fpcurthread  = none
> idlethread   = 0xffffff00b9a63cd0: pid 14 "idle: cpu3"
> spin locks held:
> 
> cpuid        = 4
> curthread    = 0xffffff0098fc7000: pid 18942 "thr2"
> curpcb       = 0xffffffffc77fad50
> fpcurthread  = none
> idlethread   = 0xffffff00b9a63000: pid 13 "idle: cpu4"
> spin locks held:
> exclusive spin mutex turnstile chain r = 0 (0xffffffff80613ed8) locked @ kern/subr_turnstile.c:489
> 
> cpuid        = 5
> curthread    = 0xffffff00215b8cd0: pid 18708 "thr2"
> curpcb       = 0xffffffffb2b93d50
> fpcurthread  = none
> idlethread   = 0xffffff00b9a8fcd0: pid 12 "idle: cpu5"
> spin locks held:
> 
> cpuid        = 6
> curthread    = 0xffffff005b72d520: pid 18718 "thr2"
> curpcb       = 0xffffffffb1c11d50
> fpcurthread  = none
> idlethread   = 0xffffff00b9a8fa40: pid 11 "idle: cpu6"
> spin locks held:
> 
> cpuid        = 7
> curthread    = 0xffffff0078aae7b0: pid 18782 "thr2"
> curpcb       = 0xffffffffbf901d50
> fpcurthread  = none
> idlethread   = 0xffffff00b9a8f7b0: pid 10 "idle: cpu7"
> spin locks held:
> 
> For some reason ddb doesn't give sensible backtraces for the running threads:
> 
> db> wh 18944
> Tracing pid 18944 tid 130433 td 0xffffff009daa7290
> fork_trampoline() at fork_trampoline
> db> wh 18736
> Tracing pid 18736 tid 165977 td 0xffffff00632b2cd0
> fork_trampoline() at fork_trampoline
> db> wh 18747
> Tracing pid 18747 tid 165890 td 0xffffff0037403000
> fork_trampoline() at fork_trampoline
> db> wh 18743
> Tracing pid 18743 tid 165929 td 0xffffff004f59e000
> fork_trampoline() at fork_trampoline
> db> wh 18942
> Tracing pid 18942 tid 130531 td 0xffffff000a166520
> fork_trampoline() at fork_trampoline
> db> wh 18708
> Tracing pid 18708 tid 166269 td 0xffffff005c28a290
> fork_trampoline() at fork_trampoline
> db> wh 18718
> Tracing pid 18718 tid 111088 td 0xffffff0081f51a40
> fork_trampoline() at fork_trampoline
> db> wh 18782
> Tracing pid 18782 tid 166078 td 0xffffff0052b4c000
> fork_trampoline() at fork_trampoline

Is the backtrace for the faulting thread always the same? And is this on CURRENT?

I'm staring at similar (seemingly random) corruption on amd64 6.2-RELEASE.
The machine has already produced more than two core dumps.