Re: i386 kernel page fault in generic_bcopy() during shutdown

From: Don Lewis <truckman_at_FreeBSD.org>
Date: Thu, 8 Feb 2007 01:53:56 -0800 (PST)
On  6 Feb, To: freebsd-current_at_FreeBSD.org wrote:
> My Pentium-M laptop has consistently panicked during shutdown since I
> updated kernel and world in early January.  It still has the problem
> even after I updated the kernel and world a couple days ago.  My Athlon
> XP desktop machine does not exhibit this problem.  The kernel on the
> affected machine is close to GENERIC, with SMP, apic, gif, faith, and
> atapicd removed, and with atapicam added.
> 
> The page faults occur in a couple of different places.  I've seen
> generic_bcopy() and pmap_allocpte().  Occasionally I see a double fault.
> 
> 
> kgdb seems to have trouble unwinding the stack from the last crash:
> 
> # kgdb /boot/kernel/kernel /var/crash/vmcore.6
> kgdb: kvm_nlist(_stopped_cpus): 
> kgdb: kvm_nlist(_stoppcbs): 
> [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-marcel-freebsd".
> 
> Unread portion of the kernel message buffer:
> 
> 
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0xd6247d90
> fault code              = supervisor write, page not present
> instruction pointer     = 0x20:0xc089d9c6
> stack pointer           = 0x28:0xd4ff0bb8
> frame pointer           = 0x28:0xd4ff0be4
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 1018 (shutdown)
> Physical memory: 502 MB
> Dumping 67 MB: 52 36 20 4
> 
> #0  doadump () at pcpu.h:166
> 166     pcpu.h: No such file or directory.
>         in pcpu.h
> #0  doadump () at pcpu.h:166
> #1  0xc0475a57 in db_fncall (dummy1=-721483344, dummy2=0, dummy3=-1063115424, 
>     dummy4=0xd4ff098c "@z\ufffd\ufffd") at /usr/src/sys/ddb/db_command.c:486
> #2  0xc0475863 in db_command (last_cmdp=0xc09fb064, cmd_table=0x0)
>     at /usr/src/sys/ddb/db_command.c:401
> #3  0xc047591e in db_command_loop () at /usr/src/sys/ddb/db_command.c:453
> #4  0xc0477569 in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:222
> #5  0xc06cabc9 in kdb_trap (type=12, code=0, tf=0x0)
>     at /usr/src/sys/kern/subr_kdb.c:502
> #6  0xc089feed in trap_fatal (frame=0xd4ff0b78, eva=3592715664)
>     at /usr/src/sys/i386/i386/trap.c:859
> #7  0xc089fc4f in trap_pfault (frame=0xd4ff0b78, usermode=0, eva=3592715664)
>     at /usr/src/sys/i386/i386/trap.c:777
> #8  0xc089f872 in trap (frame=0xd4ff0b78) at /usr/src/sys/i386/i386/trap.c:462
> #9  0xc089009b in calltrap () at /usr/src/sys/i386/i386/exception.s:139
> #10 0xd6247d90 in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> 
> According to the instruction pointer in the trap frame, this time the
> fault occurred inside generic_bcopy().
> 
> 
> (kgdb) list *0xc089d9c6
> 0xc089d9c6 is at /usr/src/sys/i386/i386/support.s:490.
> 485             cmpl    %ecx,%eax                       /* overlapping && src < dst? */
> 486             jb      1f
> 487
> 488             shrl    $2,%ecx                         /* copy by 32-bit words */
> 489             cld                                     /* nope, copy forwards */
> 490             rep
> 491             movsl
> 492             movl    20(%esp),%ecx
> 493             andl    $3,%ecx                         /* any bytes left? */
> 494             rep
> 
> 
> I just rebooted again and got this stack trace in DDB:
> 
> pmap_allocpte() at pmap_allocpte+0x2f
> pmap_copy() at pmap_copy+0x1c5
> vm_map_copy_entry() at vm_map_copy_entry+0x119
> vmspace_fork() at vmspace_fork+0x1f8
> vm_forkproc() at vm_forkproc+0xb3
> fork1() at fork1+0xdc9
> fork() at fork+0x18
> syscall() at ...
> 
> The problem seems to consistently happen with a fork1() call on the
> stack.
> 
> This is what kgdb reports for the second crash.
> 
> # kgdb /boot/kernel/kernel /var/crash/vmcore.7
> kgdb: kvm_nlist(_stopped_cpus): 
> kgdb: kvm_nlist(_stoppcbs): 
> [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "i386-marcel-freebsd".
> 
> Unread portion of the kernel message buffer:
> Kernel page fault with the following non-sleepable locks held:
> exclusive sleep mutex pmap r = 0 (0xc31131ac) locked @ /usr/src/sys/i386/i386/pmap.c:2773
> exclusive sleep mutex pmap r = 0 (0xc29640a8) locked @ /usr/src/sys/i386/i386/pmap.c:2772
> exclusive sleep mutex vm page queue mutex r = 0 (0xc0a7e61c) locked @ /usr/src/sys/i386/i386/pmap.c:2767
> KDB: stack backtrace:
> db_trace_self_wrapper(c092a31e) at db_trace_self_wrapper+0x25
> kdb_backtrace(3,c295c000,c,d3ad2b1c,d3ad2b10,...) at kdb_backtrace+0x29
> witness_warn(5,0,c094defe) at witness_warn+0x192
> trap(d3ad2b1c) at trap+0xfb
> calltrap() at calltrap+0x6
> --- trap 0xd624f000, eip = 0, esp = 0x10212, ebp = 0xc31131ac ---
> (null)(1430000,c0a34ac8,c2959360,0,d624f000,...) at 0
> __func__.0(61727420,78302070,202c3731,20706965,2325203d,...) at 0xc094ad95
> 
> 
> Fatal trap 12: page fault while in kernel mode
> fault virtual address   = 0xd624f080
> fault code              = supervisor read, page not present
> instruction pointer     = 0x20:0xc089a513
> stack pointer           = 0x28:0xd3ad2b5c
> frame pointer           = 0x28:0xd3ad2b68
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, def32 1, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 1 (init)
> Physical memory: 502 MB
> Dumping 101 MB: 86 70 54 38 22 6
> 
> #0  doadump () at pcpu.h:166
> 166     pcpu.h: No such file or directory.
>         in pcpu.h
> (kgdb) where
> #0  doadump () at pcpu.h:166
> #1  0xc0475a57 in db_fncall (dummy1=-743626368, dummy2=0, dummy3=-1063115424, 
>     dummy4=0xd3ad295c "@z\ufffd\ufffd") at /usr/src/sys/ddb/db_command.c:486
> #2  0xc0475863 in db_command (last_cmdp=0xc09fb064, cmd_table=0x0)
>     at /usr/src/sys/ddb/db_command.c:401
> #3  0xc047591e in db_command_loop () at /usr/src/sys/ddb/db_command.c:453
> #4  0xc0477569 in db_trap (type=12, code=0) at /usr/src/sys/ddb/db_main.c:222
> #5  0xc06cabc9 in kdb_trap (type=12, code=0, tf=0x0)
>     at /usr/src/sys/kern/subr_kdb.c:502
> #6  0xc089feed in trap_fatal (frame=0xd3ad2b1c, eva=3592745088)
>     at /usr/src/sys/i386/i386/trap.c:859
> #7  0xc089f59b in trap (frame=0xd3ad2b1c) at /usr/src/sys/i386/i386/trap.c:276
> #8  0xc089009b in calltrap () at /usr/src/sys/i386/i386/exception.s:139
> #9  0xd624f080 in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb) list *0xc089a513
> 0xc089a513 is in pmap_allocpte (/usr/src/sys/i386/i386/pmap.c:1401).
> 1396            ptepindex = va >> PDRSHIFT;
> 1397    retry:
> 1398            /*
> 1399             * Get the page directory entry
> 1400             */
> 1401            ptepa = pmap->pm_pdir[ptepindex];
> 1402
> 1403            /*
> 1404             * This supports switching from a 4MB page to a
> 1405             * normal 4K page.
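
As a cross-check on the numbers in the two dumps above, here is a minimal
sketch in C of the address arithmetic involved.  It assumes a non-PAE i386
kernel (PDRSHIFT of 22, 4-byte page directory entries) and a page-aligned
pm_pdir; neither assumption is confirmed by the dumps.  The helper names
(format_va, pde_index, pde_index_from_fault, pde_base_va, check_dump_values)
are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed non-PAE i386 layout; these values are assumptions, not read from the dump. */
#define SKETCH_PDRSHIFT  22      /* each page directory entry maps 4 MB */
#define SKETCH_PDE_SIZE  4u      /* bytes per page directory entry */
#define SKETCH_PAGE_MASK 0xfffu  /* offset within a 4 KB page */

/* Render an address the way the trap message prints it, e.g. "0xd6247d90". */
static void format_va(uint32_t va, char *buf, size_t len)
{
    snprintf(buf, len, "0x%08" PRIx32, va);
}

/* Page directory index for a virtual address (mirrors va >> PDRSHIFT). */
static uint32_t pde_index(uint32_t va)
{
    return va >> SKETCH_PDRSHIFT;
}

/*
 * If the faulting read was pm_pdir[ptepindex] and pm_pdir is page aligned,
 * the low 12 bits of the fault address are ptepindex * entry size.
 */
static uint32_t pde_index_from_fault(uint32_t fault_addr)
{
    return (fault_addr & SKETCH_PAGE_MASK) / SKETCH_PDE_SIZE;
}

/* Lowest virtual address covered by a given page directory index. */
static uint32_t pde_base_va(uint32_t idx)
{
    return idx << SKETCH_PDRSHIFT;
}

/* The decimal eva values in the backtraces equal the hex fault addresses. */
static void check_dump_values(void)
{
    assert(3592715664u == 0xd6247d90u);  /* first crash, generic_bcopy() */
    assert(3592745088u == 0xd624f080u);  /* second crash, pmap_allocpte() */
}
```

Under those assumptions, pde_index_from_fault(0xd624f080) gives index 32,
and pde_base_va(32) gives 0x08000000, so the second fault would correspond
to looking up the page directory entry for a virtual address in the 4 MB
range starting at 0x08000000.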


This problem appears to be triggered by killing the Xorg server.  I can
also trigger the panic with Ctrl-Alt-Backspace or with
"/usr/local/etc/rc.d/gdm stop".  On the other hand, I found a workaround
for the shutdown case.  If I press Ctrl-Alt-F1 to switch to a text vty
and log in there, the shutdown command shuts the system down cleanly.

I suspect that this problem is likely to be graphics hardware dependent,
so this is what the Xorg server says about the hardware in my laptop:
	(--) PCI:*(1:0:0) ATI Technologies Inc Radeon Mobility M7 LW [Radeon Mobility 7500] rev 0, Mem @ 0xe0000000/27, 0xc0100000/16, I/O @ 0x3000/8
Received on Thu Feb 08 2007 - 08:54:05 UTC
