Re: panic: vm_fault: fault on nofault entry

From: Sean Bruno <sbruno_at_ignoranthack.me>
Date: Mon, 10 Mar 2014 07:10:20 +0900
On Sun, 2014-03-09 at 14:16 -0400, Glen Barber wrote:
> On Sun, Mar 09, 2014 at 08:01:32PM +0200, Konstantin Belousov wrote:
> > On Sun, Mar 09, 2014 at 12:56:48PM -0400, Glen Barber wrote:
> > > We are having regular panics on several machines in the cluster.
> > > 
> > > Below follows the script from the kgdb(1) session, hopefully providing
> > > enough information.  This machine runs 11.0-CURRENT #2 r262892, from
> > > 2 days ago.
> > > 
> > > It uses tmpfs(5) for the port build workspace.  I have an unconfirmed
> > > suspicion that use of sysutils/lsof is involved somehow, but cannot be
> > > sure.  (In my experience with panics with port building, removing lsof
> > > from the system did have an effect, but I may be going down the wrong
> > > rabbit hole.)
> > > 
> > 
> > This is very similar to an issue reported some time ago.
> > Try this patch.  I never got any feedback on it.
> > 
> > diff --git a/sys/amd64/amd64/mem.c b/sys/amd64/amd64/mem.c
> > index abbbb21..fd9c5df 100644
> > --- a/sys/amd64/amd64/mem.c
> > +++ b/sys/amd64/amd64/mem.c
> > @@ -98,7 +98,13 @@ memrw(struct cdev *dev, struct uio *uio, int flags)
> >  kmemphys:
> >  			o = v & PAGE_MASK;
> >  			c = min(uio->uio_resid, (u_int)(PAGE_SIZE - o));
> > -			error = uiomove((void *)PHYS_TO_DMAP(v), (int)c, uio);
> > +			v = PHYS_TO_DMAP(v);
> > +			if (v < DMAP_MIN_ADDRESS ||
> > +			    (v > DMAP_MIN_ADDRESS + dmaplimit &&
> > +			    v <= DMAP_MAX_ADDRESS) ||
> > +			    pmap_kextract(v) == 0)
> > +				return (EFAULT);
> > +			error = uiomove((void *)v, (int)c, uio);
> >  			continue;
> >  		}
> >  		else if (dev2unit(dev) == CDEV_MINOR_KMEM) {
> 
> There is a very similar patch on one of these machines.
> 
>   Index: sys/amd64/amd64/mem.c
>   ===================================================================
>   --- sys/amd64/amd64/mem.c	(revision 262298)
>   +++ sys/amd64/amd64/mem.c	(working copy)
>   @@ -98,6 +98,12 @@
>    kmemphys:
>    			o = v & PAGE_MASK;
>    			c = min(uio->uio_resid, (u_int)(PAGE_SIZE - o));
>   +			v = PHYS_TO_DMAP(v);
>   +			if (v < DMAP_MIN_ADDRESS ||
>   +			    (v > DMAP_MIN_ADDRESS + dmaplimit &&
>   +			    v <= DMAP_MAX_ADDRESS) ||
>   +			    pmap_kextract(v) == 0)
>   +				return (EFAULT);
>    			error = uiomove((void *)PHYS_TO_DMAP(v), (int)c, uio);
>    			continue;
>    		}
>   Index: sys/amd64/amd64/pmap.c
>   ===================================================================
>   --- sys/amd64/amd64/pmap.c	(revision 262298)
>   +++ sys/amd64/amd64/pmap.c	(working copy)
>   @@ -321,7 +321,7 @@
>        "Number of kernel page table pages allocated on bootup");
>    
>    static int ndmpdp;
>   -static vm_paddr_t dmaplimit;
>   +vm_paddr_t dmaplimit;
>    vm_offset_t kernel_vm_end = VM_MIN_KERNEL_ADDRESS;
>    pt_entry_t pg_nx;
>    
>   Index: sys/amd64/include/pmap.h
>   ===================================================================
>   --- sys/amd64/include/pmap.h	(revision 262298)
>   +++ sys/amd64/include/pmap.h	(working copy)
>   @@ -369,6 +369,7 @@
>    extern vm_paddr_t dump_avail[];
>    extern vm_offset_t virtual_avail;
>    extern vm_offset_t virtual_end;
>   +extern vm_paddr_t dmaplimit;
>    
>    #define	pmap_page_get_memattr(m)	((vm_memattr_t)(m)->md.pat_mode)
>    #define	pmap_page_is_write_mapped(m)	(((m)->aflags & PGA_WRITEABLE) != 0)
> 
> The machine this change is on panicked today as well.  That machine runs
> r262298M, and I have a vmcore from Feb 24 (there was not enough
> available space to get a crash dump today).
> 
> The backtrace from Feb 24 follows.
> 
> Script started on Sun Mar  9 18:14:41 2014
> root@redbuild04.nyi:/usr/obj/usr/src/sys/REDBUILD # sh
> # kgdb ./kernel.debug /var/crash/vmcore.3
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> panic: vm_fault: fault on nofault entry, addr: fffffe03becbc000
> cpuid = 23
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe1838ec1180
> kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe1838ec1230
> panic() at panic+0x155/frame 0xfffffe1838ec12b0
> vm_fault_hold() at vm_fault_hold+0x1e7a/frame 0xfffffe1838ec1500
> vm_fault() at vm_fault+0x77/frame 0xfffffe1838ec1540
> trap_pfault() at trap_pfault+0x199/frame 0xfffffe1838ec15e0
> trap() at trap+0x4a0/frame 0xfffffe1838ec17f0
> calltrap() at calltrap+0x8/frame 0xfffffe1838ec17f0
> --- trap 0xc, rip = 0xffffffff80d971fb, rsp = 0xfffffe1838ec18b0, rbp = 0xfffffe1838ec1910 ---
> copyout() at copyout+0x3b/frame 0xfffffe1838ec1910
> memrw() at memrw+0x1ef/frame 0xfffffe1838ec1950
> giant_read() at giant_read+0xa4/frame 0xfffffe1838ec1990
> devfs_read_f() at devfs_read_f+0xeb/frame 0xfffffe1838ec19f0
> dofileread() at dofileread+0x95/frame 0xfffffe1838ec1a40
> kern_readv() at kern_readv+0x68/frame 0xfffffe1838ec1a90
> sys_read() at sys_read+0x63/frame 0xfffffe1838ec1ae0
> amd64_syscall() at amd64_syscall+0x3fb/frame 0xfffffe1838ec1bf0
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe1838ec1bf0
> --- syscall (3, FreeBSD ELF64, sys_read), rip = 0x800b8343a, rsp = 0x7fffffffcfe8, rbp = 0x7fffffffd030 ---
> KDB: enter: panic
> 
> Reading symbols from /boot/kernel/zfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/zfs.ko.symbols
> Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
> Loaded symbols for /boot/kernel/opensolaris.ko.symbols
> Reading symbols from /boot/kernel/ums.ko.symbols...done.
> Loaded symbols for /boot/kernel/ums.ko.symbols
> Reading symbols from /boot/kernel/tmpfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/tmpfs.ko.symbols
> Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/nullfs.ko.symbols
> Reading symbols from /boot/kernel/linprocfs.ko.symbols...done.
> Loaded symbols for /boot/kernel/linprocfs.ko.symbols
> Reading symbols from /boot/kernel/linux.ko.symbols...done.
> Loaded symbols for /boot/kernel/linux.ko.symbols
> #0  doadump (textdump=-954994000) at pcpu.h:219
> 219		__asm("movq %%gs:%1,%0" : "=r" (td)
> (kgdb) bt
> #0  doadump (textdump=-954994000) at pcpu.h:219
> #1  0xffffffff8034a175 in db_fncall (dummy1=<value optimized out>, 
>     dummy2=<value optimized out>, dummy3=<value optimized out>, dummy4=<value optimized out>)
>     at /usr/src/sys/ddb/db_command.c:578
> #2  0xffffffff80349e5d in db_command (cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:449
> #3  0xffffffff80349bd4 in db_command_loop () at /usr/src/sys/ddb/db_command.c:502
> #4  0xffffffff8034c630 in db_trap (type=<value optimized out>, code=0)
>     at /usr/src/sys/ddb/db_main.c:231
> #5  0xffffffff80987329 in kdb_trap (type=3, code=0, tf=<value optimized out>)
>     at /usr/src/sys/kern/subr_kdb.c:656
> #6  0xffffffff80d99009 in trap (frame=0xfffffe1838ec1160)
>     at /usr/src/sys/amd64/amd64/trap.c:571
> #7  0xffffffff80d7dd12 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231
> #8  0xffffffff80986a8e in kdb_enter (why=0xffffffff8100ed4f "panic", msg=<value optimized out>)
>     at cpufunc.h:63
> #9  0xffffffff809462b5 in panic (fmt=<value optimized out>)
>     at /usr/src/sys/kern/kern_shutdown.c:752
> #10 0xffffffff80c0981a in vm_fault_hold (map=<value optimized out>, 
>     vaddr=<value optimized out>, fault_type=<value optimized out>, 
>     fault_flags=<value optimized out>, m_hold=<value optimized out>)
>     at /usr/src/sys/vm/vm_fault.c:272
> #11 0xffffffff80c07957 in vm_fault (map=0xfffff80002000000, vaddr=<value optimized out>, 
>     fault_type=1 '\001', fault_flags=128) at /usr/src/sys/vm/vm_fault.c:217
> #12 0xffffffff80d997f9 in trap_pfault (frame=0xfffffe1838ec1800, usermode=0)
>     at /usr/src/sys/amd64/amd64/trap.c:767
> #13 0xffffffff80d99020 in trap (frame=0xfffffe1838ec1800)
>     at /usr/src/sys/amd64/amd64/trap.c:455
> #14 0xffffffff80d7dd12 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:231
> #15 0xffffffff80d971fb in copyout () at /usr/src/sys/amd64/amd64/support.S:246
> #16 0xffffffff8099bb35 in uiomove_faultflag (cp=<value optimized out>, 
>     n=<value optimized out>, uio=0xfffffe1838ec1ab0, nofault=<value optimized out>)
>     at /usr/src/sys/kern/subr_uio.c:192
> #17 0xffffffff80d8576f in memrw (dev=<value optimized out>, uio=<value optimized out>, 
>     flags=<value optimized out>) at /usr/src/sys/amd64/amd64/mem.c:107
> ---Type <return> to continue, or q <return> to quit---
> #18 0xffffffff808ec764 in giant_read (dev=0xfffff80011347c00, uio=0xfffffe1838ec1ab0, ioflag=0)
>     at /usr/src/sys/kern/kern_conf.c:442
> #19 0xffffffff80817e2b in devfs_read_f (fp=0xfffff80854be3140, uio=0xfffffe1838ec1ab0, 
>     cred=<value optimized out>, flags=0, td=0xfffff801f52c5490)
>     at /usr/src/sys/fs/devfs/devfs_vnops.c:1193
> #20 0xffffffff809a0e25 in dofileread (td=0xfffff801f52c5490, fd=4, fp=0xfffff80854be3140, 
>     auio=0xfffffe1838ec1ab0, offset=<value optimized out>, flags=1172307968) at file.h:299
> #21 0xffffffff809a0b48 in kern_readv (td=0xfffff801f52c5490, fd=4, auio=0xfffffe1838ec1ab0)
>     at /usr/src/sys/kern/sys_generic.c:256
> #22 0xffffffff809a0ad3 in sys_read (td=<value optimized out>, uap=<value optimized out>)
>     at /usr/src/sys/kern/sys_generic.c:171
> #23 0xffffffff80d9a04b in amd64_syscall (td=0xfffff801f52c5490, traced=0) at subr_syscall.c:133
> #24 0xffffffff80d7dffb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:390
> #25 0x0000000800b8343a in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> Current language:  auto; currently minimal
> (kgdb) quit
> 
> Script done on Sun Mar  9 18:14:59 2014
> 
> Glen
> 

Not sure I can add much here other than to say that redbuild machines
are now running -current as opposed to stable/10.

We are running redbuild01/02 unpatched and 03/04 with the patch to
compare stability.  We haven't seen much difference, so either I've
screwed up the patch or the bug report.
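
For anyone comparing the patched and unpatched machines, the bounds
test that both variants of the patch add to memrw() can be modelled in
user space.  This is only a sketch under stated assumptions: the
MODEL_* constants, model_phys_to_dmap() and the page_mapped flag are
made-up stand-ins for DMAP_MIN_ADDRESS, DMAP_MAX_ADDRESS,
PHYS_TO_DMAP() and pmap_kextract(), and the real values live in the
amd64 pmap headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical direct-map window; NOT taken from the kernel headers. */
#define MODEL_DMAP_MIN	UINT64_C(0xfffff80000000000)
#define MODEL_DMAP_MAX	UINT64_C(0xfffffc0000000000)

/* Assumption: the direct map is modelled as "base plus physical address". */
static uint64_t
model_phys_to_dmap(uint64_t pa)
{
	return (MODEL_DMAP_MIN + pa);
}

/*
 * Mirrors the three-part test in the quoted patch.  The pmap_kextract()
 * lookup is replaced by a caller-supplied flag so the sketch can run
 * outside the kernel.  Returns false where the patch returns EFAULT.
 */
static bool
model_dmap_read_ok(uint64_t pa, uint64_t dmaplimit, bool page_mapped)
{
	uint64_t v = model_phys_to_dmap(pa);

	/* The limit is expected to fit inside the modelled window. */
	assert(dmaplimit <= MODEL_DMAP_MAX - MODEL_DMAP_MIN);

	if (v < MODEL_DMAP_MIN ||
	    (v > MODEL_DMAP_MIN + dmaplimit && v <= MODEL_DMAP_MAX) ||
	    !page_mapped)
		return (false);
	return (true);
}
```

With a 4 GB dmaplimit, model_dmap_read_ok(0x1000, limit, true) passes,
while an address 8 GB into the window, or any address whose page is
reported as unmapped, is rejected the way the patch would return EFAULT.
Note that the comparison is a strict ">", so an address exactly at the
limit still passes the range test.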

sean

Received on Sun Mar 09 2014 - 21:10:23 UTC
