Re: 11.0-CURRENT panic (nfsd?)

From: John-Mark Gurney <jmg_at_funkthat.com>
Date: Sun, 5 Jan 2014 09:35:41 -0800

Markiyan Kushnir wrote this message on Sun, Jan 05, 2014 at 11:06 +0200:
> 2014/1/5 John-Mark Gurney <jmg_at_funkthat.com>:
> > Markiyan Kushnir wrote this message on Sun, Jan 05, 2014 at 10:57 +0200:
> >> I started to see a reliable panic on a recent CURRENT:
> >>
> >> $ uname -a
> >> FreeBSD mkushnir.mooo.com 11.0-CURRENT FreeBSD 11.0-CURRENT #0
> >> r260296: Sun Jan  5 07:14:50 EET 2014
> >> root_at_vm.mkushnir.mooo.com:/usr/obj/usr/src.svnup/sys/MAREK  amd64
> >>
> >> The panic is always triggered by the first request to the nfs service
> >> (this machine runs a PXE server).
> >>
> >> The core.txt is attached. Please let me know if I can help more.
> >
> > Apparently the mime-type on the attachment was bad and got scrubbed...
> >
> > Maybe include it inline if it isn't too long?
> >
> 
> It's 144KB long. I will share it via Google Drive:
> 
> https://drive.google.com/file/d/0B9Q-zpUXxqCnNVhBY0M5ZzU4d1k/edit?usp=sharing

Looks like a NULL function pointer was called:
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0xfffffe00d9a2bea0
frame pointer           = 0x28:0xfffffe00d9a2c010
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1323 (nfsd: master)
trap number             = 12
panic: page fault

--- trap 0xc, rip = 0, rsp = 0xfffffe00d9a2bea0, rbp = 0xfffffe00d9a2c010 ---
uart_sab82532_class() at 0/frame 0xfffffe00d9a2c010
svc_run_internal() at svc_run_internal+0x9c9/frame 0xfffffe00d9a2c1b0
svc_run() at svc_run+0xed/frame 0xfffffe00d9a2c1f0
nfsrvd_nfsd() at nfsrvd_nfsd+0x19a/frame 0xfffffe00d9a2c350
nfssvc_nfsd() at nfssvc_nfsd+0x11a/frame 0xfffffe00d9a2c970
sys_nfssvc() at sys_nfssvc+0xd2/frame 0xfffffe00d9a2c9a0
amd64_syscall() at amd64_syscall+0x265/frame 0xfffffe00d9a2cab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe00d9a2cab0
--- syscall (155, FreeBSD ELF64, sys_nfssvc), rip = 0x80088c13a, rsp = 0x7fffffffd438, rbp = 0x7fffffffd6e0 ---

uart_sab82532_class is just the closest symbol to address 0, so the
real problem is in svc_run_internal, which made the call...  Could you run:
nm /boot/kernel/kernel | grep svc_run_internal

This should return a line w/ a large hex number at the front.  Add 0x9c9
to that number (in hex; expr can't do hex arithmetic), then run:
addr2line -e /boot/kernel/kernel 0x<sum>

This will give you a file name and line number.  Could you copy/paste
the lines around and including that line number?  This will help make
sure we're looking at the correct code...
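
The address passed to addr2line has to be the hex sum of the symbol
address and the 0x9c9 offset, and expr only handles decimal integers.
Below is a minimal sketch of that addition in plain sh.  The helper name
hex_add is made up for illustration, and it assumes the offset fits in
32 bits.  It adds the two 32-bit halves separately because an amd64
kernel address (ffffffff8.......) is larger than the signed 64-bit range
that some shells use for $(( )).

```sh
#!/bin/sh
# hex_add BASE OFFSET  (hypothetical helper, not part of any tool)
# Print BASE + OFFSET as 16 hex digits, without a 0x prefix.
# Both arguments are hex, with or without a leading 0x.
hex_add() {
    padded="0000000000000000${1#0x}"
    addr=${padded#"${padded%????????????????}"}   # keep the last 16 digits
    high=${addr%????????}                          # upper 32 bits
    low=${addr#????????}                           # lower 32 bits
    sum=$((0x$low + 0x${2#0x}))                    # may carry into bit 32
    carry=$((sum >> 32))
    printf '%08x%08x\n' \
        $(( (0x$high + carry) & 0xffffffff )) \
        $(( sum & 0xffffffff ))
}

# Intended use on the panicking machine (left commented out here):
#   base=$(nm /boot/kernel/kernel | awk '$3 == "svc_run_internal" { print $1 }')
#   addr2line -e /boot/kernel/kernel "$(hex_add "$base" 0x9c9)"
```

With a made-up symbol address of ffffffff80b2c3d0, hex_add prints
ffffffff80b2cd99; addr2line treats its address arguments as hex, so the
missing 0x prefix should not matter.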

Thanks.

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
Received on Sun Jan 05 2014 - 16:35:42 UTC
