Re: 11.0-CURRENT panic (nfsd?)

From: Markiyan Kushnir <markiyan.kushnir_at_gmail.com>
Date: Sun, 5 Jan 2014 19:47:37 +0200
$ nm /boot/kernel/kernel | grep svc_run_internal
ffffffff80714db0 t svc_run_internal
$ addr2line -e /boot/kernel/kernel 0xffffffff80715779
/usr/src.svnup/sys/rpc/svc.c:971
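
For anyone following along: the address given to addr2line is the start of svc_run_internal as printed by nm, plus the +0x9c9 offset from the backtrace. A minimal C sketch of that arithmetic follows; the helper name frame_address_from_nm is made up for illustration and is not part of the kernel or binutils.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: take one line of nm output, e.g.
 * "ffffffff80714db0 t svc_run_internal", parse the leading hex
 * address and add the offset reported in the backtrace frame.
 */
static uint64_t
frame_address_from_nm(const char *nm_line, uint64_t offset)
{
        /* strtoull stops at the first non-hex character (the space). */
        uint64_t sym_start = (uint64_t)strtoull(nm_line, NULL, 16);

        return sym_start + offset;
}
```

With the nm line above and an offset of 0x9c9 this yields 0xffffffff80715779, which is the address passed to addr2line.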

   949  static void
   950  svc_executereq(struct svc_req *rqstp)
   951  {
   952          SVCXPRT *xprt = rqstp->rq_xprt;
   953          SVCPOOL *pool = xprt->xp_pool;
   954          int prog_found;
   955          rpcvers_t low_vers;
   956          rpcvers_t high_vers;
   957          struct svc_callout *s;
   958
   959          /* now match message with a registered service*/
   960          prog_found = FALSE;
   961          low_vers = (rpcvers_t) -1L;
   962          high_vers = (rpcvers_t) 0L;
   963          TAILQ_FOREACH(s, &pool->sp_callouts, sc_link) {
   964                  if (s->sc_prog == rqstp->rq_prog) {
   965                          if (s->sc_vers == rqstp->rq_vers) {
   966                                  /*
   967                                   * We hand ownership of r to the
   968                                   * dispatch method - they must call
   969                                   * svc_freereq.
   970                                   */
   971                                  (*s->sc_dispatch)(rqstp, xprt);
   972                                  return;
   973                          }  /* found correct version */
   974                          prog_found = TRUE;
   975                          if (s->sc_vers < low_vers)
   976                                  low_vers = s->sc_vers;
   977                          if (s->sc_vers > high_vers)
   978                                  high_vers = s->sc_vers;
   979                  }   /* found correct program */
   980          }
   981
   982          /*
   983           * if we got here, the program or version
   984           * is not served ...
   985           */
   986          if (prog_found)
   987                  svcerr_progvers(rqstp, low_vers, high_vers);
   988          else
   989                  svcerr_noprog(rqstp);
   990
   991          svc_freereq(rqstp);
   992  }
   993
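
Line 971 is the indirect call through s->sc_dispatch, which would fit a jump to address 0 if that pointer were NULL for the matching entry. The following is only a userland sketch of that failure mode with a defensive check, not a proposed change to svc.c. The struct, the function names, and the program/version numbers are simplified stand-ins I made up, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a dispatch callback. */
typedef void (*dispatch_fn)(int req);

/* Hypothetical, simplified stand-in for a registered callout entry. */
struct callout {
        unsigned int prog;
        unsigned int vers;
        dispatch_fn dispatch;   /* NULL here models a bad registration */
};

/* Example handler that just records the request it was given. */
static int last_req;

static void
record_req(int req)
{
        last_req = req;
}

/*
 * Return 1 if a handler was called, 0 if no entry matched, and
 * -1 if the matching entry had a NULL dispatch pointer.
 */
static int
dispatch_checked(const struct callout *tbl, size_t n,
    unsigned int prog, unsigned int vers, int req)
{
        size_t i;

        for (i = 0; i < n; i++) {
                if (tbl[i].prog != prog || tbl[i].vers != vers)
                        continue;
                if (tbl[i].dispatch == NULL)
                        return -1;      /* calling it would jump to 0 */
                tbl[i].dispatch(req);
                return 1;
        }
        return 0;
}
```

Without the NULL check, the second case (matching program and version, NULL handler) is the one that would fault with an instruction pointer of 0.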

2014/1/5 John-Mark Gurney <jmg_at_funkthat.com>:
> Markiyan Kushnir wrote this message on Sun, Jan 05, 2014 at 11:06 +0200:
>> 2014/1/5 John-Mark Gurney <jmg_at_funkthat.com>:
>> > Markiyan Kushnir wrote this message on Sun, Jan 05, 2014 at 10:57 +0200:
>> >> I started to see a reliable panic on a recent CURRENT:
>> >>
>> >> $ uname -a
>> >> FreeBSD mkushnir.mooo.com 11.0-CURRENT FreeBSD 11.0-CURRENT #0
>> >> r260296: Sun Jan  5 07:14:50 EET 2014
>> >> root_at_vm.mkushnir.mooo.com:/usr/obj/usr/src.svnup/sys/MAREK  amd64
>> >>
>> >> The panic is always triggered by the first request to the nfs service
>> >> (this machine runs a PXE server).
>> >>
>> >> The core.txt is attached. Please let me know if I can help more.
>> >
>> > Apparently the mime-type on the attachment was bad and got scrubbed...
>> >
>> > Maybe include it inline if it isn't too long?
>> >
>>
>> It's 144KB long. I will share it via Google Drive:
>>
>> https://drive.google.com/file/d/0B9Q-zpUXxqCnNVhBY0M5ZzU4d1k/edit?usp=sharing
>
> Looks like a NULL function pointer was called:
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0x0
> fault code              = supervisor read instruction, page not present
> instruction pointer     = 0x20:0x0
> stack pointer           = 0x28:0xfffffe00d9a2bea0
> frame pointer           = 0x28:0xfffffe00d9a2c010
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 1323 (nfsd: master)
> trap number             = 12
> panic: page fault
>
> --- trap 0xc, rip = 0, rsp = 0xfffffe00d9a2bea0, rbp = 0xfffffe00d9a2c010 ---
> uart_sab82532_class() at 0/frame 0xfffffe00d9a2c010
> svc_run_internal() at svc_run_internal+0x9c9/frame 0xfffffe00d9a2c1b0
> svc_run() at svc_run+0xed/frame 0xfffffe00d9a2c1f0
> nfsrvd_nfsd() at nfsrvd_nfsd+0x19a/frame 0xfffffe00d9a2c350
> nfssvc_nfsd() at nfssvc_nfsd+0x11a/frame 0xfffffe00d9a2c970
> sys_nfssvc() at sys_nfssvc+0xd2/frame 0xfffffe00d9a2c9a0
> amd64_syscall() at amd64_syscall+0x265/frame 0xfffffe00d9a2cab0
> Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe00d9a2cab0
> --- syscall (155, FreeBSD ELF64, sys_nfssvc), rip = 0x80088c13a, rsp = 0x7fffffffd438, rbp = 0x7fffffffd6e0 ---
>
> The uart_sab82532_class is just the closest symbol to 0, so it's in
> svc_run_internal that's the problem...  Could you run:
> nm /boot/kernel/kernel | grep svc_run_internal
>
> This should return a line w/ a large hex number at the front, then run:
> addr2line -e /boot/kernel/kernel $( expr 0x<largehexnumber>+0x9c9)
>
> This will give you a file name and line number, and can you copy/paste
> the lines around and including that line number?  This will help make
> sure we get the correct code...
>
> Thanks.
>
> --
>   John-Mark Gurney                              Voice: +1 415 225 5579
>
>      "All that I will do, has been done, All that I have, has not."
Received on Sun Jan 05 2014 - 16:47:39 UTC
