Re: LOR + panic in scope6.c

From: John Baldwin <jhb_at_FreeBSD.org>
Date: Tue, 9 Aug 2005 16:57:10 -0400

On Tuesday 09 August 2005 02:46 pm, Bjoern A. Zeeb wrote:
> On Tue, 9 Aug 2005, Bjoern A. Zeeb wrote:
> > On Tue, 9 Aug 2005, John Baldwin wrote:
> > > On Tuesday 09 August 2005 07:40 am, Bjoern A. Zeeb wrote:
> > > > Hi,
> > > >
> > > > HEAD as of yesterday + rwatson mega-commit from today.
> > > >
> > > > lock order reversal
> > > >  1st 0xffffff0000ad6bf0 if_afdata (if_afdata) @ sys/netinet6/scope6.c:415
> > > >  2nd 0xffffffff8081dd30 user map (user map) @ sys/vm/vm_map.c:2997
> > > > KDB: stack backtrace:
> > > >
> > > > --- trap 0xc, rip = 0xffffffff804990a0, rsp = 0xffffffff809dc3f0, rbp = 0xffffffff809dc430 ---
> > > > in6_setscope() at in6_setscope+0x50
> > > > in6_ifdetach() at in6_ifdetach+0x24a
> > > > if_detach() at if_detach+0x39
> > > > ether_ifdetach() at ether_ifdetach+0x35
> > > > sk_attach() at sk_attach+0x51a
> > > >
> > > > Fatal trap 12: page fault while in kernel mode
> > > > fault virtual address   = 0x18
> > > > fault code              = supervisor read, page not present
> > > > instruction pointer     = 0x8:0xffffffff804990a0
> > > > stack pointer           = 0x10:0xffffffff809dc3f0
> > > > frame pointer           = 0x10:0xffffffff809dc430
> > > > code segment            = base 0x0, limit 0xfffff, type 0x1b
> > > >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > > > processor eflags        = interrupt enabled, resume, IOPL = 0
> > > > current process         = 0 (swapper)
> > > > [thread pid 0 tid 0 ]
> > > > Stopped at      in6_setscope+0x50:      movq    0x18(%rax),%r13
> > > >
> > > > (gdb) l *0xffffffff804990a0
> > > > 0xffffffff804990a0 is in in6_setscope (sys/netinet6/scope6.c:417).
> > > > 412             u_int32_t zoneid = 0;
> > > > 413             struct scope6_id *sid;
> > > > 414
> > > > 415             IF_AFDATA_LOCK(ifp);
> > > > 416
> > > > 417             sid = SID(ifp);
> > > > 418
> > > > 419     #ifdef DIAGNOSTIC
> > > > 420             if (sid == NULL) { /* should not happen */
> > > > 421                     panic("in6_setscope: scope array is NULL");
> > >
> > > Well, SID is a macro that expands this to:
> > >
> > > 	sid = ifp->if_afdata[AF_INET6]->scope6_id
> > >
> > > If if_afdata[AF_INET6] has already been freed that could be the
> > > problem. It might have never been non-null either I guess.  You can try
> > > having in6_setscope() bail if ifp->if_afdata[AF_INET6] is NULL.
> >
> > I will. I think I found another problem with attach/detach in sk.
> > The above seems to happen in the "No PHY found" case (which I fixed
> > already locally and everything went away).
>
> Leaving it to hit the problem it goes like this:
>
> ...
> skc0: no PHY found!
> panic: ifp->if_afdata[AF_INET6] NULL
> KDB: enter: panic
> [thread pid 0 tid 0 ]
> Stopped at      kdb_enter+0x2f: nop
> db> where
> Tracing pid 0 tid 0 td 0xffffffff8081e6c0
> kdb_enter() at kdb_enter+0x2f
> panic() at panic+0x1d2
> in6_setscope() at in6_setscope+0x20f
> in6_ifdetach() at in6_ifdetach+0x24a
> if_detach() at if_detach+0x39
> ether_ifdetach() at ether_ifdetach+0x35
> sk_attach() at sk_attach+0x522
> device_attach() at device_attach+0x292
> bus_generic_attach() at bus_generic_attach+0x18
> skc_attach() at skc_attach+0x6df
> device_attach() at device_attach+0x292
> ...
>
> bz@amd64:/local/building/freebsd/HEAD/sys> cvs -qR diff -up netinet6/scope6.c
> Index: netinet6/scope6.c
> ===================================================================
> RCS file: /local/mirror/FreeBSD/r/ncvs/src/sys/netinet6/scope6.c,v
> retrieving revision 1.15
> diff -u -p -r1.15 scope6.c
> --- netinet6/scope6.c   25 Jul 2005 17:28:39 -0000      1.15
> +++ netinet6/scope6.c   9 Aug 2005 17:35:07 -0000
> @@ -412,8 +412,13 @@ in6_setscope(in6, ifp, ret_id)
>         u_int32_t zoneid = 0;
>         struct scope6_id *sid;
>
> +       KASSERT(ifp != NULL, ("ifp NULL"));
> +
>         IF_AFDATA_LOCK(ifp);
>
> +       KASSERT(ifp->if_afdata[AF_INET6] != NULL,
> +               ("ifp->if_afdata[AF_INET6] NULL"));
> +
>         sid = SID(ifp);
>
>  #ifdef DIAGNOSTIC
>
> Could it be a problem of ether_ifattach and ether_ifdetach being
> run without the driver locks?  UP machine btw.

I don't think it is a locking problem.  I think that the inet6 code is simply 
not taking into account some edge case.  In theory I don't think that 
if_afdata[AF_INET6] should be NULL since ether_ifattach() has called inet6's 
domain attach routine.  Are you sure that you have called ether_ifattach() 
btw?
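
For illustration only, here is a minimal user-space sketch of the "bail if
if_afdata[AF_INET6] is NULL" idea from earlier in the thread. It is not the
kernel code: the mock_* types, MOCK_AF_INET6, MOCK_AF_SLOTS, sid_or_null() and
setscope_guarded() are all hypothetical names, and the struct layout (with
scope6_id as the fourth pointer member) is an assumption. Under that
assumption, reading scope6_id through a NULL base pointer on a 64-bit machine
touches offset 3 * sizeof(void *) = 0x18, which would be consistent with the
reported fault address.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_AF_INET6  28  /* hypothetical slot standing in for AF_INET6 */
#define MOCK_AF_SLOTS  37  /* hypothetical table size standing in for AF_MAX */
#define MOCK_NSCOPES   16  /* hypothetical number of scope levels */

struct scope6_id {
	uint32_t s6id_list[MOCK_NSCOPES];
};

/* Assumed layout: scope6_id is the fourth pointer in the per-AF data. */
struct mock_in6_ifextra {
	void *in6_ifstat;
	void *icmp6_ifstat;
	void *nd_ifinfo;
	struct scope6_id *scope6_id;
};

/* Stand-in for struct ifnet; only the per-AF data table is modelled. */
struct mock_ifnet {
	void *if_afdata[MOCK_AF_SLOTS];
};

/* Like SID(ifp), but returns NULL when the INET6 data was never attached. */
struct scope6_id *
sid_or_null(struct mock_ifnet *ifp)
{
	struct mock_in6_ifextra *ext;

	assert(ifp != NULL);
	ext = ifp->if_afdata[MOCK_AF_INET6];
	return (ext == NULL ? NULL : ext->scope6_id);
}

/* Look up a zone id, failing with EINVAL instead of dereferencing NULL. */
int
setscope_guarded(struct mock_ifnet *ifp, unsigned int scope, uint32_t *zoneid)
{
	struct scope6_id *sid = sid_or_null(ifp);

	if (sid == NULL || scope >= MOCK_NSCOPES)
		return (EINVAL);
	*zoneid = sid->s6id_list[scope];
	return (0);
}
```

With a zeroed mock_ifnet, setscope_guarded() returns EINVAL and leaves *zoneid
untouched. A guard like this only hides the symptom; the detach path being
reached without a matching attach would still need to be tracked down.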

-- 
John Baldwin <jhb_at_FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
Received on Tue Aug 09 2005 - 19:35:53 UTC