Re: panic after ifioctl/if_clone_destroy

From: Matthew Macy <mmacy_at_freebsd.org>
Date: Sun, 5 Aug 2018 13:01:00 -0700
If you could give me a self-contained reproducer, that would expedite a fix.

Thanks.
-M

On Sun, Aug 5, 2018 at 08:36 Roman Bogorodskiy <novel_at_freebsd.org> wrote:

> I'm running -CURRENT r336863 on amd64 and get the following panic right
> after (or during) boot:
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 2; apic id = 04
> fault virtual address   = 0xdeadc2ff
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff80bd7858
> stack pointer           = 0x28:0xfffffe008b445580
> frame pointer           = 0x28:0xfffffe008b4455c0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 903 (libvirtd)
>
> Traceback is:
>
> (kgdb) #0  doadump (textdump=0) at pcpu.h:230
> #1  0xffffffff8043dc7b in db_dump (dummy=<value optimized out>,
>     dummy2=<value optimized out>, dummy3=<value optimized out>,
>     dummy4=<value optimized out>) at /usr/src/sys/ddb/db_command.c:574
> #2  0xffffffff8043da49 in db_command (cmd_table=<value optimized out>)
>     at /usr/src/sys/ddb/db_command.c:481
> #3  0xffffffff8043d7c4 in db_command_loop ()
>     at /usr/src/sys/ddb/db_command.c:534
> #4  0xffffffff804409ef in db_trap (type=<value optimized out>,
>     code=<value optimized out>) at /usr/src/sys/ddb/db_main.c:252
> #5  0xffffffff80bdd513 in kdb_trap (type=12, code=0,
>     tf=<value optimized out>) at /usr/src/sys/kern/subr_kdb.c:693
> #6  0xffffffff810769f1 in trap_fatal (frame=0xfffffe008b4454c0,
>     eva=3735929599) at /usr/src/sys/amd64/amd64/trap.c:884
> #7  0xffffffff81076b12 in trap_pfault (frame=0xfffffe008b4454c0,
>     usermode=<value optimized out>) at pcpu.h:230
> #8  0xffffffff8107611a in trap (frame=0xfffffe008b4454c0)
>     at /usr/src/sys/amd64/amd64/trap.c:427
> #9  0xffffffff810518ac in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:230
> #10 0xffffffff80bd7858 in epoch_block_handler_preempt (
>     global=<value optimized out>, cr=0xfffffe00760c3a00,
>     arg=<value optimized out>) at /usr/src/sys/kern/subr_epoch.c:256
> #11 0xffffffff803994fd in ck_epoch_synchronize_wait (
>     global=0xfffff800030c5680,
>     cb=0xffffffff80bd77a0 <epoch_block_handler_preempt>, ct=0x0)
>     at /usr/src/sys/contrib/ck/src/ck_epoch.c:407
> #12 0xffffffff80bd7630 in epoch_wait_preempt (epoch=0xfffff800030c5680)
>     at /usr/src/sys/kern/subr_epoch.c:389
> #13 0xffffffff80c983bf in if_delgroup (ifp=0xfffff80003aab800,
>     groupname=0xfffff80005ff5e00 "bridge") at /usr/src/sys/net/if.c:1514
> #14 0xffffffff80c9f2b2 in if_clone_destroyif (ifc=0xfffff80005ff5e00,
>     ifp=0xfffff80003aab800) at /usr/src/sys/net/if_clone.c:325
> #15 0xffffffff80c9f0d5 in if_clone_destroy (
>     name=0xfffffe008b4458d0 "virbr0") at /usr/src/sys/net/if_clone.c:288
> #16 0xffffffff80c9a2c3 in ifioctl (so=0xfffff80007edca38, cmd=2149607801,
>     data=<value optimized out>, td=<value optimized out>)
>     at /usr/src/sys/net/if.c:3053
> #17 0xffffffff80c04259 in kern_ioctl (td=0xfffff80007c1a580,
>     fd=<value optimized out>, com=<value optimized out>,
>     data=<value optimized out>) at file.h:330
> #18 0xffffffff80c03f2e in sys_ioctl (td=0xfffff80007c1a580,
>     uap=0xfffff80007c1a940) at /usr/src/sys/kern/sys_generic.c:712
> #19 0xffffffff81077401 in amd64_syscall (td=0xfffff80007c1a580, traced=0)
>     at subr_syscall.c:135
> #20 0xffffffff8105218d in fast_syscall_common ()
>     at /usr/src/sys/amd64/amd64/exception.S:500
> #21 0x00000008028f4c0a in ?? ()
>
>
> Previous frame inner to this frame (corrupt stack?)
>
>
> Current language:  auto; currently minimal
>
>
> (kgdb)
>
> It looks like the panic happens during network-interface-related
> operations. A couple of dmesg lines before the panic:
>
> Aug  5 19:02:42 romashka rtsold[585]: <rtsock_input_ifannounce> interface bridge0 removed
> Aug  5 19:02:42 romashka kernel: bridge0: Ethernet address: 02:af:41:48:c7:00
> Aug  5 19:02:42 romashka kernel: bridge0: changing name to 'virbr-ab'
> Aug  5 19:02:42 romashka kernel: tap0: Ethernet address: 00:bd:8d:11:f7:00
> Aug  5 19:02:42 romashka kernel: tap0: link state changed to UP
> Aug  5 19:02:42 romashka kernel: tap0: changing name to 'virbr-ab-nic'
> Aug  5 19:02:42 romashka kernel: virbr-ab-nic: promiscuous mode enabled
> Aug  5 19:02:42 romashka kernel: virbr-ab: link state changed to UP
> Aug  5 19:02:42 romashka rtsold[585]: <rtsock_input_ifannounce> interface tap0 removed
> Aug  5 19:02:43 romashka dnsmasq[1047]: setting --bind-interfaces option because of OS limitations
> Aug  5 19:02:43 romashka dnsmasq[1047]: warning: no upstream servers configured
> Aug  5 19:02:43 romashka kernel: virbr-ab-nic: link state changed to DOWN
> Aug  5 19:02:43 romashka kernel: virbr-ab: link state changed to DOWN
> Aug  5 19:02:43 romashka kernel: bridge1: Ethernet address: 02:af:41:48:c7:01
> Aug  5 19:02:43 romashka kernel: bridge1: changing name to 'virbr0'
> Aug  5 19:02:43 romashka rtsold[585]: <rtsock_input_ifannounce> interface bridge1 removed
> Aug  5 19:02:43 romashka kernel: tap1: Ethernet address: 00:bd:53:14:f7:01
> Aug  5 19:02:43 romashka kernel: tap1: link state changed to UP
> Aug  5 19:02:43 romashka kernel: tap1: changing name to 'virbr0-nic'
> Aug  5 19:02:43 romashka kernel: virbr0: link state changed to UP
> Aug  5 19:02:43 romashka kernel: virbr0-nic: promiscuous mode enabled
> Aug  5 19:02:43 romashka rtsold[585]: <rtsock_input_ifannounce> interface tap1 removed
> Aug  5 19:05:03 romashka syslogd: kernel boot file is /boot/kernel/kernel
> Aug  5 19:05:03 romashka kernel:
> Aug  5 19:05:03 romashka syslogd: last message repeated 1 times
> Aug  5 19:05:03 romashka kernel: Fatal trap 12: page fault while in kernel mode
>
> If I disable the libvirt service, the system completes booting fine.
> On start, libvirt creates a couple of bridge(4) and tap(4) devices,
> adds the tap devices to the bridges it created, and possibly destroys
> these interfaces in case of errors. It also starts dnsmasq on some of
> these interfaces.
>
> This problem started to appear about 2-4 weeks ago.
>
> Roman Bogorodskiy
>
Received on Sun Aug 05 2018 - 18:01:13 UTC
