Re: Kernel crashes from turnstile_broadcast (/usr/src/sys/kern/subr_turnstile.c:837)

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Thu, 22 Sep 2016 16:01:14 +0300
On Thu, Sep 22, 2016 at 08:48:29PM +0800, Ben Woods wrote:
> Hi everyone,
> 
> I am currently experiencing semi-regular kernel crashes on my FreeBSD
> 12-current machine. I am new to kernel debugging, and hoping someone can
> have a look at the debugging output below to point me in the direction of
> what the problem might be.
> 
> My machine is a FreeNAS-mini from iXsystems which I have formatted and
> installed stock FreeBSD onto. My kernel is the default generic-nodebug with
> the VIMAGE options added.
> 
> $ uname -a
> FreeBSD freenas.woods.am 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r305311M: Sat
> Sep  3 12:29:01 AWST 2016
> woodsb02_at_freenas.woods.am:/usr/obj/usr/src/sys/GENERIC-NODEBUG-VIMAGE
> amd64
> 
> $ cat /usr/src/sys/amd64/conf/GENERIC-NODEBUG-VIMAGE
> # SPARTICUS -- WITNESS and INVARIANTS free kernel configuration file
> #               for FreeBSD/amd64
> 
> include GENERIC-NODEBUG
> 
> ident   GENERIC-NODEBUG-VIMAGE
> 
> #nooptions      SCTP    # Stream Control Transmission Protocol
> options         VIMAGE  # VNET/Vimage support
> 
> 
> Output from kernel crash dump debug with kgdb below:
> 
> /usr/obj/usr/src/sys/GENERIC-NODEBUG-VIMAGE)# kgdb kernel.debug
> /var/crash/vmcore.last
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
> 
> Unread portion of the kernel message buffer:
> kernel trap 12 with interrupts disabled
> 
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 2; apic id = 04
> fault virtual address   = 0x30
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff80b4d91c
> stack pointer           = 0x28:0xfffffe046813a440
> frame pointer           = 0x28:0xfffffe046813a470
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = resume, IOPL = 0
> current process         = 33487 (sh)
> Uptime: 15m16s
> Dumping 1664 out of 16338
> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
> 
> Reading symbols from /boot/kernel.GENERIC-NODEBUG-VIMAGE/zfs.ko...Reading
> symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/zfs.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/zfs.ko
> Reading symbols from
> /boot/kernel.GENERIC-NODEBUG-VIMAGE/opensolaris.ko...Reading symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/opensolaris.ko.debug...
> done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/opensolaris.ko
> Reading symbols from
> /boot/kernel.GENERIC-NODEBUG-VIMAGE/geom_eli.ko...Reading symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/geom_eli.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/geom_eli.ko
> Reading symbols from
> /boot/kernel.GENERIC-NODEBUG-VIMAGE/accf_http.ko...Reading symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/accf_http.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/accf_http.ko
> Reading symbols from
> /boot/kernel.GENERIC-NODEBUG-VIMAGE/coretemp.ko...Reading symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/coretemp.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/coretemp.ko
> Reading symbols from /boot/kernel.GENERIC-NODEBUG-VIMAGE/aesni.ko...Reading
> symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/aesni.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/aesni.ko
> Reading symbols from
> /boot/kernel.GENERIC-NODEBUG-VIMAGE/if_bridge.ko...Reading symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/if_bridge.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/if_bridge.ko
> Reading symbols from
> /boot/kernel.GENERIC-NODEBUG-VIMAGE/bridgestp.ko...Reading symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/bridgestp.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/bridgestp.ko
> Reading symbols from /boot/kernel.GENERIC-NODEBUG-VIMAGE/ums.ko...Reading
> symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/ums.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/ums.ko
> Reading symbols from
> /boot/kernel.GENERIC-NODEBUG-VIMAGE/netgraph.ko...Reading symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/netgraph.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/netgraph.ko
> Reading symbols from
> /boot/kernel.GENERIC-NODEBUG-VIMAGE/ng_netflow.ko...Reading symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/ng_netflow.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/ng_netflow.ko
> Reading symbols from
> /boot/kernel.GENERIC-NODEBUG-VIMAGE/ng_ksocket.ko...Reading symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/ng_ksocket.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/ng_ksocket.ko
> Reading symbols from
> /boot/kernel.GENERIC-NODEBUG-VIMAGE/ng_ether.ko...Reading symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/ng_ether.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/ng_ether.ko
> Reading symbols from
> /boot/kernel.GENERIC-NODEBUG-VIMAGE/ng_socket.ko...Reading symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/ng_socket.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/ng_socket.ko
> Reading symbols from /boot/kernel.GENERIC-NODEBUG-VIMAGE/linux.ko...Reading
> symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/linux.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/linux.ko
> Reading symbols from
> /boot/kernel.GENERIC-NODEBUG-VIMAGE/linux_common.ko...Reading symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/linux_common.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/linux_common.ko
> Reading symbols from
> /boot/kernel.GENERIC-NODEBUG-VIMAGE/linux64.ko...Reading symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/linux64.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/linux64.ko
> Reading symbols from
> /boot/kernel.GENERIC-NODEBUG-VIMAGE/fdescfs.ko...Reading symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/fdescfs.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/fdescfs.ko
> Reading symbols from
> /boot/kernel.GENERIC-NODEBUG-VIMAGE/if_epair.ko...Reading symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/if_epair.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/if_epair.ko
> Reading symbols from
> /boot/kernel.GENERIC-NODEBUG-VIMAGE/nullfs.ko...Reading symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/nullfs.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/nullfs.ko
> Reading symbols from /boot/kernel.GENERIC-NODEBUG-VIMAGE/tmpfs.ko...Reading
> symbols from
> /usr/lib/debug//boot/kernel.GENERIC-NODEBUG-VIMAGE/tmpfs.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel.GENERIC-NODEBUG-VIMAGE/tmpfs.ko
> #0  doadump (textdump=1) at pcpu.h:221
> 221             __asm("movq %%gs:%1,%0" : "=r" (td)
> (kgdb) list *0xffffffff80b4d91c
> 0xffffffff80b4d91c is in turnstile_broadcast
> (/usr/src/sys/kern/subr_turnstile.c:837).
> 832
> 833             /*
> 834              * Transfer the blocked list to the pending list.
> 835              */
> 836             mtx_lock_spin(&td_contested_lock);
> 837             TAILQ_CONCAT(&ts->ts_pending, &ts->ts_blocked[queue],
> td_lockq);
> 838             mtx_unlock_spin(&td_contested_lock);
> 839
> 840             /*
> 841              * Give a turnstile to each thread.  The last thread gets
> Current language:  auto; currently minimal
> (kgdb) backtrace
> #0  doadump (textdump=1) at pcpu.h:221
> #1  0xffffffff80aea40e in kern_reboot (howto=260) at
> /usr/src/sys/kern/kern_shutdown.c:366
> #2  0xffffffff80aea9db in vpanic (fmt=<value optimized out>, ap=<value
> optimized out>) at /usr/src/sys/kern/kern_shutdown.c:759
> #3  0xffffffff80aea813 in panic (fmt=0x0) at
> /usr/src/sys/kern/kern_shutdown.c:690
> #4  0xffffffff8039e197 in db_panic (addr=<value optimized out>,
> have_addr=false, count=0, modif=0x0) at /usr/src/sys/ddb/db_command.c:486
> #5  0xffffffff8039d689 in db_command (cmd_table=<value optimized out>) at
> /usr/src/sys/ddb/db_command.c:453
> #6  0xffffffff8039d3e4 in db_command_loop () at
> /usr/src/sys/ddb/db_command.c:506
> #7  0xffffffff803a053b in db_trap (type=<value optimized out>, code=<value
> optimized out>) at /usr/src/sys/ddb/db_main.c:251
> #8  0xffffffff80b36b33 in kdb_trap (type=<value optimized out>, code=<value
> optimized out>, tf=<value optimized out>) at
> /usr/src/sys/kern/subr_kdb.c:654
> #9  0xffffffff80fdd441 in trap_fatal (frame=0xfffffe046813a390, eva=48) at
> /usr/src/sys/amd64/amd64/trap.c:836
> #10 0xffffffff80fdd673 in trap_pfault (frame=0xfffffe046813a390,
> usermode=0) at /usr/src/sys/amd64/amd64/trap.c:691
> #11 0xffffffff80fdcbfc in trap (frame=0xfffffe046813a390) at
> /usr/src/sys/amd64/amd64/trap.c:442
> #12 0xffffffff80fbf491 in calltrap () at
> /usr/src/sys/amd64/amd64/exception.S:236
> #13 0xffffffff80b4d91c in turnstile_broadcast (ts=0x0, queue=1) at
> /usr/src/sys/kern/subr_turnstile.c:837
> #14 0xffffffff80ae5e1f in __rw_wunlock_hard (c=0xfffff803f886d960,
> tid=<value optimized out>, file=<value optimized out>, line=<value
> optimized out>)
>     at /usr/src/sys/kern/kern_rwlock.c:1027
> #15 0xffffffff80e525dc in vm_map_delete (map=<value optimized out>,
> start=<value optimized out>, end=<value optimized out>) at
> /usr/src/sys/vm/vm_map.c:2960
> #16 0xffffffff80e54477 in vm_map_remove (map=0xfffff8035540f000,
> start=140737488355328, end=1) at /usr/src/sys/vm/vm_map.c:3077
> #17 0xffffffff80a9863f in exec_new_vmspace (imgp=0xfffffe046813a860,
> sv=0xffffffff81a596e8) at /usr/src/sys/kern/kern_exec.c:1096
> #18 0xffffffff80a6ced8 in exec_elf64_imgact (imgp=<value optimized out>) at
> /usr/src/sys/kern/imgact_elf.c:896
> #19 0xffffffff80a9670d in kern_execve (td=<value optimized out>,
> args=<value optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:603
> #20 0xffffffff80a95b9c in sys_execve (td=0xfffff8032893aa00,
> uap=0xfffffe046813ab80) at /usr/src/sys/kern/kern_exec.c:219
> #21 0xffffffff80fddde8 in amd64_syscall (td=<value optimized out>,
> traced=0) at subr_syscall.c:135
> #22 0xffffffff80fbf77b in Xfast_syscall () at
> /usr/src/sys/amd64/amd64/exception.S:396
> #23 0x0000000800b468ea in ?? ()
> Previous frame inner to this frame (corrupt stack?)
> (kgdb) up 11
> #11 0xffffffff80fdcbfc in trap (frame=0xfffffe046813a390) at
> /usr/src/sys/amd64/amd64/trap.c:442
> 442                             (void) trap_pfault(frame, FALSE);
> (kgdb) list
> 437
> 438                     KASSERT(cold || td->td_ucred != NULL,
> 439                         ("kernel trap doesn't have ucred"));
> 440                     switch (type) {
> 441                     case T_PAGEFLT:                 /* page fault */
> 442                             (void) trap_pfault(frame, FALSE);
> 443                             goto out;
> 444
> 445                     case T_DNA:
> 446                             if (PCB_USER_FPU(td->td_pcb))
> (kgdb) print td
> $1 = (struct thread *) 0xfffff8032893aa00
> (kgdb) print td->td_ucred
> $2 = (struct ucred *) 0xfffff8004005ec00
> (kgdb) print type
> $3 = 12
> (kgdb) up
> #12 0xffffffff80fbf491 in calltrap () at
> /usr/src/sys/amd64/amd64/exception.S:236
> 236             call    trap_check
> Current language:  auto; currently asm
> (kgdb) up
> #13 0xffffffff80b4d91c in turnstile_broadcast (ts=0x0, queue=1) at
> /usr/src/sys/kern/subr_turnstile.c:837
> 837             TAILQ_CONCAT(&ts->ts_pending, &ts->ts_blocked[queue],
> td_lockq);
ts == NULL usually means that the unlock was performed on unlocked rwlock.

Add witness and all other usual stuff to your kernel config, this
should give more useful information earlier.  Still, I do not see how an
object can become unlocked at the line 2690.

> Current language:  auto; currently minimal
> (kgdb) up
> #14 0xffffffff80ae5e1f in __rw_wunlock_hard (c=0xfffff803f886d960,
> tid=<value optimized out>, file=<value optimized out>, line=<value
> optimized out>)
>     at /usr/src/sys/kern/kern_rwlock.c:1027
> 1027            turnstile_broadcast(ts, queue);
> (kgdb) up
> #15 0xffffffff80e525dc in vm_map_delete (map=<value optimized out>,
> start=<value optimized out>, end=<value optimized out>) at
> /usr/src/sys/vm/vm_map.c:2960
> 2960                    VM_OBJECT_WUNLOCK(object);
> (kgdb) up
> #16 0xffffffff80e54477 in vm_map_remove (map=0xfffff8035540f000,
> start=140737488355328, end=1) at /usr/src/sys/vm/vm_map.c:3077
> 3077            result = vm_map_delete(map, start, end);
> (kgdb) up
> #17 0xffffffff80a9863f in exec_new_vmspace (imgp=0xfffffe046813a860,
> sv=0xffffffff81a596e8) at /usr/src/sys/kern/kern_exec.c:1096
> 1096                    vm_map_remove(map, vm_map_min(map),
> vm_map_max(map));
> (kgdb) up
> #18 0xffffffff80a6ced8 in exec_elf64_imgact (imgp=<value optimized out>) at
> /usr/src/sys/kern/imgact_elf.c:896
> 896             error = exec_new_vmspace(imgp, sv);
> (kgdb) up
> #19 0xffffffff80a9670d in kern_execve (td=<value optimized out>,
> args=<value optimized out>, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:603
> 603                     error = (*execsw[i]->ex_imgact)(imgp);
> (kgdb) up
> #20 0xffffffff80a95b9c in sys_execve (td=0xfffff8032893aa00,
> uap=0xfffffe046813ab80) at /usr/src/sys/kern/kern_exec.c:219
> 219                     error = kern_execve(td, &args, NULL);
> (kgdb) up
> #21 0xffffffff80fddde8 in amd64_syscall (td=<value optimized out>,
> traced=0) at subr_syscall.c:135
> 135                     error = (sa->callp->sy_call)(td, sa->args);
> (kgdb) up
> #22 0xffffffff80fbf77b in Xfast_syscall () at
> /usr/src/sys/amd64/amd64/exception.S:396
> 396             call    amd64_syscall
> Current language:  auto; currently asm
> (kgdb) up
> #23 0x0000000800b468ea in ?? ()
> (kgdb) up
> Initial frame selected; you cannot go up.
> (kgdb) quit
> 
> 
> 
> Regards,
> Ben
> 
> --
> From: Benjamin Woods
> woodsb02_at_gmail.com
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
Received on Thu Sep 22 2016 - 11:01:24 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:08 UTC