FreeBSD-HEAD gets stuck on vnode operations

From: Roger Pau Monné <roger.pau_at_citrix.com>
Date: Mon, 13 May 2013 13:18:04 +0200
Hello,

I've set up a FreeBSD-HEAD VM on Xen, and compiled the XENHVM kernel, 
last commit in the repository is:

Date: Tue, 7 May 2013 12:39:14 +0000
Subject: [PATCH] By request, add an arrow from NetBSD-0.8 to FreeBSD-1.0.

While here, add a few more NetBSD versions to the tree itself.

Submitted by:	Alan Barrett <apb_at_cequrux.com>
Submitted by:	Thomas Klausner <wiz_at_netbsd.org>

And I've started stressing the VM using the following tight loop:

while [ 1 ]; do make -j12 buildkernel; done

>From time to time, I see the VM getting stuck and when breaking into 
KDB this is the output of ps:

  pid  ppid  pgrp   uid   state   wmesg         wchan        cmd
32343   670   670    25  L      *Name Cac 0xfffffe0017183780 sendmail
32342 32256  5273     0  L+     *Name Cac 0xfffffe0017183780 sh
32341 32335  5273     0  L+     *Name Cac 0xfffffe0017183780 cc
32340 32284  5273     0  D+      ufs      0xfffffe000dc76068 ctfconvert
32339 32293  5273     0  L+     *Name Cac 0xfffffe0017183780 ctfconvert
32337 32332  5273     0  L+     *vnode_fr 0xfffffe00872c23c0 cc
32335 32334  5273     0  S+      wait     0xfffffe010537d000 cc
32334 30655  5273     0  S+      wait     0xfffffe010ac974b8 sh
32333 32329  5273     0  L+     *vnode in 0xfffffe0017183180 cc
32332 32331  5273     0  S+      wait     0xfffffe00416a14b8 cc
32331 30655  5273     0  S+      wait     0xfffffe00b9846970 sh
32329 32328  5273     0  S+      wait     0xfffffe0049ace000 cc
32328 30655  5273     0  S+      wait     0xfffffe0041670000 sh
32324 32320  5273     0  L+     *Name Cac 0xfffffe0017183780 cc
32320 32318  5273     0  S+      wait     0xfffffe00416184b8 cc
32318 30655  5273     0  S+      wait     0xfffffe00879fd970 sh
32314 32313  5273     0  L+     *Name Cac 0xfffffe0017183780 cc
32313 32311  5273     0  S+      wait     0xfffffe00b9846000 cc
32312 32309  5273     0  L+     *Name Cac 0xfffffe0017183780 cc
32311 30655  5273     0  S+      wait     0xfffffe00415844b8 sh
32310 32305  5273     0  L+     *Name Cac 0xfffffe0017183780 cc
32309 32307  5273     0  S+      wait     0xfffffe00499624b8 cc
32307 30655  5273     0  S+      wait     0xfffffe010537c970 sh
32305 32304  5273     0  S+      wait     0xfffffe0041669000 cc
32304 30655  5273     0  S+      wait     0xfffffe00173d9970 sh
32303 32298  5273     0  L+     *Name Cac 0xfffffe0017183780 cc
32298 32295  5273     0  S+      wait     0xfffffe0049924000 cc
32295 30655  5273     0  S+      wait     0xfffffe0041631970 sh
32293 30655  5273     0  S+      wait     0xfffffe000d15b000 sh
32284 30655  5273     0  S+      wait     0xfffffe00416684b8 sh
32256 31391  5273     0  S+      select   0xfffffe000d965840 make
32022 30655  5273     0  L+     *Name Cac 0xfffffe0017183780 sh
31391 31386  5273     0  S+      wait     0xfffffe0041680970 sh
31386 30664  5273     0  S+      select   0xfffffe0017942cc0 make
30664 30663  5273     0  S+      wait     0xfffffe004169f4b8 sh
30663 30662  5273     0  S+      select   0xfffffe000d1c0d40 make
30662 30655  5273     0  S+      wait     0xfffffe00b9ddb4b8 sh
30655  5287  5273     0  S+      select   0xfffffe000d1c0040 make
 5287  5280  5273     0  S+      wait     0xfffffe004148f970 sh
 5280  5278  5273     0  S+      select   0xfffffe000d964540 make
 5278  5273  5273     0  S+      wait     0xfffffe000de3a970 sh
 5273   736  5273     0  S+      select   0xfffffe000bc81740 make
91658   735 91658     0  Ss+     ttyin    0xfffffe000d6caca8 bash
  736   735   736     0  Ss+     wait     0xfffffe000d5084b8 bash
  735   734   735     0  Ss      select   0xfffffe000bc817c0 screen
  734   730   734     0  S+      pause    0xfffffe000d6330a8 screen
  730   729   730     0  S+      wait     0xfffffe000d58b000 bash
  729   728   729  1001  S+      wait     0xfffffe000d7ab970 su
  728   727   728  1001  Ss+     wait     0xfffffe000de3a4b8 sh
  727   724   724  1001  S       select   0xfffffe000d98f340 sshd
  724   664   724     0  Ss      select   0xfffffe000d4180c0 sshd
  722   721   722     0  L+     *Name Cac 0xfffffe0017183780 bash
  721     1   721     0  Ss+     wait     0xfffffe000d7ab000 login
  720     1   720     0  Ss+     ttyin    0xfffffe000d0470a8 getty
  719     1   719     0  Ss+     ttyin    0xfffffe000d0474a8 getty
  718     1   718     0  Ss+     ttyin    0xfffffe000bd914a8 getty
  717     1   717     0  Ss+     ttyin    0xfffffe000bd918a8 getty
  716     1   716     0  Ss+     ttyin    0xfffffe000bd91ca8 getty
  715     1   715     0  Ss+     ttyin    0xfffffe000d0440a8 getty
  714     1   714     0  Ss+     ttyin    0xfffffe000d0444a8 getty
  713     1   713     0  Ss+     ttyin    0xfffffe000d0448a8 getty
  674     1   674     0  Ls     *Name Cac 0xfffffe0017183780 cron
  670     1   670    25  Ss      pause    0xfffffe000d5090a8 sendmail
  667     1   667     0  Ls     *Name Cac 0xfffffe0017183780 sendmail
  664     1   664     0  Ss      select   0xfffffe000d964840 sshd
  566     1   566     0  Ss      select   0xfffffe000d98f4c0 syslogd
  472     1   472     0  Ss      select   0xfffffe000bc81840 devd
  471     1   471    65  Ls     *Name Cac 0xfffffe0017183780 dhclient
  431     1   431     0  Ss      select   0xfffffe000d98fc40 dhclient
   18     0     0     0  LL     *Name Cac 0xfffffe0017183780 [softdepflush]
   17     0     0     0  RL      CPU 1                       [syncer]
    9     0     0     0  DL      vlruwt   0xfffffe000d15c000 [vnlru]
    8     0     0     0  DL      psleep   0xffffffff815305ac [bufdaemon]
    7     0     0     0  DL      pgzero   0xffffffff81538c94 [pagezero]
    6     0     0     0  DL      psleep   0xffffffff81537ed0 [vmdaemon]
    5     0     0     0  DL      psleep   0xffffffff815a4084 [pagedaemon]
    4     0     0     0  DL      ccb_scan 0xffffffff81300f30 [xpt_thrd]
    3     0     0     0  DL      waiting_ 0xffffffff81598160 [sctp_iterator]
    2     0     0     0  DL      balloon  0xffffffff8079b9f0 [balloon]
   16     0     0     0  SL      xbread   0xfffffe00feffc000 [xenstore_rcv]
   15     0     0     0  SL      waitev   0xffffffff81538e98 [xenwatch]
   14     0     0     0  DL      -        0xffffffff8131d008 [yarrow]
   13     0     0     0  DL      (threaded)                  [geom]
100023                   D       -        0xffffffff8158ca50 [g_down]
100022                   D       -        0xffffffff8158ca48 [g_up]
100021                   D       -        0xffffffff8158ca40 [g_event]
   12     0     0     0  WL      (threaded)                  [intr]
100044                   I                                   [irq7: ppc0]
100043                   I                                   [swi0: uart]
100042                   I                                   [irq12: psm0]
100041                   I                                   [irq1: atkbd0]
100038                   I                                   [irq28: xenpci0]
100037                   I                                   [irq15: ata1]
100036                   I                                   [irq14: ata0]
100032                   I                                   [swi2: cambio]
100030                   I                                   [swi5: fast taskq]
100028                   I                                   [swi6: Giant taskq]
100027                   I                                   [swi6: task queue]
100020                   I                                   [swi3: vm]
100019                   I                                   [swi1: netisr 0]
100018                   I                                   [swi4: clock]
100017                   I                                   [swi4: clock]
100016                   I                                   [swi4: clock]
100015                   I                                   [swi4: clock]
100014                   I                                   [swi4: clock]
100013                   I                                   [swi4: clock]
100012                   I                                   [swi4: clock]
100011                   I                                   [swi4: clock]
   11     0     0     0  RL      (threaded)                  [idle]
100010                   Run     CPU 7                       [idle: cpu7]
100009                   Run     CPU 6                       [idle: cpu6]
100008                   Run     CPU 5                       [idle: cpu5]
100007                   Run     CPU 4                       [idle: cpu4]
100006                   Run     CPU 3                       [idle: cpu3]
100005                   Run     CPU 2                       [idle: cpu2]
100004                   CanRun                              [idle: cpu1]
100003                   Run     CPU 0                       [idle: cpu0]
    1     0     1     0  SLs     wait     0xfffffe000bc86970 [init]
   10     0     0     0  DL      audit_wo 0xffffffff8159b860 [audit]
    0     0     0     0  DLs     (threaded)                  [kernel]
100046                   D       -        0xffffffff81343440 [deadlkres]
100045                   D       -        0xfffffe000bcbd600 [mca taskq]
100035                   D       -        0xfffffe000bcbd700 [acpi_task_2]
100034                   D       -        0xfffffe000bcbd700 [acpi_task_1]
100033                   D       -        0xfffffe000bcbd700 [acpi_task_0]
100031                   D       -        0xfffffe000bcbd900 [kqueue taskq]
100029                   D       -        0xfffffe000bcbdb00 [thread taskq]
100026                   D       -        0xfffffe000bcbde00 [ffs_trim taskq]
100024                   D       -        0xfffffe000bcbe000 [firmware taskq]
100000                   D       sched    0xffffffff8158cb48 [swapper]
db> tr 17
Tracing pid 17 tid 100055 td 0xfffffe000d14d000
cpustop_handler() at cpustop_handler+0x28/frame 0xffffff8000232cf0
ipi_nmi_handler() at ipi_nmi_handler+0x3f/frame 0xffffff8000232d10
trap() at trap+0x42/frame 0xffffff8000232f20
nmi_calltrap() at nmi_calltrap+0x8/frame 0xffffff8000232f20
--- trap 0x13, rip = 0xffffffff808b4bd6, rsp = 0xffffff8000232fe0, rbp = 0xffffff8120c569a0 ---
_mtx_trylock_flags_() at _mtx_trylock_flags_+0x106/frame 0xffffff8120c569a0
mnt_vnode_next_active() at mnt_vnode_next_active+0x141/frame 0xffffff8120c569e0
ffs_sync() at ffs_sync+0xb5/frame 0xffffff8120c56ab0
sync_fsync() at sync_fsync+0x12d/frame 0xffffff8120c56ae0
VOP_FSYNC_APV() at VOP_FSYNC_APV+0xf0/frame 0xffffff8120c56b10
sched_sync() at sched_sync+0x34c/frame 0xffffff8120c56bb0
fork_exit() at fork_exit+0x84/frame 0xffffff8120c56bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xffffff8120c56bf0
--- trap 0, rip = 0, rsp = 0xffffff8120c56cb0, rbp = 0 ---

The VM can be stuck in this state for quite some time, it generally 
varies between a couple of minutes (5-10min) to an hour or two, after 
this the VM recovers itself and resumes normal operation. I still have 
to test this on a bare metal FreeBSD install, but I would like to ask 
if someone has seen a similar behaviour, or if someone is suspicious of 
a change that could cause this.

Thanks, Roger.
Received on Mon May 13 2013 - 09:18:13 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:37 UTC