"panic: mutex pf task mtx owned at /usr/src/sys/contrib/pf/net/if_pfsync.c:3163"

From: Matthew Economou <mxeconomou_at_gmail.com>
Date: Fri, 26 Aug 2011 17:51:48 -0400

I recently upgraded a firewall I'm using for performance testing from
a March-ish 9-CURRENT to 9.0-BETA1 (csup run August 21 around 12:00 AM
EDT).  It's basically a GENERIC kernel with debugging disabled and
things like IPsec and ALTQ enabled.  Since the upgrade, the firewall
stops passing any traffic (IPv4 and IPv6) approximately an hour after
it boots.  OpenVPN, for example, logs the following
errors:

  write UDPv4: Operation not permitted (code=1)

Quagga, for another example, logs something similar:

  ripd[1696]: can't send packet : Operation not permitted0
  ospfd[1702]: *** sendmsg in ospf_write failed to 172.30.0.3, id 0, off 0, len 76, interface tap0 mtu 1500: Operation not permitted

If I try to ping something from the console, I get the same error message:

  # ping 4.2.2.2
  ping: sendto: Operation not permitted

It appears that PF isn't removing any entries from the state table.
Note that the state table size is at its default of 10000 (which
correlates to the amount of memory installed on the firewall - 256
MB).

State Table                          Total             Rate
  current entries                    10013
  searches                          554801           13.4/s
  inserts                            10013            0.2/s
  removals                               0            0.0/s
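The condition described above (entries at or above the limit, with no removals) can be checked mechanically.  The following is only a rough sketch: pf_state_check is a hypothetical helper, not part of PF, and it assumes input laid out like the counters shown above.  The limit of 10000 is the default mentioned earlier and is passed in by hand rather than read from the system.

```shell
#!/bin/sh
# pf_state_check: hypothetical helper, not part of pfctl or PF.
# Reads state-table counters (laid out as in the output above) on stdin
# and reports whether the table looks stuck: the entry count has reached
# the limit and nothing has ever been removed.
# $1 is the state limit; it defaults to 10000, the figure quoted above.
pf_state_check() {
    limit="${1:-10000}"
    awk -v limit="$limit" '
        /current entries/ { entries = $3 }    # "current entries  <count>"
        $1 == "removals"  { removals = $2 }   # "removals  <total>  <rate>"
        END {
            if (entries + 0 >= limit + 0 && removals + 0 == 0)
                status = "stuck"
            else
                status = "ok"
            print status ": " entries " entries, limit " limit ", removals " removals
        }'
}

# Sample input copied from the counters above.
pf_state_check 10000 <<'EOF'
State Table                          Total             Rate
  current entries                    10013
  searches                          554801           13.4/s
  inserts                            10013            0.2/s
  removals                               0            0.0/s
EOF
```

On a live system the same function could presumably be fed from "pfctl -si" instead of the here-document, though the exact output layout may differ between PF versions.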

I've tried both my current (unmodified and working prior to the
upgrade) and experimental PF configurations, neither of which has any
effect on the problem.  Reloading the PF configuration (/etc/rc.d/pf
reload) or restarting PF altogether (/etc/rc.d/pf restart) also has
no effect.  Only if I shut down PF completely (/etc/rc.d/pf stop) do I
regain network connectivity - I can do things like ping hosts (IPv4
and IPv6), browse the web, and pass traffic that's just routed through
the firewall (i.e., not requiring NAT).  Clearing the state table
(pfctl -F state) has no effect.

The kernel I was running had debugging disabled for performance
testing purposes, so I booted a proper debug kernel.  It panicked in
pfsync_send_plus as soon as init enabled PF (backtrace included
below).

Starting pflog.
pflog0: promiscuous mode enabled
Aug 25 20:54:21 pflogd[1611]: [priv]: msg PRIV_OPEN_LOG received
Enabling pfpanic: mutex pf task mtx owned at
/usr/src/sys/contrib/pf/net/if_pfsync.c:3163
cpuid = 0
KDB: enter: panic
[ thread pid 1619 tid 100053 ]
Stopped at      kdb_enter+0x3a: movl    $0,kdb_why
db> bt
Tracing pid 1619 tid 100053 td 0xc23da2e0
kdb_enter(c09777c9,c09777c9,c0975d7b,c6fd79e0,0,...) at kdb_enter+0x3a
panic(c0975d7b,c0946080,c0944e87,c5b,c6fd7a0c,...) at panic+0x134
_mtx_assert(c0a1b388,0,c0944e87,c5b,c6fd7a24,...) at _mtx_assert+0x127
pfsync_send_plus(c6fd7a24,18,10,ad6,1000000,...) at pfsync_send_plus+0xf2
pfsync_clear_states(a218d664,c236fb78,c0945f1c,635,c09ae167,...) at pfsync_clear_states+0x8d
pfioctl(c22a0800,c0cc4412,c236fb00,3,c23da2e0,...) at pfioctl+0x1b90
devfs_ioctl_f(c23ce578,c0cc4412,c236fb00,c216ce80,c23da2e0,...) at devfs_ioctl_f+0x10b
kern_ioctl(c23da2e0,3,c0cc4412,c236fb00,1fd7cec,...) at kern_ioctl+0x21d
ioctl(c23da2e0,c6fd7cec,c6fd7d28,c097d93a,0,...) at ioctl+0x134
syscallenter(c23da2e0,c6fd7ce4,c6fd7ce4,0,0,...) at syscallenter+0x263
syscall(c6fd7d28) at syscall+0x34
Xint0x80_syscall() at Xint0x80_syscall+0x21
--- syscall (54, FreeBSD ELF32, ioctl), eip = 0x281e6263, esp = 0xbfbfe8ac, ebp = 0xbfbfe998 ---
db>

I'm at a loss as to how to proceed.  Is this a known problem with PF?
Can anyone suggest a work-around?

Best wishes,
Matthew