Re: LORs with ipfw

From: Robert Watson <rwatson_at_freebsd.org> Date: Wed, 7 Jul 2004 23:47:30 -0400 (EDT)

On Wed, 7 Jul 2004, Wiktor Niesiobedzki wrote:

> lock order reversal
>  1st 0xc07287c8 IPFW static rules (IPFW static rules) _at_ /usr/src/sys/netinet/ip_fw2.c:1828
>  2nd 0xc065cfcc tcp (tcp) _at_ /usr/src/sys/netinet/ip_fw2.c:1574
> Stack backtrace:
> backtrace(c05ec5a7,c065cfcc,c05ec12e,c05ec12e,c0726a3c) at backtrace+0x17
> witness_checkorder(c065cfcc,9,c0726a3c,626,806) at witness_checkorder+0x678
> _mtx_lock_flags(c065cfcc,0,c0726a3c,626,0) at _mtx_lock_flags+0x80
> check_uidgid(c15610a4,6,0,e08d1f53,1bd) at check_uidgid+0xd3
> ipfw_chk(cb9b6bf4,cb9b6c48,c1189014,1,0) at ipfw_chk+0x9e2
> ip_input(c1395c00,0,c071c576,1d0,0) at ip_input+0x375
> transmit_event(c1510c00,0,c071c576,300,2) at transmit_event+0x14b
> dummynet(0,0,c05ea27a,f6,1) at dummynet+0x1a9
> softclock(0,0,c05e6b67,263,c0631d40) at softclock+0x1aa
> ithread_loop(c10dd500,cb9b6d48,c05e695e,327,c10dd500) at ithread_loop+0x172
> fork_exit(c04a5b80,c10dd500,cb9b6d48) at fork_exit+0xbc
> fork_trampoline() at fork_trampoline+0x8
> 
> This is from yesterday's CURRENT. I have compiled the kernel with
> CPUTYPE=athlon-xp and CFLAGS=-O2. Currently I'm not able to reproduce
> these messages with CPUTYPE=i686 and empty CFLAGS.
> 
> Does anyone have a clue where the problem may lie here (or is it just
> harmless?)

This is a warning about a potentially harmful, but somewhat harder to fix
issue.  Basically, we currently have what amounts to a subsystem or giant
lock over the ipfw rule set and its evaluation.  Normally, the ipfw lock
will fall "after" most other locks, including protocol control block (pcb)
locks, as ipfw is called from other protocol code during processing.
However, when using a uid/gid rule, the protocol control block for the
packet is looked up by the ipfw code, which acquires pcb locks after the
ipfw lock.  There are a few things to think about here:

(1) This lock order reversal is really a result of a layering violation --
    the ipfw code is acting on packets at the IP layer, and looking up the
    connection from the IP layer results in cross-layer transitions that
    don't fit the general model.

(2) The lock order reversal occurs in a situation where a race condition
    also occurs -- the pcb may actually be looked up twice for inbound
    packets, once in ipfw, and then again for delivery.  While it's
    somewhat unlikely, the pcb could change in that window.  The window is
    stretched out through the use of functionality like dummynet.

(3) One way to think about fixing this is to avoid the need to hold the
    ipfw lock across the entire execution of ipfw.  I've been thinking
    about reference-counting the rule set, such that each instance of a
    thread entering the ipfw code sees the rule set as read-only and can
    access it lock-free once it has acquired a reference, releasing the
    reference on exit.  For long rule sets, this would help reduce
    contention.  You can imagine several variations on the model, such as
    per-cpu rule set instances, etc.  There are some interesting challenges
    in dynamic state management, however.
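
To make the idea in (3) a bit more concrete, here is a rough userspace
sketch, not the ipfw code and not a proposed patch.  It assumes pthreads
and C11 atomics in place of kernel mutexes and refcounts, and every name in
it (struct ruleset, rs_acquire, rs_release, rs_install, rs_match, rs_freed)
is made up for illustration.  A short lock protects only the "current rule
set" pointer; a thread takes a reference under that lock, drops the lock,
and then walks the rules with no rule set lock held, so a pcb lock taken
during evaluation would no longer nest inside it.

```c
/* Userspace sketch only: pthreads and C11 atomics stand in for kernel
 * primitives, and all names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RULES 8

struct ruleset {
    atomic_int refs;        /* active readers, plus 1 held by 'current' */
    int nrules;
    int ports[MAX_RULES];   /* stand-in for real rule structures */
};

static pthread_mutex_t cur_lock = PTHREAD_MUTEX_INITIALIZER;
static struct ruleset *current;   /* protected by cur_lock */
atomic_int rs_freed;              /* counts freed rule sets, for illustration */

/* Build an immutable rule set holding one reference (for 'current'). */
struct ruleset *rs_new(const int *ports, int n)
{
    struct ruleset *rs = malloc(sizeof(*rs));
    assert(rs != NULL && n >= 0 && n <= MAX_RULES);
    atomic_init(&rs->refs, 1);
    rs->nrules = n;
    memcpy(rs->ports, ports, (size_t)n * sizeof(int));
    return rs;
}

/* Take a reference; the lock is held only long enough to bump the count. */
struct ruleset *rs_acquire(void)
{
    pthread_mutex_lock(&cur_lock);
    struct ruleset *rs = current;
    if (rs != NULL)
        atomic_fetch_add(&rs->refs, 1);
    pthread_mutex_unlock(&cur_lock);
    return rs;
}

/* Drop a reference; the last one out frees the rule set. */
void rs_release(struct ruleset *rs)
{
    if (atomic_fetch_sub(&rs->refs, 1) == 1) {
        free(rs);
        atomic_fetch_add(&rs_freed, 1);
    }
}

/* Publish a replacement; the old set lives on until its readers finish. */
void rs_install(struct ruleset *new_rs)
{
    pthread_mutex_lock(&cur_lock);
    struct ruleset *old = current;
    current = new_rs;
    pthread_mutex_unlock(&cur_lock);
    if (old != NULL)
        rs_release(old);
}

/* Evaluate a packet against a read-only snapshot with no rule set lock
 * held, so any pcb lock taken in here would not nest inside it. */
int rs_match(int port)
{
    struct ruleset *rs = rs_acquire();
    int hit = 0;
    if (rs == NULL)
        return 0;
    for (int i = 0; i < rs->nrules; i++) {
        if (rs->ports[i] == port) {
            hit = 1;
            break;
        }
    }
    rs_release(rs);
    return hit;
}
```

Changing the rules in this model means building a new rule set and calling
rs_install(); the rule sets themselves are never modified in place.  The
sketch deliberately ignores dynamic state, which is where the hard part
mentioned above lives.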

Robert N M Watson             FreeBSD Core Team, TrustedBSD Projects
robert_at_fledge.watson.org      Principal Research Scientist, McAfee Research