Re: CFT: if_bridge performance improvements

From: Kristof Provost <kp_at_FreeBSD.org>
Date: Fri, 24 Apr 2020 15:42:08 +0200
On 22 Apr 2020, at 18:15, Xin Li wrote:
> On 4/22/20 01:45, Kristof Provost wrote:
>> On 22 Apr 2020, at 10:20, Xin Li wrote:
>>> Hi,
>>>
>>> On 4/14/20 02:51, Kristof Provost wrote:
>>>> Hi,
>>>>
>>>> Thanks to support from The FreeBSD Foundation I’ve been able to work on
>>>> improving the throughput of if_bridge.
>>>> It changes the data path locking to use the NET_EPOCH infrastructure.
>>>> Benchmarking shows substantial improvements (5x in test setups).
>>>>
>>>> This work is ready for wider testing now.
>>>>
>>>> It’s under review here: https://reviews.freebsd.org/D24250
>>>>
>>>> Patch for CURRENT: https://reviews.freebsd.org/D24250?download=true
>>>> Patches for stable/12:
>>>> https://people.freebsd.org/~kp/if_bridge/stable_12/
>>>>
>>>> I’m not currently aware of any panics or issues resulting from these
>>>> patches.
>>>
>>> I have observed the following panic with the latest stable/12 after
>>> applying the stable_12 patchset. It appears to be a race-related NULL
>>> pointer dereference, but I haven't taken a deeper look yet.
>>>
>>> The box has 7 igb(4) NICs, with several bridges and VLANs configured,
>>> and acts as a router. Please let me know if you need additional
>>> information. I can try -CURRENT as well, but it would take some time
>>> as the box is relatively slow (it's a ZFS-based system, so I can
>>> create a separate boot environment for -CURRENT if needed, but I might
>>> have to upgrade the packages, should there be any ABI breakages).
>>>
>> Thanks for the report. I don’t immediately see how this could happen.
>>
>> Are you running an L2 firewall on that bridge by any chance? An earlier
>> version of the patch had issues with a stray unlock in that code path.
>
> I don't think I have an L2 firewall (I assume that means filtering based
> on MAC address, as can be done with e.g. ipfw? The bridges were created
> on VLAN interfaces, though; does that count as an L2 firewall?). The
> system is using pf with a few NAT rules:
>

That backtrace looks identical to the one Peter reported, up to and 
including the offset in the bridge_input() function.
Given that there’s also no plausible way to end up with a NULL mutex, I
have to assume that it’s a case of trying to unlock a mutex that isn't
locked, and the most likely reason is that you ran into the same problem
Peter ran into.

The current version of the patch should resolve it.

Best regards,
Kristof
Received on Fri Apr 24 2020 - 11:42:11 UTC