Re: CFT: if_bridge performance improvements

From: Xin Li <delphij_at_delphij.net>
Date: Fri, 24 Apr 2020 14:12:02 -0700

On 4/24/20 06:42, Kristof Provost wrote:
> On 22 Apr 2020, at 18:15, Xin Li wrote:
>> On 4/22/20 01:45, Kristof Provost wrote:
>>> On 22 Apr 2020, at 10:20, Xin Li wrote:
>>>> Hi,
>>>>
>>>> On 4/14/20 02:51, Kristof Provost wrote:
>>>>> Hi,
>>>>>
>>>>> Thanks to support from The FreeBSD Foundation I’ve been able to work
>>>>> on improving the throughput of if_bridge.
>>>>> It changes the (data path) locking to use the NET_EPOCH infrastructure.
>>>>> Benchmarking shows substantial improvements (5x in test setups).
>>>>>
>>>>> This work is ready for wider testing now.
>>>>>
>>>>> It’s under review here: https://reviews.freebsd.org/D24250
>>>>>
>>>>> Patch for CURRENT: https://reviews.freebsd.org/D24250?download=true
>>>>> Patches for stable/12:
>>>>> https://people.freebsd.org/~kp/if_bridge/stable_12/
>>>>>
>>>>> I’m not currently aware of any panics or issues resulting from these
>>>>> patches.
>>>>
>>>> I have observed the following panic with the latest stable/12 after
>>>> applying the stable_12 patchset. It looks like a race-condition-related
>>>> NULL pointer dereference, but I haven't taken a deeper look yet.
>>>>
>>>> The box has 7 igb(4) NICs, with several bridges and VLANs configured,
>>>> and acts as a router.  Please let me know if you need additional
>>>> information; I can try -CURRENT as well, but it would take some time as
>>>> the box is relatively slow (it's a ZFS based system so I can create a
>>>> separate boot environment for -CURRENT if needed, but that would take
>>>> some time as I might have to upgrade the packages, should there be any
>>>> ABI breakages).
>>>>
>>> Thanks for the report. I don’t immediately see how this could happen.
>>>
>>> Are you running an L2 firewall on that bridge by any chance? An earlier
>>> version of the patch had issues with a stray unlock in that code path.
>>
>> I don't think I have an L2 firewall (I assume that means filtering based
>> on MAC address, like what can be done with e.g. ipfw?  The bridges were
>> created on vlan interfaces though; do they count as an L2 firewall?).
>> The system is using pf with a few NAT rules:
>>
> 
> That backtrace looks identical to the one Peter reported, up to and
> including the offset in the bridge_input() function.
> Given that there’s no likely way to end up with a NULL mutex either, I
> have to assume that it’s a case of trying to unlock an unlocked mutex, and
> the most likely reason is that you ran into the same problem Peter ran
> into.
> 
> The current version of the patch should resolve it.

Thanks. I'd like to report that after applying the patch from Peter, the
system seems to survive without problems.

Cheers,



Received on Fri Apr 24 2020 - 19:12:09 UTC
