Ian, big thanks for your testing and feedback. When I made this
option, I tested it with a small number of VLANs and expected it to
be a no-op; however, it turned out to be a regression. That's why I
decided that with a moderate number of vlans it would be a
regression. Probably on modern hardware with bigger CPU caches it
isn't. I do most of my performance testing on a quad PIII.

On Fri, Aug 25, 2006 at 09:37:46AM +0200, Ian FREISLICH wrote:
I> While doing some experimentation and work on ipfw to see where I
I> could improve performance for our virtualised firewall I came across
I> the following comment in sys/net/if_vlan.c:
I>
I>  * The VLAN_ARRAY substitutes the dynamic hash with a static array
I>  * with 4096 entries. In theory this can give a boots(sic) in processing,
I>  * however on practice it does not. Probably this is because array
I>  * is too big to fit into CPU cache.
I>
I> Being curious and having determined the main throughput bottleneck
I> to be the vlan driver, I thought that I'd test the assertion. I
I> have 506 vlans on this machine.
I>
I> With VLAN_ARRAY unset, ipfw disabled, fastforwarding enabled, and
I> vlanhwtag enabled on the interface, the fastest forwarding rate I
I> could get was 278kpps (this was a steady decrease from 440kpps with
I> 24 vlans, linearly proportional to the number of vlans).
I>
I> With exactly the same configuration, but the vlan driver compiled
I> with VLAN_ARRAY defined, the forwarding rate of the system is back
I> at 440kpps.
I>
I> The testbed looks like this:
I>
I> | pkt gen |                 | router |                     | pkt rec   |
I> | host    |vlan2       vlan2|        |vlan1002     vlan1002| host      |
I> | netperf |---------------->|        |-------------------->| netserver |
I> |         |em0           em0|        |em1               em0|           |
I>
I> The router has vlan2 to vlan264 and vlan1002 through vlan1264 in
I> 22 blocks of 23 vlan groups (a consequence of 24-port switches to
I> tag/untag for customers). The pkt gen and receive host both have
I> 253 vlans.
I>
I> Can anyone suggest a good reason not to turn this option on by
I> default? It looks to me like it dramatically improves performance.

As Andrew said before, it consumes memory, and it looks like a
regression on a system with a small number of vlans. However, after
your email I see that we need to document this option in vlan(4) and
encourage people to try it when they are building a system with a
huge number of vlans.

And here are some more performance thoughts on the vlan(4) driver.
When we are processing an incoming VLAN-tagged frame, we need either
the hash or the array to determine which VLAN the frame belongs to.
When we are sending a VLAN frame outwards, we don't need this lookup.
I've run some tests, and it looks like the performance decrease
observed between a bare Ethernet interface and a vlan(4) interface is
mostly caused by the transmit path: the packet is put on interface
queues twice. I hope this will be optimized after Robert Watson
finishes his if_start_mbuf work.

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
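
To make the difference between the two receive-side lookups concrete,
here is a simplified sketch of the idea, not the actual if_vlan.c
code; the structure and function names are made up for illustration.
With VLAN_ARRAY the 12-bit tag indexes a per-tag pointer array
directly, while the default path hashes the tag into a small bucket
table and walks a chain:

    /*
     * Simplified sketch of the two lookup strategies; names and
     * structures are illustrative, not the kernel's.
     */
    #include <stddef.h>
    #include <stdint.h>

    #define EVL_VLID_MASK   0x0FFF          /* 12-bit VLAN ID */
    #define VLAN_HASH_SIZE  32              /* example bucket count */

    struct ifvlan_sketch {
            uint16_t                vid;
            struct ifvlan_sketch    *next;  /* hash chain link */
    };

    /* VLAN_ARRAY style: one pointer per possible tag, indexed lookup. */
    static struct ifvlan_sketch *vlan_array[EVL_VLID_MASK + 1];

    static struct ifvlan_sketch *
    lookup_array(uint16_t tag)
    {
            return (vlan_array[tag & EVL_VLID_MASK]);
    }

    /* Hash style: small bucket table, walk the chain to match the tag. */
    static struct ifvlan_sketch *vlan_hash[VLAN_HASH_SIZE];

    static struct ifvlan_sketch *
    lookup_hash(uint16_t tag)
    {
            struct ifvlan_sketch *ifv;

            tag &= EVL_VLID_MASK;
            for (ifv = vlan_hash[tag % VLAN_HASH_SIZE]; ifv != NULL;
                ifv = ifv->next)
                    if (ifv->vid == tag)
                            return (ifv);
            return (NULL);
    }

The trade-off is the one discussed above: lookup_array() is a single
indexed access, but a 4096-entry pointer table is 16 KB on a 32-bit
machine (32 KB on 64-bit) and competes for cache with everything else,
while lookup_hash() touches a much smaller table but may have to walk
a chain when many vlans collide in one bucket.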