Jack Vogel wrote:
> On Nov 15, 2007 4:22 PM, Andre Oppermann <andre_at_freebsd.org> wrote:
>> Scott Ullrich wrote:
>>> On 11/15/07, Doug Ambrisko <ambrisko_at_ambrisko.com> wrote:
>>>> Hmm, I forgot about the 2970 which are AMD based. Can you check the
>>>> BIOS to see if there is an option to turn it on? I think this is an
>>>> Intel feature. AMD might have something close? We have one 2970
>>>> that we've played with a little but not much. I can't say for sure
>>>> if it has it.
>>> Right you are. As of BIOS 1.2.2 I do not see a I/OAT option. Guess
>>> I will need to pick up a different server as we are interested in what
>>> kind of packet forwarding rate increase that this feature might bring
>>> on a heavily loaded firewall.
>> Not much. Unless your firewall is in usermode. Otherwise the data
>> stays in the kernel and I/OAT is of not help as no copying happens.
>> Your CPU is probably spending half of its clock cycles waiting on
>> cache misses from newly arrived packets. Some Intel chipset integrated
>> gige ports have a cache prefetch feature (duno whether our driver
>> supports it) that would help quite a bit for your case.
>
> What might help this is multiqueue support on the receive AND send,
> and stack support for the same. Not sure what the stack changes
> would look like, but I know there's interest in this sort of thing, so
> naturally I'd be into it :)

Dunno if multiqueue is a big win here. You have to make sure that
packet order is maintained which kinda implies a single queue. Of
course one could spread some load with fixed hashes to keep flows
together.

The reason a small 1GHz embedded MIPS CPU with integrated GigE ports
can do more than 1Mpps is the cache prefetching feature. The thingies
generally move the first 128 bytes of every packet received into the
L2 cache. This is enough for the headers and to perform a lookup on
the routing table or the TCP/UDP control block table without much
delay.
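The fixed-hash idea mentioned above (spread load over several queues, but keep every packet of one flow on the same queue so per-flow order is preserved) can be sketched roughly as follows. This is only an illustration and not what any FreeBSD driver or NIC actually does: the `flow_key` struct, the `flow_to_queue()` function and the choice of FNV-1a as the hash are all assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical flow key: the usual 5-tuple. Any consistent byte order works. */
struct flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  proto;
};

/* Mix one 32-bit word into an FNV-1a hash, one byte at a time. */
static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;               /* FNV prime */
    }
    return h;
}

/*
 * Map a flow to a queue index in [0, nqueues). The result depends only
 * on the 5-tuple, so all packets of one flow land on the same queue.
 * Fields are hashed one by one so struct padding never enters the hash.
 * nqueues must be greater than zero.
 */
static unsigned flow_to_queue(const struct flow_key *k, unsigned nqueues)
{
    uint32_t h = 2166136261u;         /* FNV offset basis */

    h = fnv1a_u32(h, k->src_ip);
    h = fnv1a_u32(h, k->dst_ip);
    h = fnv1a_u32(h, ((uint32_t)k->src_port << 16) | k->dst_port);
    h = fnv1a_u32(h, k->proto);
    return h % nqueues;
}
```

Note that this hash is not symmetric: the two directions of a connection can end up on different queues. Ordering within each direction is still kept, which is what matters for forwarding. Ordering between different flows is not guaranteed, and a single heavy flow still ends up on one queue.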

The normal PC architecture is quite broken in that regard as
everything that comes in through DMA is in cold main memory. Once the
CPU wants to look at it, it has to wait an insane amount of time. That
times the number of packets. In pure forwarding applications (routing)
it wastes half of all CPU cycles with waiting on main memory.

--
Andre

Received on Thu Nov 15 2007 - 23:43:34 UTC