Barney Cordoba wrote:
>
> --- On Thu, 3/12/09, Scott Long <scottl_at_samsco.org> wrote:
>
>> From: Scott Long <scottl_at_samsco.org>
>> Subject: Re: Interrupt routine usage not shown by top in 8.0
>> To: barney_cordoba_at_yahoo.com
>> Cc: current_at_freebsd.org
>> Date: Thursday, March 12, 2009, 7:42 PM
>> Barney Cordoba wrote:
>>> I'm firing 400Kpps at a udp blackhole port. I'm getting 6000
>>> interrupts per second on em3:
>>>
>>> testbox# vmstat -i; sleep 1; vmstat -i
>>> interrupt                          total       rate
>>> irq1: atkbd0                           1          0
>>> irq6: fdc0                             1          0
>>> irq17: uhci1+                       2226          9
>>> irq18: uhci2 ehci+                     9          0
>>> cpu0: timer                       470507       1993
>>> irq256: em0                          665          2
>>> irq259: em3                      1027684       4354
>>> cpu1: timer                       470272       1992
>>> cpu3: timer                       470273       1992
>>> cpu2: timer                       470273       1992
>>> Total                            2911911      12338
>>>
>>> interrupt                          total       rate
>>> irq1: atkbd0                           1          0
>>> irq6: fdc0                             1          0
>>> irq17: uhci1+                       2226          9
>>> irq18: uhci2 ehci+                     9          0
>>> cpu0: timer                       472513       1993
>>> irq256: em0                          668          2
>>> irq259: em3                      1033703       4361
>>> cpu1: timer                       472278       1992
>>> cpu3: timer                       472279       1992
>>> cpu2: timer                       472279       1992
>>> Total                            2925957      12345
>>>
>>>
>>> top -SH shows:
>>>
>>>  PID STATE  C   TIME     CPU COMMAND
>>>   10 CPU3   3   7:32 100.00% idle
>>>   10 CPU2   2   7:32 100.00% idle
>>>   10 RUN    0   7:31 100.00% idle
>>>   10 CPU1   1   7:31 100.00% idle
>>>
>>> This implies that CPU usage is substantially under-reported in
>>> general by the system. Note that I've modified em_irq_fast() to
>>> call em_handle_rxtx() directly rather than scheduling a task to
>>> illustrate the problem.
>>>
>> With unmodified code, what do you see? Are you sending valid UDP
>> frames with valid checksums and a valid port, or is everything that
>> you're blasting at the interface getting dropped right away? Calling
>> em_handle_rxtx() directly will cause a very quick panic once you
>> start handling real traffic and you encounter a lock.
>>
>> Scott
>
> I think you're mistaken. I'm also accessing the system via an em port
> (and running top), and em_handle_rxtx() is self-contained lock-wise.
> The taskqueue doesn't obtain a lock before calling the routine.
> I understand perfectly how the code works, as I wrote it.

While there are no locks in the RX path of the driver, there are
certainly locks higher up in the network stack RX path. You're not
going to hit them in your test, but in the real world you will.

> As I mentioned, they're being dumped into a udp blackhole, which
> implies that I have udp.blackhole set and the port is unused. I can
> see the packets hit the udp socket, so it's working as expected:
>
> 853967872 dropped due to no socket
>
> With unmodified code, the taskq shows 25% usage or so.
>
> I'm not sure what the point of your criticism is for what is clearly
> a test. Are you implying that the system can receive 400K pps with
> 6000 ints/sec and record 0% usage because of a coding imperfection?
> Or are you implying that the 25% usage is all due to launching tasks
> unnecessarily and process switching?

Prior to FreeBSD 5, interrupt processing time was counted in the %intr
stat. With FreeBSD 5 and beyond, most interrupts moved to full
processing contexts called ithreads, and the processing time spent in
the ithread was counted in the %intr stat. The time spent in low-level
interrupts was merely counted against the process that got interrupted.
This wasn't a big deal, because low-level interrupts were only used to
launch ithreads and to process low-latency interrupts for a few
drivers. Moving to the taskq model breaks this accounting model.
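For readers without the driver source handy, the deferral pattern
described above looks roughly like the sketch below. This is not the
actual if_em code; the my_*() names, the softc layout, and the hardware
access are placeholders, so only the shape of the hand-off matters: the
fast filter runs in low-level interrupt context and does almost
nothing, while the packet processing runs in a taskqueue thread that
the scheduler (and top) can see.

/*
 * Sketch only, not the if_em driver: placeholder names and no real
 * hardware access.  Shows the fast-filter + taskqueue deferral shape.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/bus.h>
#include <sys/malloc.h>
#include <sys/priority.h>
#include <sys/taskqueue.h>

struct my_softc {
        device_t          dev;
        struct resource  *irq_res;
        void             *intr_cookie;
        struct task       rxtx_task;
        struct taskqueue *tq;
        /* rings, register mappings, locks, ... */
};

/*
 * Low-level filter: runs in primary interrupt context, borrowing the
 * context of whichever thread was interrupted.  It only masks the NIC
 * and hands the real work to the taskqueue.
 */
static int
my_intr_filter(void *arg)
{
        struct my_softc *sc = arg;

        /* ... mask further interrupts from the NIC ... */
        taskqueue_enqueue(sc->tq, &sc->rxtx_task);
        return (FILTER_HANDLED);
}

/*
 * Taskqueue handler: runs in its own kernel thread, so the time spent
 * draining the RX ring and pushing packets up the stack is charged to
 * that thread and is visible in top -SH.
 */
static void
my_handle_rxtx(void *arg, int pending)
{
        struct my_softc *sc = arg;

        (void)pending;
        /*
         * ... drain the RX ring, pass packets up the stack, clean the
         * TX ring, then re-enable the NIC interrupt ...
         */
        (void)sc;
}

static int
my_setup_intr(struct my_softc *sc)
{
        TASK_INIT(&sc->rxtx_task, 0, my_handle_rxtx, sc);
        sc->tq = taskqueue_create_fast("my_taskq", M_NOWAIT,
            taskqueue_thread_enqueue, &sc->tq);
        taskqueue_start_threads(&sc->tq, 1, PI_NET, "%s taskq",
            device_get_nameunit(sc->dev));

        /* Filter-only registration: the ithread handler argument is NULL. */
        return (bus_setup_intr(sc->dev, sc->irq_res,
            INTR_TYPE_NET | INTR_MPSAFE, my_intr_filter, NULL, sc,
            &sc->intr_cookie));
}

With this split, the filter's cost is tiny and hard to account for, but
the bulk of the work lands in the taskq thread, which is why the
unmodified driver shows up as roughly 25% against the taskq in top -SH.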
What's happening in your test is that the system is almost completely
idle, so the only thing that is being interrupted by the low-level
if_em handler is the cpu idle thread. Since you're also bogusly
bypassing the deferral to the taskq, all stack processing is also
happening in this low-level context, and it's being counted against the
CPU idle thread. However, the process accounting code knows not to
charge idle thread time against the normal stats, because doing so
would result in the system always showing 100% busy. So your test is
exploiting this; you're stealing all of your cycles from the idle
threads, and they aren't being accounted for because it's hard to know
when the idle thread is having its cycles stolen.

So no, 25% of a CPU isn't going to "launching tasks unnecessarily and
process switching." It's going to processing 400k packets/sec off of
the RX ring and up the stack to the UDP layer.

I think that if you studied how the code worked, and devised more
useful benchmarks, you'd see that the taskq deferral method is usually
a significant gain in performance over polling or simple ithreads.
There is certainly room for more improvement, and my taskq scheme isn't
the only way to get good performance, but it does work fairly well.

Scott
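For contrast, the modified path used in the test effectively collapses
everything into the filter. A rough sketch, reusing the placeholder
names from the block above:

static int
my_intr_filter_inline(void *arg)
{
        struct my_softc *sc = arg;

        /*
         * No deferral: the RX drain and the walk up the stack all run
         * here, in the context of whichever thread was interrupted.
         * None of that time is charged to a schedulable thread, and
         * locks higher up the RX path can't safely be taken from this
         * context under real traffic, as noted in the reply above.
         */
        my_handle_rxtx(sc, 0);
        return (FILTER_HANDLED);
}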