I find that the best way to profile the kernel is with pmc. You don't need to compile anything with a special option (other than including the hwpmc hooks in the kernel with the HWPMC_HOOKS option), so you can use it at any time on the same code you'll be shipping. pmc does statistical profiling; it uses whatever performance monitoring counters are provided by the hardware. It has a pretty low overhead, especially compared with other profiling techniques. It's really easy to use, too:

1) If hwpmc is not compiled into your kernel, kldload hwpmc

2) Run pmcstat to begin taking samples (make sure that whatever you are profiling is busy doing work first!):

    pmcstat -S unhalted-cycles -O /tmp/samples.out

The -S option specifies what event you want to use to trigger sampling. The unhalted-cycles event is the best one to use if your hardware supports it; pmc will take a sample every 64K non-idle CPU cycles, which is basically equivalent to sampling based on time. If the unhalted-cycles event is not supported by your hardware then the instructions event will probably be the next best choice (although it's nowhere near as good, as it will not be able to tell you, for example, if a particular function is very expensive because it takes a lot of cache misses compared to the rest of your program).

One caveat with the unhalted-cycles event is that time spent spinning on a spinlock or adaptively spinning on an MTX_DEF mutex will not be counted by this event, because most of the spinning time is spent executing an hlt instruction that idles the CPU for a short period of time.

Modern Intel and AMD CPUs offer a dizzying array of events. They're mostly only useful if you suspect that a particular kind of event is hurting your performance and you would like to know what is causing those events. For example, if you suspect that data cache misses are causing you problems you can take samples on cache misses.
Unfortunately, on some of the newer CPUs (namely the Core2 family, because that's what I'm doing most of my profiling on nowadays) I find it difficult to figure out just what event to use to profile based on cache misses. man pmc will give you an overview of pmc, and there are manpages for every CPU family supported (e.g. man pmc.core2).

3) After you've run pmcstat for "long enough" (a proper definition of long enough requires a statistician, which I most certainly am not, but I find that for a busy system 10 seconds is enough), press Control-C to stop it*. You can then use pmcstat to post-process the samples into human-readable text:

    pmcstat -R /tmp/samples.out -G /tmp/graph.txt

The graph.txt file will show leaf functions on the left and their callers beneath them, indented to reflect the callchain. It's not too easy to describe and I don't have sample output available right now.

Another interesting tool for post-processing the samples is pmcannotate. I've never actually used the tool, but it will annotate the program's source to show which lines are the most expensive. This of course needs unstripped modules to work. I think that it will also work if the GNU "debug link" is in the stripped module pointing to the location of the file with symbols.

* Here's a tip I picked up from Joseph Koshy's blog: to collect samples for a fixed period of time (say 1 minute), have pmcstat run the sleep command:

    pmcstat -S unhalted-cycles -O /tmp/samples.out sleep 60

Received on Mon Dec 14 2009 - 18:26:35 UTC
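To put the 64K-cycle sampling interval from step 2 and the 10-second run from step 3 into concrete terms, here is a rough back-of-the-envelope sketch. The 2.4 GHz clock is a made-up figure for illustration (it is not from the post above), and the result is only an upper bound: it assumes the CPU is never idle and that the clock frequency stays constant.

```shell
#!/bin/sh
# Rough upper bound on samples collected per CPU, assuming the CPU is
# never idle and the clock frequency is constant (both are assumptions).

samples_per_second() {
    # $1 = clock frequency in Hz, $2 = cycles between samples
    echo $(( $1 / $2 ))
}

clock_hz=2400000000     # hypothetical 2.4 GHz CPU, for illustration only
interval=65536          # 64K cycles, the sampling interval described above
run_seconds=10          # the "long enough" run length suggested above

per_second=$(samples_per_second "$clock_hz" "$interval")
echo "samples per second per CPU: $per_second"
echo "samples in ${run_seconds}s per CPU: $(( per_second * run_seconds ))"
```

With these assumed numbers that works out to roughly 36,000 samples per second on each busy CPU, which gives some feel for why a short run on a busy system can already be enough.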