Apple’s Xcode comes with a neat profiling tool called CPU Sampler. It helps you identify time-consuming code blocks in your software and is really handy when you need to squeeze the best possible performance out of your code.
Just recently, as part of a lecture at the university, I tried rewriting the OpenGL routines that handle matrix calculations. I soon came to the conclusion that I had to optimize my code if I wanted to compete with the implementation provided by OpenGL.
This post presents a quick look at CPU Sampler, which helped me make well-founded decisions that resulted in a faster implementation.
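For context on what such a matrix routine involves, here is a minimal sketch of a naive 4x4 multiply. It is not my lecture code. It assumes OpenGL's column-major storage, where element (row r, column c) lives at index `c*4 + r`, and computes `out = a * b`, the same order in which `glMultMatrixf` multiplies the current matrix by its argument. The name `mat4_mul` is hypothetical.

```c
/* Hypothetical helper, not OpenGL API: naive 4x4 matrix product.
   Matrices are column-major as in OpenGL: element (r, c) is m[c*4 + r].
   Computes out = a * b. out must not alias a or b. */
static void mat4_mul(float out[16], const float a[16], const float b[16])
{
    for (int c = 0; c < 4; ++c) {          /* column of the result */
        for (int r = 0; r < 4; ++r) {      /* row of the result */
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)    /* dot product of row r of a and column c of b */
                sum += a[k * 4 + r] * b[c * 4 + k];
            out[c * 4 + r] = sum;
        }
    }
}
```

This triple loop is the kind of straightforward baseline one would hand to a profiler before trying optimizations such as loop unrolling or SIMD.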