Introduction
I'll start with a brief explanation of why it has been so long since my last entry. Partly this is because I've been busy with other things, and the recent holidays have also taken up some of my time, but the main reason is that this round of optimizations has been quite a bit more complex than I anticipated. Interestingly, it hasn't been the actual code changes that have been challenging; it has been my efforts to carefully analyze my code's slow areas and fully understand the optimizations I've made.
In some ways this is related to my previous post on optimizations. In that post I concluded that you need to be careful when optimizing, to make sure you understand whether and how the changes are really improving performance. To that end I tried to be more diligent in this round of optimizations.
My initial goal in this round was to optimize my game's calculations when advancing the game state frame by frame. I'm currently targeting 30 FPS, but would like that to be as efficient as possible in order to preserve battery life. Also, even though the iPhone can do graphics up to 60 FPS, I examined what my game does and found that there is very little perceivable visual improvement moving from 30 FPS to 60 FPS, which is why I decided to target the former.
To be able to properly analyze the performance of my code's calculations, as opposed to time taken by the actual graphics chip's rendering, I created a benchmark mode for my game that does all calculations as normal, but doesn't do any of the OpenGL API calls. I also do this outside of the top-level game loop, since the OpenTK framework imposes its own FPS limitations in this loop.
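
A benchmark mode of this kind can be sketched roughly as follows. This is only an illustration under stated assumptions: `GameState`, `Update`, and `Benchmark` are hypothetical names, not the game's real classes, and the update body is a placeholder for the real per-frame calculations.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for the game's simulation state (not the real class).
public sealed class GameState
{
    public int FramesAdvanced { get; private set; }

    // Advances the simulation by one frame. No OpenGL calls are made here;
    // the real per-frame calculations would go in this method.
    public void Update()
    {
        FramesAdvanced++;
    }
}

public static class Benchmark
{
    // Runs the given number of updates back to back, outside any game loop
    // or frame limiter, and returns the elapsed wall-clock time.
    public static TimeSpan Run(GameState state, int frames)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < frames; i++)
        {
            state.Update();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Average frames per second over a whole run.
    public static double FramesPerSecond(int frames, TimeSpan elapsed)
    {
        return frames / elapsed.TotalSeconds;
    }
}
```

Calling `Benchmark.Run` with a fixed frame count and passing the result to `Benchmark.FramesPerSecond` yields the kind of seconds and FPS figures quoted in this post. Because nothing is rendered, the numbers reflect only the calculation cost.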
Sample Size
I should mention that all of this optimization work was done on my PC in Visual Studio. Since my goal is to optimize my algorithms, I'm working on the assumption that the improvements will also be reflected in code when running on an actual iPhone.
When I started testing this benchmark mode I tried running for a few thousand frames and got results around 1200 FPS. I initially assumed this number of frames was enough to produce useful results, so I started running under various conditions, both to generate performance statistics for measuring improvements and to run within VS's performance analysis tools to see which areas of the code were slowest and the best targets for optimization.
The problem I found was that the performance differed wildly from run to run. I ended up wasting quite a lot of time doing analysis with these too-short runs that produced inconsistent results. I finally found that the only real way to get more consistent results was to simply increase the number of frames to render and therefore the test time. Eventually I settled on running for 100,000 frames, or about one minute. At this number of frames I found that I could get multiple runs to finish within a few milliseconds of one another, which, over 1 minute, is quite consistent.
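
One simple way to put a number on "consistent" is the relative spread between the fastest and slowest of several runs. The helper below is a hypothetical sketch (the name `RunStatistics` and the choice of metric are assumptions, not something taken from the game's code).

```csharp
using System;
using System.Collections.Generic;

public static class RunStatistics
{
    // Relative spread of a set of run times: (slowest - fastest) / fastest.
    // A value near zero means the runs agree closely with one another.
    public static double RelativeSpread(IList<double> runSeconds)
    {
        if (runSeconds == null || runSeconds.Count == 0)
        {
            throw new ArgumentException("At least one run time is required.", "runSeconds");
        }

        double fastest = runSeconds[0];
        double slowest = runSeconds[0];
        foreach (double seconds in runSeconds)
        {
            fastest = Math.Min(fastest, seconds);
            slowest = Math.Max(slowest, seconds);
        }
        return (slowest - fastest) / fastest;
    }
}
```

With a metric like this, the frame count can be raised until the spread across repeated runs drops to a level that is small compared with the improvements being measured.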
First Optimization - Math
Using this sample size, my baseline analysis produced a run in 78.05 seconds at 1281.23 FPS. I then ran this through the VS performance analysis tools. This revealed a lot of information, but I decided to start with one particular method I thought I could quickly optimize.
I have an object called
When the final transformed shape is needed I have a method called
I rewrote this to remove temporary object creation (actually
The result was that the 100,000 frames finished in 68.9 seconds at 1451.38 FPS. That's 88% of the original time, and 113% of the original FPS.
Second Optimization - LINQ
After the above optimization I did another performance analysis in VS and received this summary report:
I was not surprised to see the LINQ entry there since when I was originally writing that area of the code I knew it wasn't particularly efficient. At the time, however, I wrote it as simply as possible and decided I would optimize it if necessary later. Well, now it's later and time to optimize.
For this area, I can't do any simple localized calculation optimizations, however. That LINQ code is all in the .NET framework and out of my control. In the VS performance analyzer, if I dig into that method, no source code is shown at all, since it is part of the core libraries. In fact, trying to determine where this code is called from is slightly complicated because VS shows this only indirectly. It shows the caller of this LINQ as:
But, in this there's no
Hah, see, I told you I wasn't surprised to see this as taking a lot of time. I even wrote a comment to myself about it. One thing the VS performance analysis doesn't show very well is why this is slow. Although this code clearly has a lot of stringed-together
The reason my original code was like this is that I'm developing in .NET 3.5, which does not support generic covariance and contravariance and therefore doesn't allow a collection of a subclass to be treated as a collection of a common superclass. .NET 4.0 does support this, for generic interfaces such as IEnumerable<T>.
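
As a rough illustration of this limitation, here is a sketch with a hypothetical class hierarchy (`Shape`, `Circle`, `Square` and the method names are made up for this example and are not the game's code). It contrasts a LINQ-based combination that works on .NET 3.5, a plain-loop alternative that also works there, and the implicit conversion that only compiles on .NET 4.0 or later.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical class hierarchy, for illustration only.
public class Shape { }
public class Circle : Shape { }
public class Square : Shape { }

public static class ShapeCollections
{
    // Works on .NET 3.5: a List<Circle> is not an IEnumerable<Shape> there,
    // so each list is converted element by element through LINQ's Cast
    // before the sequences are concatenated and copied into a new list.
    public static List<Shape> CombineWithLinq(List<Circle> circles, List<Square> squares)
    {
        return circles.Cast<Shape>().Concat(squares.Cast<Shape>()).ToList();
    }

    // Also works on .NET 3.5: plain loops copy the items into a caller-owned
    // list that can be reused every frame, avoiding the LINQ iterator chain.
    public static void CombineInto(List<Shape> target, List<Circle> circles, List<Square> squares)
    {
        target.Clear();
        for (int i = 0; i < circles.Count; i++) target.Add(circles[i]);
        for (int i = 0; i < squares.Count; i++) target.Add(squares[i]);
    }

    // Requires C# 4 / .NET 4.0 or later: IEnumerable<out T> is covariant,
    // so the list converts implicitly with no copying or casting.
    public static IEnumerable<Shape> AsShapes(List<Circle> circles)
    {
        return circles;
    }
}
```

The LINQ version allocates iterator objects and a new list on every call, which adds up when it runs every frame. The loop version reuses one list, at the cost of a little more code.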
My solution to this was to create simple collection classes that hold my
At a different area in the code I do a
I also spent a bit of time on a third optimization. The end result of that was no performance improvement at all; in fact, it was 3% slower than the previous version. I won't describe this optimization in detail now, and I'm actually keeping the slower version for now. The reason is that I think it lays a good foundation for possibly making a more significant optimization in the future, and in some ways is cleaner than the earlier version. If I do such an optimization later I'll then describe it more.
The end result of all of the above was that my original performance of 78.05 seconds at 1281.23 FPS was improved to 57.08 seconds at 1752.04 FPS. This is 73% of the original time and 137% of the original FPS.
All in all I'm pretty happy with that result. I'm now able to continue feature development and will hopefully be able to blog more regularly again.