Early Splash Screen
After working exclusively on my PC for a while adding functionality I decided to deploy to my iPhone to make sure everything still works well. When I did that I was happy to see that it did run correctly, but not so happy to see pretty poor performance.
In the game's splash screen I was only getting about 10 FPS, even though I'm shooting for 30. The splash screen is ironically more computationally intensive than a real game screen because of the grid resolution. For a real game screen the board is around 10x15 squares, depending on the level. For the splash screen, currently, it is 20x30 squares. That a difference from 176 Poles on the game screen to 651 on the splash screen. Clearly I needed to either optimize a lot or reduce the complexity of the splash screen. On the other hand, if I could get the splash screen performing well, then that would be a good baseline such that if it worked well on a particular device then the real game screens should be just fine.
Because of my platform independent abstraction efforts I described in an earlier post, I have access to the sophisticated profiling tools in Visual Studio (VS) and can quickly find the areas of code that are taking the most time. I realize that Xcode on Mac includes the Instruments tool, but so far I haven't been able to get it to work well with my MonoTouch generated code so that no symbol names are displayed. Only function addresses are shown, so it is nearly useless for this type of investigation. I've reached out to the MonoTouch community and hopefully there is a way to get symbols displayed, but for now I'll rely on VS.
Too Many Objects
One area I suspected might become a problem even when I was first creating it was myThe problem is that the game does these triangle manipulations for almost every single graphic element for every frame. In the case of the splash screen thats
Instead of using
Unfortunately, the result of this rewrite was that there was not measurable performance gain in either CPU usage or memory allocation. How can that be? Well, it turns out that
Array Bounds Checking
OpenGL requires an array ofTo see if it was significant for me, I just wrapped the area of code that populates the array with an
Too Many Function Calls
Initially when I populated theUnfortunately I didn't profile this change carefully so I can't say how much of an affect it had, but I imagine it would have been measurable reducing more than 60,000 property calls to closer to 10,000 method calls.
Too Many Calculations
My triangle object remembers its original state (position, rotation, scale, etc) so that I can manipulate it based on that state. A side affect of this is that when I want to get the actual vertices I need to calculate them based on its original state and the current transformations. Initially I was doing this every time I needed the vertices, even if its state or transformations hadn't changed.To fix this I added a boolean flag,
This optimization reduced these calculations from about 29% of CPU time to 11%. Now, finally, a nice gain.
Too Many Draws
After all of the above investigations and optimizations, it turned out the biggest improvement I made was from a stupid mistake.I refactored the class that acts as the main view manager in order to better support different views; a game view and a splash screen view in this case. In doing that refactor I reduced the two-phase update-then-draw functionality into one-phase update-and-draw. In my case there's no real benefit to either case, and the single phase seemed a bit simpler. When I did this, however, the main render loop, which is one of the platform-dependent classes, still called both phases. As part of the refactor the update method became renamed to the draw one. The consequence of this was that on the iPhone version of the render loop, the draw method was called twice, once when it used to be for update and then again for the draw. Since I did the refactor on my PC I tested it there and must have fixed this problem for the PC version of the render loop. When I went to test the iPhone version, however, it was still doing this double work.
Anyway, when I found the problem and removed the redundancy there was a significant performance improvement. That, along with my other optimizations, means the game is back to running my targeted 30 FPS, even in the more complex splash screen. When it's doing that it's eating about 70% CPU, and I'd really like to bring that down for battery life and to better support older devices. I do have one more significant idea based on data from the VS profiler, but I haven't gotten to it. If it works well I'm hoping for about a 10% gain in efficiency. If and when I do another pass as optimizations I'll write about them here.
Next time: Fighting with Blogger.
Awesome read.
ReplyDeleteCan't wait for the next post.
Cheers.