2010-10-24

Optimizing .NET on the iPhone


Early Splash Screen

After working exclusively on my PC for a while, adding functionality, I decided to deploy to my iPhone to make sure everything still worked well. I was happy to see that it ran correctly, but not so happy to see pretty poor performance.

In the game's splash screen I was only getting about 10 FPS, even though I'm shooting for 30. The splash screen is, ironically, more computationally intensive than a real game screen because of the grid resolution. For a real game screen the board is around 10x15 squares, depending on the level. For the splash screen it is currently 20x30 squares. That's a difference of 176 poles on the game screen versus 651 on the splash screen (11x16 and 21x31 grid intersections, respectively). Clearly I needed to either optimize a lot or reduce the complexity of the splash screen. On the other hand, if I could get the splash screen performing well, it would make a good baseline: if it ran well on a particular device, then the real game screens should be just fine.

Because of the platform-independent abstraction efforts I described in an earlier post, I have access to the sophisticated profiling tools in Visual Studio (VS) and can quickly find the areas of code that are taking the most time. I realize that Xcode on the Mac includes the Instruments tool, but so far I haven't been able to get it to work well with my MonoTouch-generated code: no symbol names are displayed, only function addresses, so it is nearly useless for this type of investigation. I've reached out to the MonoTouch community and hopefully there is a way to get symbols displayed, but for now I'll rely on VS.

Too Many Objects

One area I suspected might become a problem, even when I was first creating it, was my Triangle object. It contained 3 Vector2 objects, one for each vertex of the triangle. It also had a number of manipulation methods (rotate, scale, etc.) which calculate new vertices and replace the old ones with the new ones.

The problem is that the game does these triangle manipulations for almost every single graphic element in every frame. In the case of the splash screen that's 651 x 3 x 30 = 58,590 new Vector2 objects created every second, and that's just for the poles on the screen; there are many other elements as well. Although the .NET memory management system is quite efficient, this is still a lot of overhead that would be completely unnecessary if I changed how my Triangle works. Which is exactly what I did...

Instead of using Vector2 objects for my triangle's points, I changed them to 6 float values. The 6 values are the X and Y values for each of the 3 vertices. I used float because that is what I later pass to OpenGL and I want to avoid the overhead of casting.
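A minimal sketch of what such a triangle might look like. The class and member names here are made up for illustration and are not the game's actual code; the point is only that the manipulation methods overwrite six float fields in place instead of producing new vertex instances.

```csharp
// Hypothetical sketch: a triangle stored as six floats
// instead of three separate vertex instances.
public class FlatTriangle
{
    // X and Y for each of the three vertices.
    public float X0, Y0, X1, Y1, X2, Y2;

    public FlatTriangle(float x0, float y0, float x1, float y1, float x2, float y2)
    {
        X0 = x0; Y0 = y0;
        X1 = x1; Y1 = y1;
        X2 = x2; Y2 = y2;
    }

    // Scales about the origin by overwriting the fields in place.
    public void Scale(float factor)
    {
        X0 *= factor; Y0 *= factor;
        X1 *= factor; Y1 *= factor;
        X2 *= factor; Y2 *= factor;
    }

    // Moves all three vertices by the same offset, again in place.
    public void Translate(float dx, float dy)
    {
        X0 += dx; Y0 += dy;
        X1 += dx; Y1 += dy;
        X2 += dx; Y2 += dy;
    }
}
```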

Unfortunately, the result of this rewrite was no measurable performance gain in either CPU usage or memory allocation. How can that be? Well, it turns out that Vector2 is a struct (a value type), not a class. This means that it is not allocated and garbage collected in the same way, and creating new instances of it is quite lightweight. I should have checked whether Vector2 was a class before embarking on this optimization, but it didn't take that much time and I learned a bit.
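The struct versus class distinction can be seen in a small, self-contained sketch. PointStruct and PointClass are hypothetical stand-ins, not the real Vector2 type: assigning a struct copies its data, while assigning a class instance copies only a reference to a single heap object.

```csharp
using System;

// Hypothetical stand-ins for illustration; neither is the real Vector2.
public struct PointStruct
{
    public float X, Y;
    public PointStruct(float x, float y) { X = x; Y = y; }
}

public class PointClass
{
    public float X, Y;
    public PointClass(float x, float y) { X = x; Y = y; }
}

public static class ValueTypeDemo
{
    // Assigning a struct copies its data, so changing the copy
    // leaves the original untouched.
    public static bool StructIsCopied()
    {
        PointStruct a = new PointStruct(1f, 2f);
        PointStruct b = a;
        b.X = 99f;
        return a.X == 1f;
    }

    // Assigning a class instance copies the reference, so both
    // variables see the change to the one heap object.
    public static bool ClassIsShared()
    {
        PointClass a = new PointClass(1f, 2f);
        PointClass b = a;
        b.X = 99f;
        return a.X == 99f;
    }
}
```

A quick way to check a type before optimizing around it is typeof(SomeType).IsValueType, which is true for structs and false for classes.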

Array Bounds Checking

OpenGL requires an array of float values in GL.VertexPointer and GL.TexCoordPointer. To populate this array I go through each of my graphic elements and add its vertices to the end of the array. In .NET there is automatic bounds checking performed on arrays to ensure you do not access memory outside of the array. This bounds checking is great for safety, but there is a performance penalty associated with it.

To see if it was significant for me, I just wrapped the area of code that populates the array with an unsafe { ... } block. After doing this, that function went from about 26% of total CPU usage down to about 23%. It's only an improvement of 3 percentage points, but it's a very simple one. Since I carefully calculate my array index in this routine, I'm confident the lack of bounds checking will be OK in this case.
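For reference, one common way to write an array fill without bounds checks in C# is to go through pointers obtained with a fixed statement inside an unsafe context. The sketch below is an assumption about how such a routine could look, not the game's actual code; the class and method names are hypothetical, and it must be compiled with unsafe code enabled (the /unsafe compiler option).

```csharp
// Hypothetical sketch of a pointer-based fill; requires /unsafe.
public static class VertexBufferFiller
{
    // Copies six floats (three X/Y pairs) from src into dest,
    // starting at destIndex, and returns the index just past the
    // last value written. The pointer writes are not bounds
    // checked, so the caller must guarantee the range is valid.
    public static unsafe int CopyTriangle(float[] src, float[] dest, int destIndex)
    {
        // Pin both arrays so the garbage collector cannot move them
        // while raw pointers into them are in use.
        fixed (float* pSrc = src, pDest = dest)
        {
            float* d = pDest + destIndex;
            for (int i = 0; i < 6; i++)
            {
                d[i] = pSrc[i];
            }
        }
        return destIndex + 6;
    }
}
```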

Too Many Function Calls

Initially, when I populated the float arrays for OpenGL, I was calling a property for each vertex value in my triangle, one at a time. I changed this so that I pass an array and a starting index to the triangle, and it populates the array itself. This reduced the number of calls by a factor of 6.

Unfortunately I didn't profile this change carefully, so I can't say how much of an effect it had, but I imagine it would have been measurable, reducing more than 60,000 property calls to closer to 10,000 method calls.
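The change from per-value property calls to one call per triangle might look roughly like this. It is a sketch with hypothetical names (BatchTriangle, WriteTo, BufferBuilder), assuming the triangle keeps its six values in an internal array.

```csharp
using System;

// Hypothetical sketch: one bulk call per triangle instead of six
// property reads.
public class BatchTriangle
{
    private readonly float[] v; // x0, y0, x1, y1, x2, y2

    public BatchTriangle(float[] sixValues)
    {
        v = (float[])sixValues.Clone();
    }

    // Per-value style: one of six such properties the caller
    // would have to read for every triangle.
    public float X0 { get { return v[0]; } }

    // Bulk style: writes all six values into dest starting at
    // start, and returns the next free index.
    public int WriteTo(float[] dest, int start)
    {
        Array.Copy(v, 0, dest, start, 6);
        return start + 6;
    }
}

public static class BufferBuilder
{
    // Builds one flat float array from many triangles.
    public static float[] Build(BatchTriangle[] triangles)
    {
        float[] buffer = new float[triangles.Length * 6];
        int index = 0;
        foreach (BatchTriangle t in triangles)
        {
            index = t.WriteTo(buffer, index);
        }
        return buffer;
    }
}
```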

Too Many Calculations

My triangle object remembers its original state (position, rotation, scale, etc.) so that I can manipulate it based on that state. A side effect of this is that when I want to get the actual vertices I need to calculate them from the original state and the current transformations. Initially I was doing this every time I needed the vertices, even if the state or transformations hadn't changed.

To fix this I added a boolean flag, isDirty, that I use to keep track of its state. If the state changes I set the flag to true. The next time I need the vertices, I calculate them only if the flag is true, and then set it back to false.

This optimization reduced these calculations from about 29% of CPU time to 11%. Now, finally, a nice gain.
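The dirty-flag pattern described above can be sketched as follows. Only the isDirty name comes from the post; the class, the other members, and the simplified scale-plus-offset transform are assumptions for illustration, and RecalculationCount exists only to make the caching visible.

```csharp
// Hypothetical sketch of caching computed vertices behind a dirty flag.
public class CachedTriangle
{
    private readonly float[] original;               // untransformed x0,y0,x1,y1,x2,y2
    private readonly float[] cached = new float[6];  // last computed vertices
    private float scale = 1f;
    private float offsetX, offsetY;
    private bool isDirty = true;                     // nothing computed yet

    // Demo-only counter showing how often the real work runs.
    public int RecalculationCount;

    public CachedTriangle(float[] sixValues)
    {
        original = (float[])sixValues.Clone();
    }

    // Every state change marks the cached vertices as stale.
    public void SetScale(float s) { scale = s; isDirty = true; }
    public void SetOffset(float dx, float dy) { offsetX = dx; offsetY = dy; isDirty = true; }

    // Recomputes only when something changed since the last call.
    public float[] GetVertices()
    {
        if (isDirty)
        {
            for (int i = 0; i < 6; i += 2)
            {
                cached[i] = original[i] * scale + offsetX;
                cached[i + 1] = original[i + 1] * scale + offsetY;
            }
            RecalculationCount++;
            isDirty = false;
        }
        return cached;
    }
}
```

Calling GetVertices() repeatedly without changing the scale or offset returns the cached array and leaves RecalculationCount unchanged.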

Too Many Draws

After all of the above investigations and optimizations, it turned out the biggest improvement I made was from a stupid mistake.

I refactored the class that acts as the main view manager in order to better support different views: a game view and a splash screen view in this case. In doing that refactor I collapsed the two-phase update-then-draw functionality into a one-phase update-and-draw. In my case there's no real benefit to either approach, and the single phase seemed a bit simpler. When I did this, however, the main render loop, which is one of the platform-dependent classes, still called both phases. As part of the refactor the update method was renamed to the draw one. The consequence was that in the iPhone version of the render loop the draw method was called twice: once where the update call used to be, and then again for the draw. Since I did the refactor on my PC I tested it there and must have fixed this problem in the PC version of the render loop. When I went to test the iPhone version, however, it was still doing this double work.

Anyway, when I found the problem and removed the redundancy there was a significant performance improvement. That, along with my other optimizations, means the game is back to running at my targeted 30 FPS, even in the more complex splash screen. When it's doing that it's eating about 70% CPU, and I'd really like to bring that down for battery life and to better support older devices. I do have one more significant idea based on data from the VS profiler, but I haven't gotten to it. If it works well I'm hoping for about a 10% gain in efficiency. If and when I do another pass at optimizations I'll write about them here.

Next time: Fighting with Blogger.
