2013-10-05

OpenGL and OpenTK Fun

I've been working on updating my OpenGL code to use OpenGL ES 2.0 (instead of 1.1). I've run into a number of problems doing this, many of which are exacerbated because I don't know if my problems are related to my OpenGL code or the underlying libraries I'm using (OpenTK and the PowerVR OpenGL ES implementation for Windows).

To try to diagnose the problems I tried to start with a simple OpenGL ES 2.0 example. Unfortunately, all of the examples I found were either for native OpenGL ES (not via .NET) or specifically from Xamarin for iOS or Android, whereas I wanted a simple OpenTK example. The OpenTK distribution actually includes one very simple example (just showing an empty window with a colored fill), but as I mentioned in my previous post, the latest official OpenTK release is from 2010 and has some issues.

I searched around and found what appears to be the most active development branch of OpenTK on GitHub at https://github.com/andykorth/opentk. I downloaded that branch and tried to run the same example, but continued to run into problems.

At first I got a System.PlatformNotSupportedException.
System.PlatformNotSupportedException
   at OpenTK.Platform.Factory.UnsupportedPlatform.CreateGLContext(GraphicsMode mode, IWindowInfo window, IGraphicsContext shareContext, Boolean directRendering, Int32 major, Int32 minor, GraphicsContextFlags flags) in opentk\Source\OpenTK\Platform\Factory.cs:line 171
   at OpenTK.Graphics.GraphicsContext..ctor(GraphicsMode mode, IWindowInfo window, Int32 major, Int32 minor, GraphicsContextFlags flags) in opentk\Source\OpenTK\Graphics\GraphicsContext.cs:line 134
   at OpenTK.GameWindow..ctor(Int32 width, Int32 height, GraphicsMode mode, String title, GameWindowFlags options, DisplayDevice device, Int32 major, Int32 minor, GraphicsContextFlags flags, IGraphicsContext sharedContext) in opentk\Source\OpenTK\GameWindow.cs:line 222
   at OpenTK.GameWindow..ctor(Int32 width, Int32 height, GraphicsMode mode, String title, GameWindowFlags options, DisplayDevice device, Int32 major, Int32 minor, GraphicsContextFlags flags) in opentk\Source\OpenTK\GameWindow.cs:line 180
   at Examples.Tutorial.SimpleES20Window..ctor(GraphicsContextFlags flags) in opentk\Source\Examples\OpenGLES\2.0\SimpleWindow20.cs:line 26
   at Examples.Tutorial.SimpleES20Window.Main() in opentk\Source\Examples\OpenGLES\2.0\SimpleWindow20.cs:line 112
   at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
   at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()

This led me down a little side road of trying to confirm that I had the PowerVR OpenGL ES library properly installed. After confirming it was, as far as I could tell, I stepped through the code in the debugger and got to the Factory constructor in opentk\Source\OpenTK\Platform\Factory.cs.

While stepping through I kept finding that Egl.Egl.IsSupported was false. The code for IsSupported is below.

Oh great, a swallowed exception, maybe that would tell me something helpful. Stepping through the code again and examining the exception showed me a System.BadImageFormatException.

System.BadImageFormatException: An attempt was made to load a program with an incorrect format. (Exception from HRESULT: 0x8007000B)
   at OpenTK.Platform.Egl.Egl.GetCurrentContext()
   at OpenTK.Platform.Egl.Egl.get_IsSupported() in opentk\Source\OpenTK\Platform\Egl\Egl.cs:line 316 

I'm sure some web searching would find it, but I've run into this problem in the past and remembered it had to do with incompatible binary formats, for example when a .NET program running as a 64-bit process tries to load a 32-bit x86 native DLL. I checked the OpenTK solution's build configuration, and indeed, it was set to "AnyCPU", which on a 64-bit OS typically means the program runs as a 64-bit process. I changed it to x86 and tried again and the above exception went away, but it still didn't work.
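A quick way to confirm this kind of mismatch is to have the process report its own bitness. The sketch below is only an illustration and is not part of OpenTK; the BitnessCheck class and its method names are hypothetical. It relies on the fact that IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit one.

```csharp
using System;

// Hypothetical helper, not part of OpenTK.
public static class BitnessCheck
{
    // IntPtr.Size is 4 bytes in a 32-bit process and 8 bytes in a 64-bit process.
    public static int ProcessBits()
    {
        return IntPtr.Size * 8;
    }

    // A native DLL can only be loaded into a process of the same bitness;
    // a mismatch is what surfaces as BadImageFormatException.
    public static bool CanLoadNative(int nativeDllBits)
    {
        return ProcessBits() == nativeDllBits;
    }
}
```

Calling BitnessCheck.CanLoadNative(32) at startup would return false in a 64-bit process, which is the situation described above.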

Next I got an entirely unhelpful System.Collections.Generic.KeyNotFoundException, with the stack trace below.

System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary.
   at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
   at OpenTK.Graphics.GraphicsContext.get_CurrentContext() in opentk\Source\OpenTK\Graphics\GraphicsContext.cs:line 344
   at OpenTK.Graphics.GraphicsContext.LoadAll() in opentk\Source\OpenTK\Graphics\GraphicsContext.cs:line 512
   at OpenTK.GameWindow..ctor(Int32 width, Int32 height, GraphicsMode mode, String title, GameWindowFlags options, DisplayDevice device, Int32 major, Int32 minor, GraphicsContextFlags flags, IGraphicsContext sharedContext) in opentk\Source\OpenTK\GameWindow.cs:line 220
   at OpenTK.GameWindow..ctor(Int32 width, Int32 height, GraphicsMode mode, String title, GameWindowFlags options, DisplayDevice device, Int32 major, Int32 minor, GraphicsContextFlags flags) in opentk\Source\OpenTK\GameWindow.cs:line 180
   at Examples.Tutorial.SimpleES20Window..ctor(GraphicsContextFlags flags) in opentk\Source\Examples\OpenGLES\2.0\SimpleWindow20.cs:line 26
   at Examples.Tutorial.SimpleES20Window.Main() in opentk\Source\Examples\OpenGLES\2.0\SimpleWindow20.cs:line 112
   at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
   at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
   at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
   at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()

Well, for a pretty low-level media platform this isn't very encouraging as it doesn't describe anything about the real problem and is clearly happening deep in the library. After a lot of debugging I finally found the below bit of code in opentk\Source\OpenTK\Graphics\GraphicsContext.cs.

I missed it the first few times I was stepping through the code, but the comment says it all: "Note: this property will not function correctly when both desktop and EGL contexts are available in the same process. This scenario is very unlikely to appear in practice." In my case, since I'm running on Windows, I do actually have both desktop and EGL available. I didn't feel like digging into the code to fix this so that both contexts could actually exist simultaneously, but I remembered the Factory class that I'd looked into above and made an easier fix. I did this in opentk\Source\OpenTK\Platform\Factory.cs in the constructor. My updated version is below.

You can compare to the original version above, but basically I made it such that the library first tries to see if EGL is available, and if so, uses it. If it is not available then the desktop version is instantiated. You can see that the Default context isn't instantiated unless the EGL contexts don't work.
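The selection logic described above can be sketched in isolation. This is not the OpenTK Factory code; IPlatformFactorySketch, NamedFactory, and FactorySelector are hypothetical names used only to show the "try EGL first, create the desktop factory only as a fallback" idea.

```csharp
using System;

// Hypothetical stand-in for a platform factory; not an OpenTK type.
public interface IPlatformFactorySketch
{
    string Name { get; }
}

public sealed class NamedFactory : IPlatformFactorySketch
{
    private readonly string name;

    public NamedFactory(string name)
    {
        this.name = name;
    }

    public string Name
    {
        get { return name; }
    }
}

public static class FactorySelector
{
    // Prefer the EGL factory when EGL is usable. The desktop factory is
    // created only when EGL is not, so the two never coexist in one process.
    public static IPlatformFactorySketch Choose(
        bool eglSupported,
        Func<IPlatformFactorySketch> createEgl,
        Func<IPlatformFactorySketch> createDesktop)
    {
        return eglSupported ? createEgl() : createDesktop();
    }
}
```

Passing the factories as delegates means the one that is not chosen is never constructed at all.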

After all of this I could run the sample program and was rewarded with an 800x600 blue window. Woohoo! Ok, it's not that exciting, but at least it was running and I can now try to get my OpenGL ES 2.0 code working.

Next: Premultiplied alpha

2013-09-25

I'm Back

After more than 2.5 years I'm back. I'm going to try hard to get back into spending regular time on my game and actually get it published. So far I've spent about 15 hours just getting it back to the point it was when I last left it. A lot has changed since then, which is why it took so long to just get it working at all. Here're a few things that have changed and some details about what I've had to do to fix them.

  1. I've changed my home computer. I used to have a tower Windows PC and a MacBook Air for doing the iOS-specific development portions. I now have just a MacBook Pro Retina, on which I run VMware Fusion and Windows 8. I made this migration a while ago, but finally getting my development environment working well took some work.
  2. OpenTK seems to have made some changes, which I don't fully understand since their release date is 2010-10-06 and should have been before I took a hiatus. See more details below.
  3. In the interim 2.5 years both Xcode and Xamarin MonoDevelop have evolved and new versions have been released that required more updates. Xamarin's product has been renamed to Xamarin Studio and now has some interesting integration with Visual Studio, which hopefully will make development a bit easier. See more detail below.

OpenTK Issues

When I tried to run my game on Windows using OpenTK I got a PInvokeStackImbalance exception saying: "This is likely because the managed PInvoke signature does not match the unmanaged target signature. Check that the calling convention and parameters of the PInvoke signature match the target unmanaged signature." Since I was running against the release build of OpenTK, which doesn't contain debug symbols, this error surfaced in my GLView class's constructor (GLView derives from OpenTK's GameWindow). After compiling a debug build of OpenTK, I eventually traced this to the function GetProcAddress in Egl.cs, which is a PInvoke call.

The problem is the EntryPoint should be eglGetProcAddress, not eglCopyBuffers. After changing this and rebuilding OpenTK it worked fine. Doing some searching on the net shows this has been fixed in GitHub, but no new official build containing the fix has been released. It was an easy fix, but a bit disappointing that OpenTK's current official release has such a basic bug in it.
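For readers unfamiliar with P/Invoke, a corrected declaration might look roughly like the sketch below. It is not copied from OpenTK's Egl.cs; the EglSketch class name and the "libEGL.dll" library name are assumptions. In the EGL API, eglCopyBuffers takes three arguments while eglGetProcAddress takes one, so binding a one-argument managed signature to eglCopyBuffers would be consistent with the stack imbalance reported above.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper class; the library name "libEGL.dll" is an assumption.
public static class EglSketch
{
    // EntryPoint must name the native function whose parameters match this
    // managed signature. eglGetProcAddress takes a single function name.
    [DllImport("libEGL.dll", EntryPoint = "eglGetProcAddress")]
    public static extern IntPtr GetProcAddress(string funcname);
}
```

The declaration itself compiles without the native library present; the library is only loaded when GetProcAddress is first called.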

Beyond this, it seems that the latest official OpenTK release differs from what is released with MonoTouch. I'm still not sure of the extent of these differences, but I may write more on this later once I get a better understanding.

Xamarin and Visual Studio

As I discussed in a previous post, I've been using Visual Studio as my primary development environment and only switching to Xamarin Studio (formerly MonoDevelop) when I wanted to actually deploy my code to my iPhone. Apparently others also are interested in doing this because Xamarin has integrated their product with Visual Studio; you can read more about this on their site. Because they're now providing .NET DLLs that can be used in Visual Studio to compile iOS .NET apps, it means I can simplify one of the issues I had with maintaining and building my code.

I have my game broken into 4 projects, each of which builds into a separate binary. One of these, which I call "Launcher", is the most device dependent, and references device specific libraries. This project can't actually be shared across platforms because the code is too different, so I have both a Launcher.iOS project and a Launcher.PC (in the future I hope to add Launcher.Android and Launcher.WinPhone). For the other three projects I also need separate project files, but 100% of the code is shared. This leaves me with this folder hierarchy:

  • Zoing
    • Launcher.iOS
      • Launcher.iOS.csproj (project file that references iOS specific libraries)
      • ...iOS specific code files...
    • Launcher.PC
      • Launcher.PC.csproj (project file that references PC specific libraries)
      • ...PC specific code files...
    • MediaFramework
      • MediaFramework.iOS.csproj (project file that references MonoTouch .NET libraries)
      • MediaFramework.PC.csproj (project file that references Windows .NET libraries)
      • ...device independent C# files included in both project files...
    • OpenTKConnector
      • OpenTKConnector.iOS.csproj (project file that references MonoTouch .NET libraries)
      • OpenTKConnector.PC.csproj (project file that references Windows .NET libraries)
      • ...device independent C# files included in both project files...
    • Zoing
      • Zoing.iOS.csproj (project file that references MonoTouch .NET libraries)
      • Zoing.PC.csproj (project file that references Windows .NET libraries)
      • ...device independent C# files included in both project files...

MediaFramework is a set of device and framework independent classes that define the building blocks for my game. This includes things like: Audio/AudioSample, Device/Orientation, Graphics/Texture, Graphics/TextureArea, Graphics/MutableShape, Graphics/Shape, Graphics/Size, etc.

OpenTKConnector is a concrete implementation of the MediaFramework abstract classes using OpenTK (OpenAL and OpenGL). For example there is OpenGLES11/OpenGLES11Texture and OpenGLES20/OpenGLES20Texture, both of which implement MediaFramework/Graphics/Texture. Similarly there is OpenAL/OpenALAudioSample, which implements MediaFramework/Audio/AudioSample.

The benefit of having this separation is that my Zoing game just needs to know about the abstract classes in MediaFramework and completely ignores the platform details. For example, AudioSample has a Play method, which plays the sample. That is common for any platforms I will eventually support.
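A stripped-down sketch of that separation is shown here. Only AudioSample and its Play method come from the description above; CountingAudioSample and GameSketch are hypothetical stand-ins, and a real OpenTKConnector implementation would call into OpenAL instead of counting calls.

```csharp
using System;

// Abstract building block, as it might appear in MediaFramework.
public abstract class AudioSample
{
    public abstract void Play();
}

// Hypothetical concrete implementation. It only counts calls so that
// the sketch stays self-contained.
public sealed class CountingAudioSample : AudioSample
{
    public int PlayCount { get; private set; }

    public override void Play()
    {
        PlayCount++;
    }
}

// Hypothetical game code: it depends only on the abstract type.
public static class GameSketch
{
    public static void OnBounce(AudioSample bounceSound)
    {
        bounceSound.Play();
    }
}
```

Because GameSketch.OnBounce accepts any AudioSample, the same game code can be handed a different concrete implementation on each platform.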


Anyway, restructuring my solution to have the project files arranged as above took a bit of time. I had to recreate the iOS project files to reference the MonoTouch library properly. I also had to move these files around, which is made a little more difficult with version control. Some of this meant examining and tweaking the project files' XML by hand.

This reminds me of one small bug in Visual Studio 2010 that caused me some confusion until I understood it. When you unload a project, then edit the project's .csproj XML file, then reload the project, it seems that Visual Studio doesn't recognize the manual changes you made. To fix this simply unload it again, then reload it again. This is a bit annoying, but once you know about it the workaround is simple.

Next: OpenGL and OpenTK Fun

2011-01-06

More Optimizations

In my previous post I discussed some challenges and interesting compromises I had to make when simulating physics in my game, given that I was trying to make the game behave in ways that conflict with the real world. In this post I revisit optimization and discuss some recent work I've put into that area.

Introduction

I'll start with a brief explanation of why it has been so long since my last entry. Partly this is because I've been busy with other things and the recent holidays have also taken up some of my time, but the main reason is that this round of optimizations has been quite a bit more complex than I anticipated. Interestingly, it hasn't been the actual code changes that have been challenging; it has been my efforts to carefully analyze my code's slow areas and fully understand the optimizations I've made.

In some ways this is related to my previous post on optimizations. In that post I left with a conclusion that you need to be careful when optimizing to make sure you understand if and how the changes are really improving performance. To that end I tried to be more diligent in this round of optimizations.

My initial goal in this round was to optimize my game's calculations when advancing the game state frame by frame. I'm currently targeting 30 FPS, but would like that to be as efficient as possible in order to preserve battery life. Also, even though the iPhone can do graphics up to 60 FPS, I examined what my game does and found that there is very little perceivable visual improvement moving from 30 FPS to 60 FPS, which is why I decided to target the former.

To be able to properly analyze the performance of my code's calculations, as opposed to time taken by the actual graphics chip's rendering, I created a benchmark mode for my game that does all calculations as normal, but doesn't do any of the OpenGL API calls. I also do this outside of the top-level game loop, since the OpenTK framework imposes its own FPS limitations in this loop.

Sample Size

I should mention that all of this optimization work was done on my PC in Visual Studio. Since my goal is to optimize my algorithms, I'm working on the assumption that the improvements will also be reflected in code when running on an actual iPhone.

When I started testing this benchmark mode I tried running for a few thousand frames. I got results around 1200 FPS. I initially assumed this number of frames was enough to produce useful results and started the process of running under various conditions both for generating performance statistics to measure improvements and for running within VS' performance analysis tools to see what areas of the code were slowest and the best target for optimization efforts.

The problem I found was that the performance differed wildly from run to run. I ended up wasting quite a lot of time doing analysis with these too-short runs that produced inconsistent results. I finally found that the only real way to get more consistent results was to simply increase the number of frames to render and therefore the test time. Eventually I settled on running for 100,000 frames, or about one minute. At this number of frames I found that I could get multiple runs to finish within a few milliseconds of one another, which, over 1 minute, is quite consistent.
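A benchmark loop of this kind can be sketched with System.Diagnostics.Stopwatch. The BenchmarkSketch class and the advanceFrame delegate are hypothetical; the delegate stands in for the game's per-frame calculations with rendering disabled.

```csharp
using System;
using System.Diagnostics;

// Hypothetical benchmark helper; not the author's actual benchmark mode.
public static class BenchmarkSketch
{
    // Calls advanceFrame frameCount times with no rendering and no frame-rate
    // cap, then returns the achieved frames per second.
    public static double MeasureFps(Action advanceFrame, int frameCount)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < frameCount; i++)
        {
            advanceFrame();
        }
        watch.Stop();
        return frameCount / watch.Elapsed.TotalSeconds;
    }
}
```

With a large frameCount such as 100,000, one-off costs like JIT compilation and cache warm-up are amortized over the whole run, which is one plausible reason longer runs give more consistent numbers.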

First Optimization - Math

Using this sample size, my baseline analysis produced a run in 78.05 seconds at 1281.23 FPS. I then ran this through the VS performance analysis tools. This revealed a lot of information, but I decided to start with one particular method I thought I could quickly optimize.

I have an object called Shape. This represents a set of vertices that are connected to define a 2D shape. So far I'm using it for triangles or quads only, but it is flexible up to any number of points. This object also supports some transform operations like moving it in the 2D plane, resizing it, and rotating it (both around its own center point as well as some other point). The way I've implemented these transform operations is to, at the time they're called, only remember what the operations are, but not to actually do the transform calculations until needed. This is to optimize when multiple transforms occur at the same time, such as an offset, resize, and rotation.
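The deferred-transform idea can be sketched as follows. This is a simplified illustration, not the actual Shape class: Point2 and LazyShape are hypothetical names, rotation is about the origin only, and the transforms are always applied in the fixed order scale, rotate, offset.

```csharp
using System;

// Hypothetical 2D point; stands in for a vector type such as OpenTK's Vector2.
public struct Point2
{
    public float X;
    public float Y;

    public Point2(float x, float y)
    {
        X = x;
        Y = y;
    }
}

public sealed class LazyShape
{
    private readonly Point2[] source;
    private readonly Point2[] transformed;
    private float offsetX, offsetY, scale = 1f, angle;
    private bool dirty = true;

    public LazyShape(Point2[] points)
    {
        source = (Point2[])points.Clone();
        transformed = new Point2[points.Length];
    }

    // Each transform call only records the request and marks the shape dirty.
    public void Offset(float dx, float dy) { offsetX += dx; offsetY += dy; dirty = true; }
    public void Resize(float factor) { scale *= factor; dirty = true; }
    public void Rotate(float radians) { angle += radians; dirty = true; }

    // The accumulated transforms are applied in one pass, only when needed.
    public Point2[] Points
    {
        get
        {
            if (dirty) { MakeClean(); }
            return transformed;
        }
    }

    private void MakeClean()
    {
        float cos = (float)Math.Cos(angle);
        float sin = (float)Math.Sin(angle);
        for (int i = 0; i < source.Length; i++)
        {
            float x = source[i].X * scale;
            float y = source[i].Y * scale;
            transformed[i].X = x * cos - y * sin + offsetX;
            transformed[i].Y = x * sin + y * cos + offsetY;
        }
        dirty = false;
    }
}
```

Calling Offset, Resize, and Rotate in the same frame costs three cheap field updates, and the per-vertex work happens once when Points is read.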

When the final transformed shape is needed I have a method called MakeClean that, if needed, performs the calculations. When I did my first performance analysis I found that this method was taking 20.8% of total time with these details:

I rewrote this to remove temporary object creation (actually Vector2 is a struct, not an object, but same idea), reduce repeated property accesses, combine multiple calculations, etc. This reduced the time to 9.8% and produced these details:

The result of this was the 100,000 frames finished in 68.9 seconds at 1451.38 FPS. That's 88% of the original time, and 113% of the original FPS.

Second Optimization - LINQ

After the above optimization I did another performance analysis in VS and received this summary report:

I was not surprised to see the LINQ entry there since when I was originally writing that area of the code I knew it wasn't particularly efficient. At the time, however, I wrote it as simply as possible and decided I would optimize it if necessary later. Well, now it's later and time to optimize.

For this area, I can't do any simple localized calculation optimizations, however. That LINQ code is all in the .NET framework and out of my control. In the VS performance analyzer, if I dig into that method, no source code is shown at all, since it is part of the core libraries. In fact, trying to determine where this code is called from is slightly complicated because VS shows this only indirectly. It shows the caller of this LINQ as:

But, in this there's no ConcatIterator to be seen anywhere. Fortunately, it isn't too hard to guess that it is somewhere in ElementsToRender. The code for that initially was:

Hah, see, I told you I wasn't surprised to see this as taking a lot of time. I even wrote a comment to myself about it. One thing the VS performance analysis doesn't show very well is why this is slow. Although this code clearly has a lot of stringed-together Concat calls, I suspect that the embedded Cast operations are also quite a performance hit.

The reason my original code was like this is because I'm developing in .NET 3.5, which does not support covariance and contravariance and therefore doesn't allow casting collections of subclasses into collections of common superclasses. This is supported in .NET 4.0, however.

My solution to this was to create simple collection classes that hold my GameElement objects, and that can provide those objects both in their subclass typed state as well as their superclass form, and do this efficiently. The basic way I do this is to have the collections detect when they are modified (added to, removed from, etc) and mark themselves as dirty in that case. Then, when I request the collection in the superclass form, that list is generated and cached, if necessary. This allows me to reduce the above code to this:

At a different area in the code I do a SelectMany to combine all of these collections together. The result of this was the 100,000 frames finished in 55.59 seconds at 1799.04 FPS. That's 81% of the previous time, and 124% of the previous FPS.
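The dirty-flag caching described above might look something like this sketch. It is not the author's collection class: GameElement is taken from the text, while Ball and ElementCollection are hypothetical names.

```csharp
using System.Collections.Generic;

public class GameElement { }

// Hypothetical subclass used only for illustration.
public class Ball : GameElement { }

// Holds items as TItem but can also hand them out as the base type
// without running Cast or Concat on every frame.
public sealed class ElementCollection<TItem> where TItem : GameElement
{
    private readonly List<TItem> items = new List<TItem>();
    private GameElement[] asBase = new GameElement[0];
    private bool dirty;

    public void Add(TItem item)
    {
        items.Add(item);
        dirty = true;
    }

    public bool Remove(TItem item)
    {
        bool removed = items.Remove(item);
        if (removed) { dirty = true; }
        return removed;
    }

    // Subclass-typed view.
    public IEnumerable<TItem> Items
    {
        get { return items; }
    }

    // Superclass-typed view, rebuilt only after a modification.
    public GameElement[] AsGameElements
    {
        get
        {
            if (dirty)
            {
                asBase = new GameElement[items.Count];
                for (int i = 0; i < items.Count; i++) { asBase[i] = items[i]; }
                dirty = false;
            }
            return asBase;
        }
    }
}
```

On frames where nothing is added or removed, AsGameElements returns the same cached array, so the per-frame cost is a single flag check.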

Third Optimization

I also spent a bit of time on a third optimization. The end result of that was no performance improvement at all; in fact, it was 3% slower than the previous version. I won't describe this optimization in detail now, and I'm actually keeping the slower version for now. The reason is that I think it lays a good foundation for possibly making a more significant optimization in the future, and in some ways is cleaner than the earlier version. If I do such an optimization later I'll then describe it more.

Summary

The end result of all of the above was that my original performance of 78.05 seconds at 1281.23 FPS was improved to 57.08 seconds at 1752.04 FPS. This is 73% of the original time and 137% of the original FPS.

All in all I'm pretty happy with that result. I'm now able to continue feature development and will hopefully be able to blog more regularly again.

Next: I'm Back

2010-12-08

Phunky Physics

In my previous post I discussed some issues I ran into running my .NET code in the MonoTouch environment. In this post I'm switching gears a bit and will discuss an issue I've been struggling with for a while related to physics simulation in my game.


[Images: "Before Diagonal" and "After Diagonal"]

90° in the Shade

The issue came up because I was trying to make some of my physics interactions behave differently from how they do in reality. You would think that bending reality or completely changing how physics works should be possible in a computer simulation's implementation, but I found out that changing how things work quickly leads to unintended problems. I wonder if this says something about the necessity of how physics actually works in real life.

If you look at the "Before Diagonal" image on the right, you see a white ball heading towards a collision with a blue diagonal. Then, in "After Diagonal", you can see the white ball after it has bounced off of the blue diagonal. For the purposes of my game the white ball must always be exactly halfway between a row or column of purple stars. Before the bounce it is travelling horizontally, and is halfway vertically in a row. After it bounces it is travelling vertically, and is halfway horizontally in a column. The problem is that in an actual 90° bounce like this, with a ball that has a non-zero radius, it would not bounce from halfway-vertical-in-row to halfway-horizontal-in-column like this. The lower part of the leading edge of the ball would hit the diagonal first, before the halfway point, and then the ball would bounce upwards, but to the left of the halfway-horizontal-in-column point.

I initially tried to resolve this problem by simply teleporting the ball from the one halfway point to the next halfway point when the bounce occurred. This looked fine, and seemed to work ok, until I added interactions between multiple balls. The problem was that if there was already another ball in the place to where the first ball is teleported, then they would become overlapped and not properly bounce off of one another.

I solved this problem by not doing the teleportation and having the ball bounce as if it has a zero radius. This works fine for my purposes, but has another small problem. For the skin shown in the images to the right, the ball is fuzzy and the diagonal is lightning-like, so having the ball overlap the diagonal when it bounces looks fine. For an alternative skin, though, where the ball is completely solid and the diagonal is more physical-looking, it will not look ok. My plan to solve this is to have the interaction calculation done as if the radius is zero, but have the rendering adjust the ball's visible position so it doesn't overlap the diagonal.
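With a zero radius, the bounce reduces to reflecting the velocity across the diagonal at the moment the ball's center reaches it. The sketch below assumes 45° diagonals and a y-up coordinate system; Diagonal, Velocity, and BounceSketch are hypothetical names, not types from the game.

```csharp
// Rising is a '/' diagonal (the line y = x); Falling is a '\' diagonal (y = -x).
public enum Diagonal
{
    Rising,
    Falling
}

public struct Velocity
{
    public float X;
    public float Y;

    public Velocity(float x, float y)
    {
        X = x;
        Y = y;
    }
}

public static class BounceSketch
{
    // Reflecting across y = x swaps the components.
    // Reflecting across y = -x swaps and negates them.
    public static Velocity Reflect(Velocity v, Diagonal diagonal)
    {
        return diagonal == Diagonal.Rising
            ? new Velocity(v.Y, v.X)
            : new Velocity(-v.Y, -v.X);
    }
}
```

If the diagonal passes through the center of its cell, the ball's center meets it exactly at the halfway point of both the row and the column, so the ball leaves along the column's center line without any teleporting.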


[Images: "Before Collision", "After Collision", and "Collision!"]

Billiards this is Not

After resolving the above issue I then started working on ball-to-ball collisions. When the two balls are on the same row or column this is easy. The issue is when they are moving perpendicular to one another.

If you look at the "Before Collision" image to the right you can see that the two balls are heading towards one another and will collide in the lower-left area of the image. If these were actual balls, like billiard balls, then the result of the collision is that the one moving down would head left and the one moving left would head down. The angle each ball changes to would depend on their speed and mass. My problem with implementing this type of collision is the same as that I described above related to bouncing off of diagonals. Basically, I want the balls to remain exactly in the halfway point between a row or a column. In this scenario, though, that would not happen.

For ball-to-ball collisions I could not use the same strategy I used above for diagonal collisions: calculating as if the ball had a zero radius. I actually initially tried this, and it worked ok for two balls, but once more than two were involved it became possible that all of the balls come to be in exactly the same position and could never resolve their collisions. Also, adjusting the rendering of the overlapped balls so that they don't appear to be overlapping is more difficult.

The solution I eventually came up with was to have the balls simply bounce and reverse directions entirely (see "After Collision" image). As I said, in real physics, that would not happen, and I wanted to make it look reasonable. To attempt to do that I added a graphic element that is shown when the collision occurs, and implies that something more than just a normal real-life, ball-to-ball collision is happening. Currently I'm using the graphic shown in the "Collision!" image, but this is tentative, and also can be customized for each skin. Even though the physics of this collision isn't even close to real physics, it actually works pretty well in the game.

Summary

Resolving both of the problems described above took quite a while. I experimented with a number of different solutions before settling on the current one. I also tried to get things working in the early stages when I had the balls teleporting during a diagonal bounce. This seemed to work for some cases, but was never reliable. I find it interesting that even the small adjustment to physical rules I was trying to make in having the balls bounce and maintain their position halfway in a row or column caused problems that were basically impossible to solve. Of course the other side of this is that even my final solution goes against how physics works in real life.

One other thing that makes me feel pretty comfortable with my current solution is that it ended up in much cleaner code. In other strategies I always had a number of special cases that made the code messy. Once I changed to my current solution they went away.

Next time: More Optimizations.

2010-11-22

More MonoTouch Gotchas

In my previous post I took a break from discussing my adventure developing for the iPhone with .NET and discussed some issues I've had blogging with Blogger. In this post I get back to the main subject and describe an odd issue I've encountered with MonoTouch.

The issue is due to the way that MonoTouch compiles to native iPhone binaries. It does not support JIT compilation, which a "normal" .NET environment does support. In most cases this is fine, but sometimes the compiler cannot fully detect the types that are needed and therefore doesn't compile them into the binary. Then, during runtime, when it detects it needs a type that was not compiled it crashes.

The code below is a pretty simple example of this issue:
/// <summary>Initializes a new instance of the <see cref="Shape"/> class.</summary>
/// <param name="points">The points.</param>
public Shape(Vector2d[] points)
     : this(points.Select(point => new Vector2((float)point.X, (float)point.Y)).ToArray())
{
}

This is a very simple alternate constructor for my Shape class. It is a convenience constructor to allow passing in Vector2d values (these are double based vectors) instead of Vector2 values (these are float based). The LINQ expression just enumerates over the array and converts the values.

So, why doesn't this work? Well, the error I get is something like:
Attempting to JIT compile method
'System.Linq.Enumerable/PredicateOf`1<OpenTK.Vector2d>:.cctor
()' while running with --aot-only.

I believe what this is saying is that the constructor for System.Linq.Enumerable/PredicateOf`1<OpenTK.Vector2d> is not found, and since JIT isn't supported it can't compile it at runtime. The reason it isn't compiled initially is because the compiler doesn't recognize that the LINQ expression will ultimately need this constructor. The way around this is to reference the needed type explicitly in code. Unfortunately, I didn't figure out a way to do that. Fortunately, I found a different workaround, although it does include a little bit of unnecessary overhead:

/// <summary>Initializes a new instance of the <see cref="Shape"/> class.</summary>
/// <param name="points">The points.</param>
public Shape(Vector2d[] points)
     : this(new List<Vector2d>(points).Select(point => new Vector2((float)point.X, (float)point.Y)).ToArray())
{
}

In this case, the LINQ expression is over the List<Vector2d>, which works fine. I believe the reason this works is because List<Vector2d> explicitly implements System.Linq.Enumerable/PredicateOf`1<OpenTK.Vector2d>, where the array does not.

I need to play around with this a bit more to see how I can fix it without the List wrapper. The problem is this only shows up when I deploy to the iPhone, so the build/run/test cycle is a bit slower. For now my workaround works ok, so improving this is not a high priority.
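One possible direction, which I have not verified on a device, is to avoid LINQ for this conversion altogether and use a plain loop in a static helper. In the sketch below Vec2d and Vec2f are hypothetical stand-ins for OpenTK's Vector2d and Vector2 so that the example is self-contained, and VectorConversion is a hypothetical helper name. Whether this actually sidesteps the AOT problem is an assumption about the compiler's behavior.

```csharp
// Hypothetical stand-in for OpenTK.Vector2d (double based).
public struct Vec2d
{
    public double X;
    public double Y;

    public Vec2d(double x, double y)
    {
        X = x;
        Y = y;
    }
}

// Hypothetical stand-in for OpenTK.Vector2 (float based).
public struct Vec2f
{
    public float X;
    public float Y;

    public Vec2f(float x, float y)
    {
        X = x;
        Y = y;
    }
}

public static class VectorConversion
{
    // Plain loop with no LINQ, so no generic LINQ helper types
    // have to exist at runtime.
    public static Vec2f[] ToFloat(Vec2d[] points)
    {
        Vec2f[] result = new Vec2f[points.Length];
        for (int i = 0; i < points.Length; i++)
        {
            result[i] = new Vec2f((float)points[i].X, (float)points[i].Y);
        }
        return result;
    }
}
```

A constructor could then chain with something like `: this(VectorConversion.ToFloat(points))`, which also avoids the extra List allocation.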

Next time: Phunky Physics.

2010-11-18

Fighting with Blogger

In my previous post I described some work I spent optimizing my .NET code to make it perform well on an iPhone. I expect I'll have a follow up post when I do more optimizations in the future, but next I want to take a slight detour from coding to discuss a small battle I had with getting Blogger to work well with the type of info I'm posting.

Before getting into that discussion I want to briefly give an update on the game. These blog posts are lagging actual development by a couple months. Basically it takes me a while to finish a blog entry, so I try to queue them up and revise them when I have time. So, in reality the game is coming along pretty well. It is still slow going because I can only put in a few hours here and there, but it is progressing nicely.

Now, on to fighting with Blogger...

So, as mentioned above, I wanted to be able to post source code that looks good and has syntax highlighting. Initially I tried a simple solution that I knew worked to some degree from previous experience: simply copy the code from Visual Studio (VS) and paste it into Blogger. Oops, no, that doesn't quite work. If you do that no formatting is included. But, if you copy and paste into Word then it is included. Hmm, ok, how about copying and pasting into Word, then re-copying from Word and pasting into Blogger? Yep, that works, and gives output like below:
/// <summary>
/// Binds the texture.
/// </summary>
/// <param name="textureData">The texture data.</param>
/// <param name="width">The width.</param>
/// <param name="height">The height.</param>
/// <returns>The texture's bound id.</returns>
private static uint BindTexture(byte[] textureData, int width, int height)
{
     uint textureId = 0;
     GL.GenTextures(1, ref textureId);
     GL.BindTexture(All.Texture2D, textureId);
     All format = (System.Environment.OSVersion.Platform == PlatformID.Win32NT) ? All.Bgra : All.Rgba; // useless comment placed here just to make the line very long
     GL.TexImage2D(All.Texture2D, 0, (int)format, width, height, 0, format, All.UnsignedByte, textureData);
     textureData = null;
     GL.TexParameter(All.Texture2D, All.TextureMinFilter, (float)All.Linear);
     GL.TexParameter(All.Texture2D, All.TextureMagFilter, (float)All.Linear);
     GL.Enable(All.Texture2D);
     return textureId;
}

Actually, the above is wrapped in a pre, which I have styled to give it the grey background and such. But there's still a problem: the long lines wrap, which makes the code harder to read. So, what I really want is for the code area to scroll horizontally when the lines are too long. Fortunately, a bit of CSS can do that:

.csharpcode
{
  font-family: Consolas, "Courier New", Courier, Monospace;
  margin: 0em;
  padding: 0.5em;
  border: solid 0.1em #000000;
  background: #222222;
  white-space: pre;
  white-space: nowrap;
  overflow: auto;
  overflow-y: hidden;
}

So, if I change the simple pre to pre class="csharpcode" then I get the following instead. Notice the horizontal scrollbar, and that the long lines no longer wrap.
/// <summary>
/// Binds the texture.
/// </summary>
/// <param name="textureData">The texture data.</param>
/// <param name="width">The width.</param>
/// <param name="height">The height.</param>
/// <returns>The texture's bound id.</returns>
private static uint BindTexture(byte[] textureData, int width, int height)
{
     uint textureId = 0;
     GL.GenTextures(1, ref textureId);
     GL.BindTexture(All.Texture2D, textureId);
     All format = (System.Environment.OSVersion.Platform == PlatformID.Win32NT) ? All.Bgra : All.Rgba; // useless comment placed here just to make the line very long
     GL.TexImage2D(All.Texture2D, 0, (int)format, width, height, 0, format, All.UnsignedByte, textureData);
     textureData = null;
     GL.TexParameter(All.Texture2D, All.TextureMinFilter, (float)All.Linear);
     GL.TexParameter(All.Texture2D, All.TextureMagFilter, (float)All.Linear);
     GL.Enable(All.Texture2D);
     return textureId;
}

The only remaining problem is that the HTML that comes from VS via Word is very verbose. Just the very first line, a comment, contains all of this HTML:
<div class="MsoNormal" style="line-height: normal; margin: 0in 0in 0pt; mso-layout-grid-align: none;"><span style="color: grey; font-family: Consolas; font-size: 12pt;">///</span><span style="color: green; font-family: Consolas; font-size: 12pt;"> </span><span style="color: grey; font-family: Consolas; font-size: 12pt;">&lt;summary&gt;</span><span style="font-family: Consolas; font-size: 12pt;"></span></div>

So, the entire code block ends up being quite a lot of HTML.

To address this issue I actually jumped through a number of other hoops to create less verbose HTML. In fact, the earlier pages in this blog use that simpler HTML. But, going forward, I've decided to use the above method. It is easier and creates more nicely formatted output. It is a bit more verbose, but then if you look at all of the other CSS that Blogger includes the pages are pretty heavy to start with.

I actually might go back and reformat the previous pages at some point, so the above statement might become inaccurate.

Anyway, I guess my fight with Blogger is a lot less interesting than I thought. As I was figuring out how to do all of the above it was annoyingly painful. But now that I have a solution it is pretty simple.

Next time: More MonoTouch Gotchas.

2010-10-24

Optimizing .NET on the iPhone


Early Splash Screen

After working exclusively on my PC for a while adding functionality I decided to deploy to my iPhone to make sure everything still works well. When I did that I was happy to see that it did run correctly, but not so happy to see pretty poor performance.

In the game's splash screen I was only getting about 10 FPS, even though I'm shooting for 30. The splash screen is ironically more computationally intensive than a real game screen because of the grid resolution. For a real game screen the board is around 10x15 squares, depending on the level. For the splash screen, currently, it is 20x30 squares. That's a difference from 176 Poles on the game screen to 651 on the splash screen. Clearly I needed to either optimize a lot or reduce the complexity of the splash screen. On the other hand, if I could get the splash screen performing well, then that would be a good baseline: if it worked well on a particular device then the real game screens should be just fine.
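Both pole counts match what you get with one pole at every grid intersection, so a board of C x R squares has (C + 1) x (R + 1) poles. A tiny sketch of that arithmetic, where the one-pole-per-intersection layout is an assumption inferred from the numbers and GridMath and PoleCount are hypothetical names:

```csharp
public static class GridMath
{
    // Number of poles on a board of the given size,
    // assuming one pole at each grid intersection.
    public static int PoleCount(int columns, int rows)
    {
        return (columns + 1) * (rows + 1);
    }
}
```

Under that assumption PoleCount(10, 15) gives 176 and PoleCount(20, 30) gives 651, matching the figures above.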

Because of the platform-independent abstraction efforts I described in an earlier post, I have access to the sophisticated profiling tools in Visual Studio (VS) and can quickly find the areas of code that are taking the most time. I realize that Xcode on the Mac includes the Instruments tool, but so far I haven't been able to get it to work well with my MonoTouch-generated code: no symbol names are displayed, only function addresses, so it is nearly useless for this type of investigation. I've reached out to the MonoTouch community and hopefully there is a way to get symbols displayed, but for now I'll rely on VS.

Too Many Objects

One area I suspected might become a problem, even when I was first creating it, was my Triangle object. It contained 3 Vector2 objects, one for each vertex of the triangle. It also had a number of manipulation methods (rotate, scale, etc.) which calculate new vertices and replace the old ones with the new ones.

The problem is that the game does these triangle manipulations for almost every single graphic element for every frame. In the case of the splash screen that's 651 x 3 x 30 = 58590 new Vector2 objects created every second, and that's just for the poles on the screen; there are many other elements as well. Although the .NET memory management system is quite efficient, this is still a lot of overhead that is completely unnecessary if I change how my Triangle works. Which is exactly what I did...

Instead of using Vector2 objects for my triangle's points, I changed them to 6 float values. The 6 values are the X and Y values for each of the 3 vertices. I used float because that is what I later pass to OpenGL and I want to avoid the overhead of casting.

Unfortunately, the result of this rewrite was that there was no measurable performance gain in either CPU usage or memory allocation. How can that be? Well, it turns out that Vector2 is a struct, not a class. This means that it is not allocated and garbage collected in the same way, and creating new instances of it is quite lightweight. I should have checked whether Vector2 is a class before embarking on this optimization, but it didn't take that much time and I learned a bit.
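A quick check like the following would have told me up front. This is only a sketch: PointStruct and PointClass are hypothetical types for illustration, and with OpenTK referenced the same test would be typeof(Vector2).IsValueType.

```csharp
using System;

// Hypothetical types for illustration only.
public struct PointStruct { public float X; public float Y; }
public class PointClass { public float X; public float Y; }

public static class ValueTypeCheck
{
    // True for structs, false for classes, enums and primitives.
    public static bool IsStruct(Type type)
    {
        return type.IsValueType && !type.IsEnum && !type.IsPrimitive;
    }

    // Assigning a struct copies its fields, so changing the copy
    // leaves the original untouched. A class reference would share
    // one heap object instead.
    public static bool CopyIsIndependent()
    {
        PointStruct a = new PointStruct { X = 1f };
        PointStruct b = a;
        b.X = 2f;
        return a.X == 1f;
    }
}
```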

Array Bounds Checking

OpenGL requires an array of float values in GL.VertexPointer and GL.TexCoordPointer. To populate this array I go through each of my graphic elements and add its vertices to the end of the array. In .NET there is automatic bounds checking performed on arrays to ensure you do not access memory outside of the array. This bounds checking is great for safety, but there is a performance penalty associated with it.

To see if it was significant for me, I just wrapped the area of code that populates the array with an unsafe { ... } block. After doing this, that function went from about 26% of total CPU usage down to about 23%. That's only a 3 percentage point improvement, but it's a very simple one. Since I carefully calculate my array index in this routine I'm confident the lack of bounds checking will be ok in this case.

Too Many Function Calls

Initially when I populated the float arrays for OpenGL I was calling a property for each vertex in my triangle one at a time. I changed this so that I pass an array and a starting index into the triangle and it populates the array itself. This reduced the number of calls by a factor of 6.

Unfortunately I didn't profile this change carefully so I can't say how much of an effect it had, but I imagine that going from more than 60,000 property calls to closer to 10,000 method calls would have been measurable.
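The shape of that change looks roughly like this. It is a minimal sketch, not my actual class: this Triangle, its constructor and CopyVerticesTo are hypothetical names, and the real one also carries the transformation state.

```csharp
using System;

// Hypothetical triangle holding six floats: x and y for each of three vertices.
public sealed class Triangle
{
    private readonly float[] vertices = new float[6];

    public Triangle(float x0, float y0, float x1, float y1, float x2, float y2)
    {
        vertices[0] = x0; vertices[1] = y0;
        vertices[2] = x1; vertices[3] = y1;
        vertices[4] = x2; vertices[5] = y2;
    }

    // Writes all six values into buffer starting at startIndex with one call,
    // and returns the index just past the last value written.
    public int CopyVerticesTo(float[] buffer, int startIndex)
    {
        Array.Copy(vertices, 0, buffer, startIndex, vertices.Length);
        return startIndex + vertices.Length;
    }
}
```

The caller keeps a running index and passes the returned value to the next triangle, so filling the buffer costs one method call per triangle instead of six property reads.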

Too Many Calculations

My triangle object remembers its original state (position, rotation, scale, etc.) so that I can manipulate it based on that state. A side effect of this is that when I want to get the actual vertices I need to calculate them based on its original state and the current transformations. Initially I was doing this every time I needed the vertices, even if its state or transformations hadn't changed.

To fix this I added a boolean flag, isDirty, that I use to keep track of its state. If the state changes then I set the flag to true. The next time I need the vertices I recalculate them only if the flag is true, and then set it back to false.

This optimization reduced these calculations from about 29% of CPU time to 11%. Now, finally, a nice gain.
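Here is a minimal sketch of the dirty flag pattern. Only isDirty comes from my code; CachedTriangle, Scale, GetVertices and RecalculationCount are hypothetical, and a single scale factor stands in for the full set of transformations.

```csharp
using System;

public sealed class CachedTriangle
{
    private readonly float[] original;              // untransformed x, y pairs
    private readonly float[] current = new float[6]; // cached transformed values
    private float scale = 1f;
    private bool isDirty = true;                     // nothing calculated yet

    // Counts recalculations so the caching can be observed.
    public int RecalculationCount { get; private set; }

    public CachedTriangle(float[] originalVertices)
    {
        if (originalVertices == null || originalVertices.Length != 6)
            throw new ArgumentException("Expected six floats.", "originalVertices");
        original = (float[])originalVertices.Clone();
    }

    public float Scale
    {
        get { return scale; }
        set
        {
            if (value != scale)
            {
                scale = value;
                isDirty = true; // state changed, cached vertices are stale
            }
        }
    }

    // Recalculates only when something changed since the last call.
    public float[] GetVertices()
    {
        if (isDirty)
        {
            for (int i = 0; i < original.Length; i++)
            {
                current[i] = original[i] * scale;
            }
            RecalculationCount++;
            isDirty = false;
        }
        return current;
    }
}
```

Calling GetVertices every frame on a triangle that hasn't changed now costs a single flag check.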

Too Many Draws

After all of the above investigations and optimizations, it turned out the biggest improvement I made was from a stupid mistake.

I refactored the class that acts as the main view manager in order to better support different views: a game view and a splash screen view in this case. In doing that refactor I reduced the two-phase update-then-draw functionality into a one-phase update-and-draw. In my case there's no real benefit to either approach, and the single phase seemed a bit simpler. When I did this, however, the main render loop, which is one of the platform-dependent classes, still called both phases. As part of the refactor the update method was renamed to the draw one. The consequence of this was that in the iPhone version of the render loop, the draw method was called twice: once where the update call used to be and then again for the draw. Since I did the refactor on my PC I tested it there and must have fixed this problem for the PC version of the render loop. When I went to test the iPhone version, however, it was still doing this double work.

Anyway, when I found the problem and removed the redundancy there was a significant performance improvement. That, along with my other optimizations, means the game is back to running at my targeted 30 FPS, even in the more complex splash screen. When it's doing that it's eating about 70% CPU, and I'd really like to bring that down for battery life and to better support older devices. I do have one more significant idea based on data from the VS profiler, but I haven't gotten to it. If it works well I'm hoping for about a 10% gain in efficiency. If and when I do another pass at optimizations I'll write about them here.

Next time: Fighting with Blogger.