2010-10-24

Optimizing .NET on the iPhone


Early Splash Screen

After working exclusively on my PC for a while, adding functionality, I decided to deploy to my iPhone to make sure everything still worked well. I was happy to see that it did run correctly, but not so happy to see pretty poor performance.

In the game's splash screen I was only getting about 10 FPS, even though I'm shooting for 30. The splash screen is, ironically, more computationally intensive than a real game screen because of the grid resolution. For a real game screen the board is around 10x15 squares, depending on the level. For the splash screen it is currently 20x30 squares. That's a difference from 176 Poles on the game screen to 651 on the splash screen. Clearly I needed to either optimize a lot or reduce the complexity of the splash screen. On the other hand, if I could get the splash screen performing well, it would make a good baseline: if it worked well on a particular device, the real game screens should be just fine.

Because of the platform-independent abstraction efforts I described in an earlier post, I have access to the sophisticated profiling tools in Visual Studio (VS) and can quickly find the areas of code that are taking the most time. I realize that Xcode on the Mac includes the Instruments tool, but so far I haven't been able to get it to work well with my MonoTouch-generated code: no symbol names are displayed, only function addresses, so it is nearly useless for this type of investigation. I've reached out to the MonoTouch community and hopefully there is a way to get symbols displayed, but for now I'll rely on VS.

Too Many Objects

One area I suspected might become a problem, even when I was first creating it, was my Triangle object. It contained 3 Vector2 objects, one for each vertex of the triangle. It also has a number of manipulation methods (rotate, scale, etc.) which calculate new vertices and replace the old ones with the new ones.

The problem is that the game does these triangle manipulations for almost every single graphic element for every frame. In the case of the splash screen that's 651 x 3 x 30 = 58,590 new Vector2 objects created every second, and that's just for the poles on the screen; there are many other elements as well. Although the .NET memory management system is quite efficient, this is still a lot of overhead that is completely unnecessary if I change how my Triangle works. Which is exactly what I did...

Instead of using Vector2 objects for my triangle's points, I changed them to 6 float values. The 6 values are the X and Y values for each of the 3 vertices. I used float because that is what I later pass to OpenGL and I want to avoid the overhead of casting.

Unfortunately, the result of this rewrite was that there was no measurable performance gain in either CPU usage or memory allocation. How can that be? Well, it turns out that Vector2 is a struct, not a class. This means that it is not allocated and garbage collected in the same way, and creating new instances of it is quite lightweight. Although I should have checked whether Vector2 is a class before embarking on this optimization, it didn't take that much time and I learned a bit.
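A quick check like the one below would have answered the struct-or-class question up front. This is only a sketch: VectorStruct, VectorClass and ValueTypeDemo are hypothetical stand-ins made up for illustration, not OpenTK types. The same typeof(...).IsValueType test can be pointed at the real Vector2.

```csharp
using System;

// Hypothetical stand-in for a struct-based vector (a value type).
public struct VectorStruct
{
    public float X;
    public float Y;

    public VectorStruct(float x, float y)
    {
        X = x;
        Y = y;
    }
}

// Hypothetical stand-in for a class-based vector (a reference type).
public sealed class VectorClass
{
    public float X;
    public float Y;

    public VectorClass(float x, float y)
    {
        X = x;
        Y = y;
    }
}

public static class ValueTypeDemo
{
    // Returns true when T is a value type (a struct), in which case
    // "new T(...)" does not create a separate garbage-collected object.
    public static bool IsValueType<T>()
    {
        return typeof(T).IsValueType;
    }
}
```

Calling ValueTypeDemo.IsValueType<VectorStruct>() returns true, while ValueTypeDemo.IsValueType<VectorClass>() returns false.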

Array Bounds Checking

OpenGL requires an array of float values in GL.VertexPointer and GL.TexCoordPointer. To populate this array I go through each of my graphic elements and add its vertices to the end of the array. In .NET there is automatic bounds checking performed on arrays to ensure you do not access memory outside of the array. This bounds checking is great for safety, but there is a performance penalty associated with it.

To see if it was significant for me, I just wrapped the area of code that populates the array with an unsafe { ... } block. After doing this, that function went from about 26% of total CPU usage down to about 23%. It's only a 3 percentage point improvement, but it's a very simple one. Since I carefully calculate my array index in this routine, I'm confident the lack of bounds checking will be OK in this case.

Too Many Function Calls

Initially, when I populated the float arrays for OpenGL, I was calling a property for each vertex value in my triangle, one at a time. I changed this so that I pass an array and a starting index into the triangle and it populates the array itself. This reduced the number of calls by a factor of 6.

Unfortunately I didn't profile this change carefully, so I can't say how much of an effect it had, but I imagine it would have been measurable, reducing more than 60,000 property calls to closer to 10,000 method calls.
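The change described above can be sketched roughly as follows. SimpleTriangle, CopyVerticesTo and VertexBufferBuilder are hypothetical names invented for this illustration and are not the game's actual classes. The point is simply that one method call writes all six floats, instead of six separate property reads per triangle.

```csharp
using System;

// Hypothetical, simplified triangle that stores X and Y for its three vertices.
public sealed class SimpleTriangle
{
    public const int FloatsPerTriangle = 6;

    private readonly float _x1, _y1, _x2, _y2, _x3, _y3;

    public SimpleTriangle(float x1, float y1, float x2, float y2, float x3, float y3)
    {
        _x1 = x1; _y1 = y1;
        _x2 = x2; _y2 = y2;
        _x3 = x3; _y3 = y3;
    }

    // Writes all six coordinates into 'target' starting at 'startIndex'
    // and returns the index just past the last value written.
    public int CopyVerticesTo(float[] target, int startIndex)
    {
        if (target == null) throw new ArgumentNullException("target");
        if (startIndex < 0 || startIndex + FloatsPerTriangle > target.Length)
            throw new ArgumentOutOfRangeException("startIndex");

        target[startIndex] = _x1;
        target[startIndex + 1] = _y1;
        target[startIndex + 2] = _x2;
        target[startIndex + 3] = _y2;
        target[startIndex + 4] = _x3;
        target[startIndex + 5] = _y3;
        return startIndex + FloatsPerTriangle;
    }
}

public static class VertexBufferBuilder
{
    // Builds one flat float array containing every triangle's vertices in order.
    public static float[] Build(SimpleTriangle[] triangles)
    {
        float[] buffer = new float[triangles.Length * SimpleTriangle.FloatsPerTriangle];
        int index = 0;
        foreach (SimpleTriangle triangle in triangles)
        {
            index = triangle.CopyVerticesTo(buffer, index);
        }
        return buffer;
    }
}
```

Returning the next free index from CopyVerticesTo keeps the running position in one place, so the caller never has to know how many floats a triangle occupies.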

Too Many Calculations

My triangle object remembers its original state (position, rotation, scale, etc.) so that I can manipulate it based on that state. A side effect of this is that when I want to get the actual vertices I need to calculate them from the original state and the current transformations. Initially I was doing this every time I needed the vertices, even if the state or transformations hadn't changed.

To fix this I added a boolean flag, isDirty, that I use to keep track of its state. If the state changes I set the flag to true. The next time I need the vertices, I calculate them only if the flag is true, and then set it back to false.

This optimization reduced these calculations from about 29% of CPU time to 11%. Now, finally, a nice gain.
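In code, the dirty-flag idea looks something like the sketch below. CachedTriangle is a hypothetical, stripped-down class: it assumes scale about the origin is the only transformation, and the RecalculationCount property exists purely to make the caching visible. The real Triangle has more state than this.

```csharp
using System;

// Hypothetical, simplified triangle that caches its transformed vertices.
public sealed class CachedTriangle
{
    private readonly float[] _original;           // six floats: x1, y1, x2, y2, x3, y3
    private readonly float[] _cached = new float[6];
    private float _scale = 1f;
    private bool _isDirty = true;                 // nothing has been calculated yet

    // Counts how often the vertices were actually recalculated (for illustration only).
    public int RecalculationCount { get; private set; }

    public CachedTriangle(float[] originalVertices)
    {
        if (originalVertices == null || originalVertices.Length != 6)
            throw new ArgumentException("Expected exactly six floats.", "originalVertices");
        _original = (float[])originalVertices.Clone();
    }

    public float Scale
    {
        get { return _scale; }
        set
        {
            if (value != _scale)
            {
                _scale = value;
                _isDirty = true;                  // cached vertices are now stale
            }
        }
    }

    // Recalculates only when the state changed since the last call.
    public float[] GetVertices()
    {
        if (_isDirty)
        {
            for (int i = 0; i < _cached.Length; i++)
            {
                _cached[i] = _original[i] * _scale;
            }
            RecalculationCount++;
            _isDirty = false;
        }
        return _cached;
    }
}
```

Calling GetVertices() several times in a row does the math once; only a change to Scale triggers another calculation on the next call.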

Too Many Draws

After all of the above investigations and optimizations, it turned out the biggest improvement I made was from a stupid mistake.

I refactored the class that acts as the main view manager in order to better support different views: a game view and a splash screen view in this case. In doing that refactor I collapsed the two-phase update-then-draw functionality into a one-phase update-and-draw. In my case there's no real benefit to either approach, and the single phase seemed a bit simpler. When I did this, however, the main render loop, which is one of the platform-dependent classes, still called both phases. As part of the refactor the update method was renamed to the draw one. The consequence was that in the iPhone version of the render loop the draw method was called twice: once where the update call used to be, and then again for the draw. Since I did the refactor on my PC, I tested it there and must have fixed this problem in the PC version of the render loop. When I went to test the iPhone version, however, it was still doing this double work.

Anyway, when I found the problem and removed the redundancy there was a significant performance improvement. That, along with my other optimizations, means the game is back to running at my targeted 30 FPS, even in the more complex splash screen. When it's doing that it's eating about 70% CPU, and I'd really like to bring that down for battery life and to better support older devices. I do have one more significant idea based on data from the VS profiler, but I haven't gotten to it. If it works well I'm hoping for about a 10% gain in efficiency. If and when I do another pass at optimizations I'll write about them here.

Next time: Fighting with Blogger.

2010-10-21

Problems Deploying to iPhone

Sorry for the brief hiatus. I was on a business trip to Beijing for a week, but I'm back now and hope to get back into the swing of developing.

In my last post I discussed how I got OpenAL working on Windows. After that I wanted to confirm that all my platform-independent abstractions worked correctly on the iPhone by going back to my Mac and building and deploying there. Unfortunately, when I did that, I ran into a few issues. This post discusses that experience.

The first error I received was mtouch failed with no output (1). After a bit of searching on the web I found a few hints about not allowing spaces in the output assembly name. I changed some settings to correct this, but the problem persisted.

I looked at the command being run by MonoTouch and noticed that other parts of the command were not escaping or quoting some paths. Initially I was accessing my project over a Samba share directly to my Windows Documents directory. This meant that I was opening the project on my Mac from Documents\visual studio 2010\Projects\Zoing. Notice the spaces in "visual studio 2010".

To fix this I changed how I mounted the Samba share to directly mount the Zoing directory, thus skipping the areas with spaces. After doing this I could deploy to my iPhone. Joy!

Well, no, actually not. As soon as I ran the program I got this:
Tue Sep 28 23:46:25 unknown kernel[0] <Debug>: launchd[9744] Builtin profile: container (sandbox)
Tue Sep 28 23:46:25 unknown kernel[0] <Debug>: launchd[9744] Container: /private/var/mobile/Applications/478863D8-A953-4D5A-84FD-AF8C096DC363 [69] (sandbox)
Tue Sep 28 23:46:27 unknown UIKitApplication:launcher[0x114c][9744] <Notice>: Unhandled Exception: System.TypeInitializationException: An exception was thrown by the type initializer for System.Collections.Generic.EqualityComparer`1 ---> System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.ExecutionEngineException: Attempting to JIT compile method 'System.Collections.Generic.GenericEqualityComparer`1<OpenTK.Vector2>:.ctor ()' while running with --aot-only.
Tue Sep 28 23:46:27 unknown UIKitApplication:launcher[0x114c][9744] <Notice>: at System.Reflection.MonoCMethod.Invoke (System.Object obj, BindingFlags invokeAttr, System.Reflection.Binder binder, System.Object[] parameters, System.Globalization.CultureInfo culture) [0x00000] in <filename unknown>:0
Tue Sep 28 23:46:27 unknown UIKitApplication:launcher[0x114c][9744] <Notice>: --- End of inner exception stack trace ---

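The important part is the ExecutionEngineException at the end of the first exception line: "Attempting to JIT compile method 'System.Collections.Generic.GenericEqualityComparer`1<OpenTK.Vector2>:.ctor ()' while running with --aot-only". Basically, MonoTouch on the actual iPhone (it works fine in the simulator) does not support the default EqualityComparer for my Vector2 keys properly. My understanding is that code on the device is compiled ahead of time (AOT) and cannot be JIT compiled at run time, so a generic instantiation that wasn't generated up front fails. Fortunately, I can pass an IEqualityComparer<T> to the Dictionary constructor, like so: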
_diagonals = new Dictionary<Vector2, Diagonal>(VectorComparer.Singleton);

And, my VectorComparer implementation is:
private class VectorComparer : IEqualityComparer<Vector2>
{
    private static VectorComparer _singleton = new VectorComparer();
    public static VectorComparer Singleton
    {
        get
        {
            return _singleton;
        }
    }
    public bool Equals(Vector2 x, Vector2 y)
    {
        return x.Equals(y);
    }
    public int GetHashCode(Vector2 obj)
    {
        return obj.GetHashCode();
    }
}

Currently, for my purposes, this implementation is fine because I know I'm only putting in Vector2 objects whose values can be represented exactly as floats. If I ever need to compare Vector2 objects whose values may be slightly off because of rounding in the math, I could adjust my comparisons to treat values as equal when they match to within some chosen precision.
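One way such a tolerant comparer might look is sketched below. This is not what the game uses. Vec2 is a hypothetical stand-in for OpenTK's Vector2 so the example compiles on its own, and QuantizedVectorComparer and its cellSize parameter are names made up here. Instead of checking "difference < epsilon" directly, it snaps each coordinate to a grid and compares the snapped values, which keeps Equals and GetHashCode consistent with each other, something a Dictionary depends on.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for OpenTK.Vector2.
public struct Vec2
{
    public readonly float X;
    public readonly float Y;

    public Vec2(float x, float y)
    {
        X = x;
        Y = y;
    }
}

// Treats two vectors as equal when both coordinates round to the same grid cell.
public sealed class QuantizedVectorComparer : IEqualityComparer<Vec2>
{
    private readonly double _cellSize;

    public QuantizedVectorComparer(double cellSize)
    {
        if (cellSize <= 0) throw new ArgumentOutOfRangeException("cellSize");
        _cellSize = cellSize;
    }

    // Maps a coordinate to the index of the nearest grid line.
    private long Quantize(float value)
    {
        return (long)Math.Round(value / _cellSize);
    }

    public bool Equals(Vec2 a, Vec2 b)
    {
        return Quantize(a.X) == Quantize(b.X) && Quantize(a.Y) == Quantize(b.Y);
    }

    public int GetHashCode(Vec2 v)
    {
        unchecked
        {
            // Hash the quantized values so equal vectors always share a hash code.
            return (Quantize(v.X).GetHashCode() * 397) ^ Quantize(v.Y).GetHashCode();
        }
    }
}
```

It would be passed to the Dictionary constructor in the same way as VectorComparer.Singleton, for example new Dictionary<Vec2, string>(new QuantizedVectorComparer(0.001)). One caveat of this design: two values that are very close but fall on opposite sides of a cell boundary are still treated as different.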

Next time: Optimizing .NET on the iPhone.

2010-10-07

OpenAL on Windows

In my previous post I discussed the graphics-related work I had to do to get my iPhone-targeted game running on my PC. This was relatively easy to do and has improved my productivity significantly. Of course, my game contains sound too, and it is important to be able to develop those aspects of it on my PC as well. This post discusses that.

From the beginning I decided to use OpenAL for sound in my game. The main reason is that it provides relatively simple access to 3D positioned audio, which is something I wanted (actually, I'm really only working in 2D space). Getting this working on the iPhone initially in native Objective-C was pretty straightforward, but, as with OpenGL, OpenAL is not natively available on Windows. The OpenTK library that MonoTouch relies on has built-in support for OpenAL, but it still needs an underlying library.

I searched around a bit and quickly found Creative Labs' OpenAL library for Windows. This installed easily, but I still needed to implement the layer that reads WAV files and converts them into a format that OpenAL understands.

In the iPhone version I do this to initialize OpenAL:
/// <summary>
/// As per the SDK:
/// <br/>
/// Your application must call this function before making any other Audio Session Services calls.
/// You may activate and deactivate your audio session as needed (see AudioSessionSetActive),
/// but should initialize it only once.
/// </summary>
public SoundManager()
{
    // setup our audio session
    AudioSession.Initialize();
    AudioSession.Category = AudioSessionCategory.AmbientSound;
    AudioSession.SetActive(true);
    AudioSession.Interrupted += HandleAudioSessionInterrupted;
    AudioSession.Resumed += HandleAudioSessionResumed;
    // TODO: Check if BGM is playing
    // UInt32 size = sizeof(iPodIsPlaying);
    // result = AudioSessionGetProperty(kAudioSessionProperty_OtherAudioIsPlaying, &size, &iPodIsPlaying);
    // if the iPod is playing, use the ambient category to mix with it
    // otherwise, use solo ambient to get the hardware for playing the app background track
    // UInt32 category = (iPodIsPlaying) ? kAudioSessionCategory_AmbientSound : kAudioSessionCategory_SoloAmbientSound;
    OpenALManager = new OpenALManager();
}

And then this to load CAF sound samples:
private static AudioFile GetAudioFile(string filename)
{
    if(!File.Exists(filename)) {
        throw new FileNotFoundException("Could not find sound file.", filename);
    }
    AudioFileType fileType;
    switch(Path.GetExtension(filename).ToUpper()) {
        case ".CAF":
            fileType = AudioFileType.CAF;
            break;
        default:
            throw new NotSupportedException("Audio files of type " + Path.GetExtension(filename) + " are not supported.");
    }
    using(CFUrl cfUrl = CFUrl.FromFile(filename)) {
        return AudioFile.Open(cfUrl, AudioFilePermission.Read, fileType);
    }
}
private static AudioData GetAudioData(AudioFile audioFile)
{
    // Set the client format to 16 bit signed integer (native-endian) data
    // Maintain the channel count and sample rate of the original source format
    AudioStreamBasicDescription outputFormat = new AudioStreamBasicDescription();
    outputFormat.SampleRate = audioFile.StreamBasicDescription.SampleRate;
    outputFormat.ChannelsPerFrame = audioFile.StreamBasicDescription.ChannelsPerFrame;
    outputFormat.Format = AudioFormatType.LinearPCM;
    outputFormat.BytesPerPacket = 2 * audioFile.StreamBasicDescription.ChannelsPerFrame;
    outputFormat.FramesPerPacket = 1;
    outputFormat.BytesPerFrame = 2 * audioFile.StreamBasicDescription.ChannelsPerFrame;
    outputFormat.BitsPerChannel = 16;
    outputFormat.FormatFlags = AudioFormatFlags.IsPacked | AudioFormatFlags.IsSignedInteger;
    // Set the desired client (output) data format
    bool hadError = audioFile.SetProperty(AudioFileProperty.DataFormat, Marshal.SizeOf(outputFormat), GCHandle.ToIntPtr(GCHandle.Alloc(outputFormat)));
    if(hadError) {
        throw new InvalidOperationException("Could not set output format.");
    }
    byte[] data = new byte[audioFile.Length];
    audioFile.Read(0, data, 0, (int)audioFile.Length, false);
    AudioData audioData = new AudioData(data, outputFormat.ChannelsPerFrame == 1 ? ALFormat.Mono16 : ALFormat.Stereo16, outputFormat.SampleRate);
    return audioData;
}

For the PC I had to do something similar, except for WAV files, not CAF:

using System;
using System.IO;
using OpenTK.Audio;
using OpenTK.Audio.OpenAL;
using Qythyx.OpenTKTools.Sound;
namespace Qythyx.Launcher_PC
{
    internal sealed class WaveReader
    {
        private static ALFormat GetALFormat(WaveData data)
        {
            switch(data.Channels)
            {
                case 1:
                    if(data.BitsPerSample == 8)
                    {
                        return ALFormat.Mono8;
                    }
                    else if(data.BitsPerSample == 16)
                    {
                        return ALFormat.Mono16;
                    }
                    break;
                case 2:
                    if(data.BitsPerSample == 8)
                    {
                        return ALFormat.Stereo8;
                    }
                    else if(data.BitsPerSample == 16)
                    {
                        return ALFormat.Stereo16;
                    }
                    break;
            }
            throw new AudioException("Unsupported audio format. Channels = " + data.Channels + ", Bits per Sample = " + data.BitsPerSample);
        }
        /// <summary>Reads and decodes the sound file.</summary>
        /// <param name="filename">The WAVE filename.</param>
        /// <returns>An <see cref="AudioData"/> object that contains the decoded data.</returns>
        public static AudioData ReadWave(string filename)
        {
            WaveData data = ReadWaveData(filename);
            return new AudioData(data.Data, GetALFormat(data), data.SampleRate);
        }
        private struct WaveData
        {
            public int RiffChunckSize;
            public int FormatChunkSize;
            public short AudioFormat;
            public short Channels;
            public int SampleRate;
            public int ByteRate;
            public short BlockAlign;
            public short BitsPerSample;
            public int DataChunkSize;
            public byte[] Data;
        }
        // Read the WAVE/RIFF headers.
        private static WaveData ReadWaveData(string filename)
        {
            using(Stream stream = new FileStream(filename, FileMode.Open, FileAccess.Read))
            {
                using(BinaryReader reader = new BinaryReader(stream))
                {
                    WaveData data = new WaveData();
                    // RIFF header
                    if(new string(reader.ReadChars(4)) != "RIFF")
                    {
                        throw new FormatException("File is not recognized as valid WAVE format. Can't find RIFF signature.");
                    }
                    data.RiffChunckSize = reader.ReadInt32();
                    if(new string(reader.ReadChars(4)) != "WAVE")
                    {
                        throw new FormatException("File is not recognized as valid WAVE format. Can't find expected WAVE format.");
                    }
                    // WAVE header ("fmt " chunk)
                    if(new string(reader.ReadChars(4)) != "fmt ")
                    {
                        throw new FormatException("File is not recognized as valid WAVE format. Can't find 'fmt' marker.");
                    }
                    data.FormatChunkSize = reader.ReadInt32();
                    data.AudioFormat = reader.ReadInt16();
                    data.Channels = reader.ReadInt16();
                    data.SampleRate = reader.ReadInt32();
                    data.ByteRate = reader.ReadInt32();
                    data.BlockAlign = reader.ReadInt16();
                    data.BitsPerSample = reader.ReadInt16();
                    // Skip any zero padding between the format chunk and the data chunk.
                    while(reader.PeekChar() == 0)
                    {
                        reader.Read();
                    }
                    if(new string(reader.ReadChars(4)) != "data")
                    {
                        throw new FormatException("File is not recognized as valid WAVE format. Can't find data marker.");
                    }
                    data.DataChunkSize = reader.ReadInt32();
                    // Read only the sample data, so any chunks that follow the data chunk are not treated as audio.
                    data.Data = reader.ReadBytes(data.DataChunkSize);
                    return data;
            }
        }
    }
}

Well, after this OpenAL was happy on my PC and I've now got sound and graphics and I can develop happily in Visual Studio. Woohoo!

Next time: Problems deploying to iPhone.

2010-10-04

OpenGL ES 1.1 on Windows


Wrong Colors

Correct Colors
In my previous post I discussed how I separated my .NET solution into multiple projects with the goal of having the bulk of my code 100% platform independent. The initial goal of this was to allow me to easily develop within Visual Studio (VS), but a second and important benefit is that it sets me up well to port my game to other .NET platforms like Android and Windows Phone 7. I also mentioned that the graphics framework I'm targeting is OpenGL ES version 1.1. Although the iPhone 4 supports OpenGL ES 2.0, older iPhones do not, and I want to support them as well.

To get the full benefits of developing in VS I not only want to be able to compile, but also to run my game. To do that I need to be able to render using OpenGL ES 1.1. The problem is that this is a version of OpenGL for mobile devices. Fortunately, there are a few emulators available for the PC.

After some web searching I found an emulator from Mali, but when I tried to use it I kept getting this error: [Error] Failed to create EGL window surface, error 12293. I did a lot of searching and investigation to fix it, but was never able to. I did see one person describe that they were successful, but their suggestions weren't working for me, so I eventually gave up and tried to find another emulator.

Next I found this one from Khronos. It still had a few minor issues, but when I followed the steps described in this thread I got it to work. The thread said that they also managed to get the Mali emulator working, and I don't know why it didn't work for me.

One interesting thing I found with this emulator is that it isn't 100% compatible with the OpenGL ES 1.1 that runs on the iPhone. Specifically, the internalFormat argument passed to GL.TexImage2D seems to behave differently. In my original iPhone version I passed All.Rgba for this, but when I ran this on the PC my textures' colors were strange. After a lot of investigation I finally discovered that changing this to All.Bgra fixes the problem. I currently have a hack to set this depending on the OS:

bool isWindows = System.Environment.OSVersion.Platform == PlatformID.Win32NT;
All format = isWindows ? All.Bgra : All.Rgba;
GL.TexImage2D(All.Texture2D, 0, (int)format, width, height, 0, format, All.UnsignedByte, textureData);

I'm not enough of an OpenGL expert to know which is actually correct. It will also be interesting to see how other platforms like Android and Windows Phone 7 handle this. For now I'm just pretty happy that this was the only platform-dependent adjustment I've had to make.

Next time: OpenAL on Windows.