Making .NET web services use HTTP compression

So, there are two parts to making this work: configuring the server correctly and coding the client correctly. Both parts are pretty simple, but there are a few gotchas.

First, on the server side, you need to enable HTTP compression and make sure it applies to the file extensions your web services use. In IIS 6 the GUI for configuring a web site does let you enable HTTP compression (it's under Internet Information Services → Web Sites → Properties (right-click) → Service tab), but there is no way to set which file extensions are compressed. Microsoft has a page that explains how to set this via the command line, but it doesn't seem to work quite right. After some searching I found some good explanations, which eventually solved my problem.

The only change I made to my C:\WINDOWS\system32\inetsrv\MetaBase.xml file was to make the HcScriptFileExtensions entries newline-delimited instead of space-delimited, and to do the same for HcFileExtensions. Take care to fix both of these for both the deflate and the gzip compression schemes.
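If you have a long list of extensions to convert, a tiny helper can produce the newline-delimited value for you to paste into both the gzip and deflate entries. This is a hypothetical sketch, not part of IIS or of the setup described above; `MetabaseExtensionList` and `ToNewlineDelimited` are made-up names, and the file itself was edited by hand.

```csharp
using System;

// Hypothetical helper (not an IIS API): turns a space-delimited extension
// list into the newline-delimited form used for HcScriptFileExtensions and
// HcFileExtensions in MetaBase.xml.
public static class MetabaseExtensionList
{
    public static string ToNewlineDelimited(string spaceDelimited)
    {
        // A null separator array splits on any whitespace; empty entries are dropped
        string[] extensions = spaceDelimited.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join("\n", extensions);
    }
}
```

For example, `ToNewlineDelimited("asp dll exe aspx asmx")` returns the five extensions one per line.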

So I now have the aspx and asmx extensions configured for compression, and when I hit a page from a browser it is downloaded compressed. Woohoo!

But when my client program, which uses my web services (asmx pages), hit the same pages, the responses were not compressed. Boohoo!

The key is that if you're using automatically generated Web References, whose proxy classes extend SoapHttpClientProtocol, you need to set the EnableDecompression property to true. For example:

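using (MyWebService proxy = new MyWebService()) { // MyWebService: the generated proxy class
    // Advertise gzip/deflate support and decompress responses automatically
    proxy.EnableDecompression = true;
    // ... call your web methods on proxy as usual
}
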
One other thing: it's somewhat annoying that there's no easier way to tell whether compression is working. A packet sniffer will do it, but it can be a pain to install one when you need so little of its functionality. The way I checked was to examine the IIS logs and compare the download sizes of known pages before and after changing various settings. HTTP compression seems to reduce web text to about 25% of its original size. The catch is that entries show up in the IIS log file only a minute or so after the request, and on a live system the file can grow so quickly that quick checks in it are difficult.
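A quicker check than the logs is a small client that requests a page while advertising gzip/deflate support and looks at the response's Content-Encoding header. The sketch below is a hypothetical helper built on .NET's HttpClient, not something from the setup above; `CompressionProbe`, `IsCompressed` and `ProbeAsync` are made-up names. It turns automatic decompression off so the byte count is the size actually sent. A plain GET on an .asmx URL typically returns the service's HTML help page, which is enough to see whether IIS compresses that extension.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper for checking HTTP compression without a packet sniffer.
public static class CompressionProbe
{
    // True if the server declared a gzip or deflate Content-Encoding.
    public static bool IsCompressed(HttpResponseMessage response)
    {
        return response.Content.Headers.ContentEncoding.Any(e =>
            e.Equals("gzip", StringComparison.OrdinalIgnoreCase) ||
            e.Equals("deflate", StringComparison.OrdinalIgnoreCase));
    }

    // Requests url with Accept-Encoding: gzip, deflate. Automatic decompression
    // is disabled, so WireBytes is the (possibly compressed) body size as sent.
    public static async Task<(bool Compressed, long WireBytes)> ProbeAsync(string url)
    {
        var handler = new HttpClientHandler { AutomaticDecompression = DecompressionMethods.None };
        using (var client = new HttpClient(handler))
        using (var request = new HttpRequestMessage(HttpMethod.Get, url))
        {
            request.Headers.AcceptEncoding.ParseAdd("gzip");
            request.Headers.AcceptEncoding.ParseAdd("deflate");
            using (var response = await client.SendAsync(request))
            {
                byte[] body = await response.Content.ReadAsByteArrayAsync();
                return (IsCompressed(response), body.LongLength);
            }
        }
    }
}
```

Running `ProbeAsync` against the same URL before and after a configuration change gives the same before/after size comparison as the logs, without the delay.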


Making Windows Server 2008 relay SMTP requests

When you install the SMTP server in Windows Server 2008, the service apparently isn't set to start automatically, and it isn't configured to relay mail. Refusing to relay is a good default, but it also refuses relaying from localhost, which is less good. Fortunately, this is easy to fix, and this article explains it well.
In case that article disappears someday, here is the short version: open Administrative Tools → Internet Information Services (IIS) 6.0 Manager and open the properties of the SMTP virtual server. The relay restrictions on its Access tab let you enable relaying for localhost (or whatever else you need). To have the service start on boot, set its startup type to Automatic in the Services console.
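To confirm relaying from code, you can hand the local SMTP service a message addressed to an outside domain. Below is a minimal sketch using System.Net.Mail.SmtpClient; `RelayCheck`, the subject and body text, and any addresses you pass in are hypothetical placeholders. The recipient needs to be in a domain the server does not treat as local, otherwise the message is a local delivery rather than a relay.

```csharp
using System;
using System.Net.Mail;

// Hypothetical smoke test for relaying through the local SMTP service.
public static class RelayCheck
{
    // Builds a small test message; from/to are addresses you supply.
    public static MailMessage BuildTestMessage(string from, string to)
    {
        return new MailMessage(from, to, "Relay test", "If you can read this, relaying from localhost works.");
    }

    // Sends via localhost:25. A refused relay surfaces as an SmtpException
    // (SmtpFailedRecipientException derives from it) carrying the server's reply.
    public static bool TryRelay(string from, string to, out string error)
    {
        using (MailMessage message = BuildTestMessage(from, to))
        using (var client = new SmtpClient("localhost", 25))
        {
            try
            {
                client.Send(message);
                error = null;
                return true;
            }
            catch (SmtpException ex)
            {
                error = ex.Message;
                return false;
            }
        }
    }
}
```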

Postgres shared_buffers Setting in Windows

Apparently Postgres on Windows doesn’t respond well to a high value for shared_buffers. This thread explains a little, but not definitively. I also found that setting the value to more than 1GB (on a Windows Server 2008 x64 box with 8GB of RAM) failed with this error in the event log:
%t FATAL: could not create shared memory segment: 8
%t DETAIL: Failed system call was MapViewOfFileEx.
So, for now I’m going with a conservative 32MB setting.


Profiling, optimizing, and carpentry

Ok, this post really isn't about carpentry, except that the old adage "measure twice, cut once" applies equally to optimizing code. Unlike carpentry, though, measuring code performance involves many variables, and that's what this post is about.

I'll start by describing the incident that got me thinking about this. A few days ago I was working on a little hobby program with a frequently accessed area that makes up a significant portion of the program's total running time. Basically, I have an object holding a set of byte data, and I was considering two ways to store that data: as a simple multi-dimensional byte array, or as a much smaller single-dimensional long array (I'm running on a 64-bit CPU) that I would index into to extract single byte values.

I suspected that the long[] method would be slower since it is less direct than a straight byte[], but there's another piece to the puzzle. Besides accessing these byte values, I sometimes need to compare one set of values to another. My thinking was that even if the long[] method was a little slower than the byte[] method for access, that loss might be offset by a much faster comparison, since each long comparison checks eight bytes at once.
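To make the trade-off concrete, here is a hypothetical sketch of the long[] layout: eight bytes packed into each 64-bit word, a shift and mask to reach a single byte, and an equality check that compares a whole word (eight bytes) per step. The class and method names are illustrative, the actual hobby-program code isn't shown here, and a multi-dimensional index would first have to be flattened into a single one.

```csharp
using System;

// Hypothetical sketch: byte values packed eight to a long (64-bit) word.
public sealed class PackedBytes
{
    private readonly long[] words;
    public int Length { get; }

    public PackedBytes(int length)
    {
        Length = length;
        words = new long[(length + 7) / 8];   // round up to whole 8-byte words
    }

    public byte Get(int index)
    {
        CheckIndex(index);
        // index >> 3 selects the word; (index & 7) * 8 is the byte's bit offset within it
        return unchecked((byte)(words[index >> 3] >> ((index & 7) * 8)));
    }

    public void Set(int index, byte value)
    {
        CheckIndex(index);
        int word = index >> 3;
        int shift = (index & 7) * 8;
        // Clear the old byte, then OR in the new one
        words[word] = (words[word] & ~(0xFFL << shift)) | ((long)value << shift);
    }

    // Compares eight bytes per iteration. Unused bytes in the last word are never
    // written (CheckIndex rejects them), so they stay zero and can't cause a mismatch.
    public bool ContentEquals(PackedBytes other)
    {
        if (other == null || other.Length != Length) return false;
        for (int i = 0; i < words.Length; i++)
        {
            if (words[i] != other.words[i]) return false;
        }
        return true;
    }

    private void CheckIndex(int index)
    {
        if ((uint)index >= (uint)Length) throw new ArgumentOutOfRangeException(nameof(index));
    }
}
```

Every `Get` pays for a shift, a mask and a bounds check that a plain `byte[]` read avoids, while `ContentEquals` does one eighth as many iterations as a byte-by-byte comparison; which effect wins is exactly what needs measuring.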

Anyway, on with the story. I created a test case that tried both methods a few million times and returned performance metrics. Since my concern with the long[] method was that the calculations to index into the long values and extract a byte might be a performance hit, I had the test case use random indexes to exercise various portions of the data.

I ran the tests and found the byte[] method to be only marginally faster than the long[] method; the ratio was about 7:8. I must admit I was pleasantly surprised, and I suspected the compiler had made some pretty good optimizations to achieve this, but I was happy to accept the result.

I then proceeded to implement the long[] method in the rest of my code to test its performance in a more real-world scenario. I used a few #ifdef areas so that I could switch between the original byte[] method and the long[] method to compare the two. This took about an hour, and when I finally ran some real benchmarks I found the long[] method to be about 1/5th as fast as the byte[] method.

1/5th?! The test cases showed a performance hit of 8:7, not 5:1. What's up with that? So I went back to the performance test and considered the possibilities. To keep a short story short, the problem was the random index generation. Generating the random numbers took so much longer than the data access itself that my initial measurements were drastically skewed.
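One way to keep a random access pattern without timing the random number generator is to draw the indexes up front and time the two phases separately. Below is a hypothetical sketch using System.Diagnostics.Stopwatch; `IndexBenchmark` and its methods are illustrative names, not the original test case, and only the byte[] variant is shown. The long[] variant would get an identical loop so the two timings stay comparable.

```csharp
using System;
using System.Diagnostics;

// Hypothetical micro-benchmark: random index generation and data access are timed separately.
public static class IndexBenchmark
{
    // Draws `count` random indexes in [0, size); only the RNG loop is timed.
    public static int[] MakeIndexes(int count, int size, int seed, out TimeSpan elapsed)
    {
        var rng = new Random(seed);
        var indexes = new int[count];
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            indexes[i] = rng.Next(size);
        }
        elapsed = sw.Elapsed;
        return indexes;
    }

    // Reads data at precomputed indexes; only the access loop is timed.
    // The running sum is returned so the reads can't be optimized away.
    public static long SumAt(byte[] data, int[] indexes, out TimeSpan elapsed)
    {
        long sum = 0;
        var sw = Stopwatch.StartNew();
        foreach (int i in indexes)
        {
            sum += data[i];
        }
        elapsed = sw.Elapsed;
        return sum;
    }
}
```

Printing the two elapsed times side by side shows directly how much of a combined loop would have been random number generation rather than data access.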

Why would that be? Well, it works something like this. If my data access code alone runs at 5,000 ops/sec for the byte[] method and 1,000 ops/sec for the long[] method, that gives the 5:1 ratio the real-world tests showed. If, however, the random generation adds so much overhead that I get only 80 ops/sec for byte[] and 70 ops/sec for long[], that gives the 7:8 ratio. In fact, after I removed the random index generation, the performance test gave me the expected 5:1 result.
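The effect can be written down under the simplest assumption: a fixed random-generation cost t_r added to every timed access t_a. Taking the per-operation times implied by 5,000 and 1,000 ops/sec (0.2 ms and 1 ms) and an assumed t_r of 12.3 ms, chosen so that byte[] lands at 80 ops/sec:

```latex
\begin{aligned}
\text{rate} &= \frac{1}{t_r + t_a},
\qquad
\frac{\text{rate}_{\text{byte}}}{\text{rate}_{\text{long}}}
  = \frac{t_r + t_{\text{long}}}{t_r + t_{\text{byte}}} \\
t_r = 0:&\quad \frac{1\,\text{ms}}{0.2\,\text{ms}} = 5 \\
t_r = 12.3\,\text{ms}:&\quad \frac{13.3\,\text{ms}}{12.5\,\text{ms}} \approx 1.06
  \quad (80 \text{ vs. } {\approx}75\ \text{ops/s})
\end{aligned}
```

Under this fixed-overhead model long[] comes out at about 75 ops/sec rather than 70, so the figures above are only illustrative, but the mechanism is the same: the larger t_r is relative to the access times, the closer the measured ratio is pulled toward 1.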

So, the moral of the story is make sure you are measuring what you think you're measuring. What I probably should have done was run my test case through a profiler. That would have revealed that a significant portion of the test run time was the random number generation and it also should have shown numbers for the data access methods that were much closer to the actual 5:1 performance ratio.