
20 posts from February 2004

February 25, 2004

The Study of Video Games

The New York Times has a technology article about what happens when academics discover video games.

Ludology, the theory of video games, is the name of a new academic field that seeks to explore the concepts behind video games. I glanced at Ludology.org and Ludonauts, and noticed some emerging terminology in the field, such as agonistic integrity and epistemic conditions. Hmm... why do I have the feeling that the people who study video games are not the people who play them?

Correct Code Is Hard To Write

Even the simplest, most primitive data structures have special cases that make it difficult to find all bugs. Consider integers, for example.

How would you write a Compare function for two integers that returns a negative value for a less-than relationship, zero for equality, and a positive value for a greater-than relationship? If you write the following,

public int Compare(int i1, int i2)
{
   return i1-i2;
}

you would have a problem with extreme values (large values of opposite sign whose difference overflows), because Compare(int.MinValue, int.MaxValue) will incorrectly return 1 and Compare(int.MaxValue, int.MinValue) will incorrectly return -1 in unchecked mode. Using checked mode for overflow checking doesn't help either, because an exception should not be thrown from this function. The correct approach is

public int Compare(int i1, int i2)
{
   return i1>i2 ? 1 : i1<i2 ? -1 : 0;
}

With floating-point numbers, the use of a dedicated sign bit eliminates this particular overflow problem. However, some floating-point values (NaNs) do not compare as less than, greater than, or equal to anything, in which case a Compare written with the ternary approach will return zero.
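Here's a quick sketch showing both edge cases (the subtraction-based version is renamed CompareBuggy purely for illustration):

using System;

class CompareCheck
{
    static int CompareBuggy(int i1, int i2) { return i1 - i2; }                       // overflows
    static int Compare(int i1, int i2)      { return i1 > i2 ? 1 : i1 < i2 ? -1 : 0; }

    static void Main()
    {
        Console.WriteLine(CompareBuggy(int.MinValue, int.MaxValue)); // 1: wrong sign
        Console.WriteLine(Compare(int.MinValue, int.MaxValue));      // -1: correct

        // NaN is not comparable with anything, so a ternary-style compare
        // over doubles falls through to the final branch and returns 0.
        double nan = double.NaN;
        Console.WriteLine(nan < 1.0 || nan > 1.0 || nan == 1.0);      // False
    }
}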

Now, let's write an Abs function, which returns the absolute value of an integer. Sounds easy, right?

public int Abs(int n)
{
   return n>=0 ? n : -n;
}

The problem is that, with machine integers, 0 is not the only value that is its own negation. int.MinValue is also its own negation, because there is no way to represent its positive counterpart in a 32-bit int. The correct implementation is the following:

public int Abs(int n)
{
   return n>=0 ? n : checked(-n);
}
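A quick check shows the difference (the naive version is labeled AbsNaive here purely for illustration):

using System;

class AbsCheck
{
    static int AbsNaive(int n) { return n >= 0 ? n : -n; }          // silently wrong for int.MinValue
    static int Abs(int n)      { return n >= 0 ? n : checked(-n); } // reports the overflow instead

    static void Main()
    {
        Console.WriteLine(AbsNaive(int.MinValue));   // -2147483648: still negative!

        try { Abs(int.MinValue); }
        catch (OverflowException) { Console.WriteLine("overflow detected"); }
    }
}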

I just wanted to point out that writing correct code is not trivial. One last thing: I recommend enabling arithmetic overflow checks, at least for Debug builds.

February 24, 2004

Chris Pratley Has Second Thoughts

Chris Pratley is pondering whether he should continue blogging. His blog is among the cream of the crop, and it would be a major loss if he were to go. Well written and informative. I look forward to reading each entry he posts. (I did know him for some years while at Office, but I have not spoken to him for several years.)

He feels a little cheapened that his work at Microsoft is boosting his traffic, and wonders if anyone actually cares for him personally. Well, I send in my vote.

A look at Technorati shows that he has a rapidly developing fan base. He is already included in a number of blogrolls, despite having started blogging only recently. One blogger, Tejas Peatel, calls Chris Pratley "my favourite blogger."

John Porcarp says this of Chris:

If you read one Microsoft Blog (okay, other than mine...), it's gotta be Chris Pratley's. I've commented to a couple of people that this is the kind of writing I wish I could (or should decide to) do more of. Chris is opening the window on why some of us make the decisions we do, and he has some very interesting insight...

KentC has this to say:

You undoubtedly have read a lot of posts on blogs.asp.net that contain code snippets, insight into Longhorn, et al but for my money there’s no more interesting reading than Chris Pratley’s account on the birth of OneNote.

So, Chris, if this is any evidence, the world wants you to keep on writing. You spend a lot of energy on your posts, and it shows, and the world will miss it if you stop.

By the way, I am an ex-Microsoftie. I felt a little cheapened too by my Microsoft background. I am really blogging for myself, since it gives me an excuse to write and capture any thoughts that I may have.


PS: In case you're reading this and thinking that I am being silly: I think so, too. I don't really think that Chris is going to stop blogging, and I don't think this post will make a difference. I am just feeling bored and felt like writing something.

What Happens to Java/Linux When Longhorn Ships?

I remember reading a quote from a Linux enthusiast, dismissing the impact of Longhorn by stating that the features new in Longhorn essentially already exist in Linux in its present form. XAML--oh, that's just XUL--and so on. WinFS--oh, that's just a database.

I feel that, if Longhorn does deliver on its promises, it will severely complicate the future of Linux and Java. If the whole Windows world becomes managed, won't Java performance suffer a bit, with two runtimes, two libraries, and two garbage collectors needed to run a Java application? Will Java-based UIs appear rather dated compared to new Avalon applications, or will Sun be able to match the Longhorn UI look and experience as well as maintain the same UI consistency across multiple operating systems?

For a while, Linux had the advantage of being a more solidly built operating system than Windows 9X; I don't believe it maintains that edge anymore against Windows XP and Mac OS X. Can an army of volunteers keep up and maintain the same discipline as a focused set of developers at Apple and Microsoft?

Avalon is set to have a far richer set of graphical capabilities and will make full use of the graphics processor; the idea, of course, is that the UI will become as rich as or even richer than current video game titles. This advance is proceeding quickly through Microsoft's close collaborative work with hardware developers such as ATI. I think Mac OS X proved earlier that major advances can come from better integration between hardware and software. Linux does not have a good history with hardware support and integration, so it will probably fall behind.

Is there any activity by Linux developers toward a rich multimedia interface? I wonder if the decentralized nature of Linux might even hurt it in this next presentation wave. It's hard to see any advances in this area unless IBM and Sun (perhaps with its Looking Glass) come along to help.

I am getting a sense of deja vu. When Microsoft announced its Internet plans back in 1996, I had difficulty figuring out how Netscape would fend off some challenges. IE, for instance, was available as a reusable ActiveX control--a feature that won over AOL as well as a number of ISVs. There were also ActiveX documents, the Inet API, and HTML integration everywhere, although Active Desktop never caught on and appears to have been abandoned in Longhorn. Microsoft had a multipronged strategy, coming in from all angles with the IIS server, FrontPage, MSN, Office, VS, Explorer, and the Windows API. Netscape seemed to focus on individual features, while Microsoft focused on linkages, understanding that the world cares more about how to connect different products together. While IE could be easily popped into any application, Netscape's browser was not embeddable for a long time. Well, sure enough, Netscape is no more.

I do think Linux will still be around, if for no other reason than that it's free and open-source. I just don't feel that it will be on many desktops, since I feel that Longhorn will present a more compelling interface. (I'm not talking about the current alpha interface, which is essentially the same as XP; the neat stuff is still hidden from most of the developers at Microsoft, so as not to leak out.)

What's Missing In .NET and Other Collection Libraries

Collections are probably one of the most important components of any library. The framework's collection classes, as of Everett, are somewhat weak, especially compared with Java's and STL's. With the recent work on collections in Whidbey, much of the gap in functionality with Java will be reduced but not eliminated, although the integration and performance of the generic collections should be superior to Java's collection libraries. Some new collections, like linked lists, will be introduced, but I haven't seen any indication of sets or other tree-based data structures from Java being introduced.

STL.NET, a .NET port of STL, will also be introduced by the VC++ team, and indications are that the library will be usable from other managed languages; unfortunately, the style and naming conventions of that library are the same as those of native STL and will be incompatible with the framework's own style and naming guidelines.

All of these libraries, including STL, are missing an important type of collection--an indexable list with fast (logarithmic or better) insertion and deletion times. These libraries simply offer a tradeoff: either indexing or insertion/deletion must take linear time. Many applications will simply stick with a simple array or ArrayList and accept the linear-time performance. The result is the widespread phenomenon where a user supplies a large, but reasonable, amount of data to an application, and the application soon slows to a crawl.

Many commercial applications will try to be a little more sophisticated by creating a list of lists, which still delivers linear time performance, but improves insertion time substantially by a constant factor. Such an approach still doesn't scale well after a certain point. For random access of elements in large sequential data, true industrial-strength applications will use a logarithmic-time list implementation, which is usually based on a tree or possibly a skip-list.

I wrote a general-purpose list based on a balanced tree implementation. The list is broken up into chunks, each of which has a length between 1 and some maximum specified chunk size. The chunks are all stored as nodes in a binary tree. In addition to its chunk, each node also contains a relative index value for keeping track of its position.

Insertion and deletion times are O( lg(n/chunkSize) ), which is practically instantaneous. Indexing, a little slower, is also logarithmic, O( lg(n/chunkSize) ). However, using an enumerator to sequentially access a range of elements via a tree walk yields constant-time access per element on average. This leads to the interesting conclusion that foreach is actually faster than indexing inside a for loop: foreach takes O(m) on average to access m items, while for takes O( m lg n ) for the same set of items.

This implementation took me less than a week; in addition, I was able to add capabilities such as compactly storing a run of elements, making the list also suitable for a run array or a sparse array. With trees, a lot of nice features become possible.

For the small amount of time it took me to write this implementation, I am surprised we don't have such a list implementation in any library. It seems to me that a general-purpose scalable list class should be in all the standard collection libraries. That will probably be the case five years from now; for now, I am considering publishing my own implementation, as soon as I have converted it to use Whidbey generics and have worked with it for a few months to be sure all the bugs are out.
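In the meantime, here is a minimal sketch of the central idea, using a hypothetical IndexedNode<T>: each node records how many elements live in its subtree, which is what makes both indexing and insertion logarithmic once the tree is kept balanced. This toy version stores one element per node and does no rebalancing or chunking, so it only hints at the real thing.

class IndexedNode<T>
{
    T item;
    IndexedNode<T> left, right;
    int count = 1;                          // number of elements in this subtree

    public IndexedNode(T item) { this.item = item; }

    static int CountOf(IndexedNode<T> n) { return n == null ? 0 : n.count; }

    // Index lookup: O(depth), which becomes O(lg n) once the tree is balanced.
    public T Get(int index)
    {
        int leftCount = CountOf(left);
        if (index < leftCount) return left.Get(index);
        if (index == leftCount) return item;
        return right.Get(index - leftCount - 1);
    }

    // Insert before position 'index'; every node on the path updates its count.
    public void Insert(int index, T newItem)
    {
        count++;
        int leftCount = CountOf(left);
        if (index <= leftCount)
        {
            if (left == null) left = new IndexedNode<T>(newItem);
            else left.Insert(index, newItem);
        }
        else
        {
            if (right == null) right = new IndexedNode<T>(newItem);
            else right.Insert(index - leftCount - 1, newItem);
        }
    }
}

The real list replaces the single item with a chunk of up to chunkSize elements and rebalances on insertion, which is where the O( lg(n/chunkSize) ) bounds come from.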

Easter Eggs

I have seen a number of posts lamenting the death of Easter Eggs, which can pose a reliability and security risk.

In the course of developing Excel 95, several Excel developers from the charting feature team were caught up in the Doom craze, and each day they would spend about two hours playing networked Doom on company time. At some point, a few developers decided to build their own 3D engine after hours. After several weeks, they decided it was too hard and abandoned the effort; however, all was not lost: the engine survived and became an Easter egg.

It was done with tacit management approval. (Oops, I am not supposed to say that! Don't tell anyone.) The reasoning, I guess, was: better the Easter egg you do know than the Easter egg you don't know. The Easter egg also underwent scheduled testing to make sure it didn't affect Excel's operation. For example, the Easter egg would only trigger if no document had ever been opened or modified.

That Easter egg was a whopping 100K, back in the days when Windows 95's minimal configuration was 4MB and magazines regularly complained about bloated applications. You don't hear about bloated applications anymore, since hardware has long since outstripped the needs of software, and Microsoft perfected the art of performance testing.

In Excel 97, the 3D Maze Easter egg was replaced with a flight simulator. See if you can find my name in there.

You can find the keystrokes for both Excel 95's and 97's Easter eggs by googling "Excel Easter Eggs." Purely from memory, I believe the steps are as follows: Start Excel. Press F4. Enter X97:L97. Press Tab. Hold down Ctrl+Shift and click the Chart Wizard button on the toolbar.

Before I left Microsoft, I was seriously tempted to leave my mark by planting my own Easter Egg in Excel, but I ultimately decided against it.

February 23, 2004

Self-Censorship

While I try to be very open in my blog, I find myself becoming less and less frank about Microsoft. Already, I have killed a few posts, particularly those dealing with Microsoft Office, the product I spent six years developing.

I don't want to say anything potentially embarrassing to Microsoft, because I still want to preserve any possibilities for some future relationship with Microsoft. Also, I only develop for the Windows platform, and I plan to take advantage of many of the new ISV opportunities and partnerships available.

I also don't want to be in the position where Slashdot quotes my blog and uses my words as an ex-Office developer to encourage the world to move to Linux and StarOffice. It was a privilege to work on Office, and I'd rather not abuse it. Perhaps my words could shave a few million (an infinitesimally small amount from Microsoft's perspective) off of Office's revenues, so I tend to just avoid discussing Office, unless my words could not reasonably be interpreted negatively.

February 20, 2004

GC Myth is Real

Paul Wilson points out a garbage collection myth, "Set Object to Null," that he tries to dispel, but he is only half right. The myth is:

There is never any reason to set your objects to null since the garbage collection will do this for you automatically.

He asserts that there are times when setting objects to null can make a huge difference in the memory footprint of an application. This is true; there are such times. However, he also implies that references that are not nulled out will only be collected after the reference goes out of scope. This is not true.

The garbage collector does, in fact, track the last use of each variable within a method, and will, upon a collection, ignore any variable whose last use lies before the current instruction pointer. If a variable is still referenced further down the method, however, then setting it to null does help.
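For example, in the contrived sketch below (all the helper methods are made up), nulling out the variable matters because the variable is still read near the bottom of the method:

using System;

class LivenessDemo
{
    static byte[] LoadHugeBuffer() { return new byte[100 * 1024 * 1024]; }  // ~100 MB
    static void Summarize(byte[] data) { Console.WriteLine(data.Length); }
    static void DoLongRunningWork() { /* lots of allocation and collections happen here */ }

    static void Process(bool needBufferAgain)
    {
        byte[] buffer = LoadHugeBuffer();
        Summarize(buffer);

        // Because 'buffer' is read again near the bottom of this method, the JIT
        // treats it as live across the long-running work. Nulling it out on the
        // path where it is no longer needed lets the 100 MB be collected early.
        if (!needBufferAgain)
            buffer = null;

        DoLongRunningWork();

        if (buffer != null)
            Summarize(buffer);    // this later read is what extends the live range
    }
}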

Let me point out System.GC.KeepAlive(obj), a function call that is used to prevent objects from being collected prematurely. In the following example from Rotor, the GC can collect the Foo object, allowing its finalizer to close the stream, before MethodThatSpansGCs() finishes. This is because the Foo object is no longer used in Main, and the this pointer is no longer needed once the stream field has been fetched for the call to MethodThatSpansGCs.

class Foo { 
   Stream stream = ...;                              // some open stream (elided in the original)
   ~Foo() { stream.Close(); }                        // finalizer closes the stream
   void Problem() { stream.MethodThatSpansGCs(); }   // 'this' is dead once 'stream' has been fetched
   static void Main() { new Foo().Problem(); }       // so Foo can be collected while the call runs
}

Adding GC.KeepAlive(this) at the end of Problem() prevents the stream from being prematurely closed.
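The fixed version of Problem() looks something like this (a sketch based on the example above):

void Problem()
{
    stream.MethodThatSpansGCs();
    // Keeps 'this' (and therefore the stream field) reachable until the
    // call above returns, so the finalizer cannot run prematurely.
    GC.KeepAlive(this);
}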

Colors Undocumented

I have been looking at how colors are implemented in GDI+.

System.Drawing.Color is a value type, but it is not the lightweight data structure you might expect from Win32, where colors were represented as 32-bit integers. The framework's Color, in contrast, occupies 16 bytes, or 128 bits. Despite the size, each color component--red, green, blue, alpha--still occupies a mere 8 bits. Apparently, four bytes are unused, and the remaining bytes are used to point to the color's name or to identify the color as being one of a standard set of known or system colors. Representing each color by its 32-bit integer representation from ToArgb() leads to space savings inside an array or structure.
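Here is a sketch of what I mean by the space savings (nothing fancy, just ToArgb/FromArgb round-tripping):

using System.Drawing;

class PackedColors
{
    static int[] Pack(Color[] colors)
    {
        // 4 bytes per entry instead of 16
        int[] packed = new int[colors.Length];
        for (int i = 0; i < colors.Length; i++)
            packed[i] = colors[i].ToArgb();
        return packed;
    }

    static Color Unpack(int argb)
    {
        // Reconstructs the color value; the name/known-color information is lost.
        return Color.FromArgb(argb);
    }
}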

Some interesting observations about colors:

Color comparison does not behave as one would expect. Known colors are distinct from regularly constructed colors, so Color.Red does not equal Color.FromArgb(255,0,0), even though the latter is also red. Proper color comparison requires that ToArgb() be called on each color before testing.

Colors can have names, but the name support isn't that great. It's not possible to convert an arbitrary color to its known color. Color.Red.Name returns "Red", but Color.FromArgb(255,0,0).Name returns "ffff0000". Interestingly, Color.FromName("ffff0000") won't be recognized, even though that string was returned by the Name property. Instead, as with all unknown names, it will yield a transparent color carrying the same unrecognized name. The only way to determine whether FromName() actually recognized the name is to check the IsKnownColor property or call ToKnownColor() on the color returned. Better string conversion support is provided by the ColorConverter class.
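A few lines make these quirks concrete (a small sketch; expected output is noted in the comments):

using System;
using System.Drawing;

class ColorQuirks
{
    static void Main()
    {
        Color known = Color.Red;
        Color plain = Color.FromArgb(255, 0, 0);

        Console.WriteLine(known == plain);                    // False: known vs. constructed color
        Console.WriteLine(known.ToArgb() == plain.ToArgb());  // True: same ARGB value

        Console.WriteLine(known.Name);                        // "Red"
        Console.WriteLine(plain.Name);                        // "ffff0000"

        Color fromName = Color.FromName("ffff0000");          // unrecognized name
        Console.WriteLine(fromName.A);                        // 0: effectively transparent
        Console.WriteLine(fromName.IsKnownColor);             // False

        // ColorConverter handles string round-trips more gracefully:
        Color red = (Color)new ColorConverter().ConvertFromString("Red");
        Console.WriteLine(red.IsKnownColor);                  // True
    }
}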

In addition to the Color class, SystemColors also defines static read-only colors for the various system colors, like ControlText and so on. These colors are tracked automatically as the system colors are changed dynamically through the Control Panel. The IDs for both the known colors defined in the Color class and the system colors defined in SystemColors are stored in the KnownColor enumeration. Both known colors and system colors can be generated from a KnownColor value via Color.FromKnownColor().

Color.Transparent is the only predefined color with an alpha value set to less than the maximum; not surprisingly, its alpha is set to zero. The base color of Transparent is white, not black, which can affect any gradient brush that uses that color. (For the uninitiated, the combination of alpha blending and gradient filling is the standard way of applying advanced 3D lighting effects to 2D graphics like gel buttons.)

The System.Drawing.Color class is very minimal, apart from the predefined colors. There are few methods, and those that do exist seem somewhat incomplete. There is a way to extract the hue, saturation, and brightness of a color, but no way to construct a color from those same coordinates in the HSB color space.
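So you have to roll your own. Here is a sketch of a FromHsb helper (my own code, not a framework API) that uses the same hue, saturation, and brightness conventions as GetHue, GetSaturation, and GetBrightness:

using System;
using System.Drawing;

class ColorUtil
{
    // hue in [0, 360), saturation and brightness in [0, 1],
    // matching the ranges returned by GetHue/GetSaturation/GetBrightness.
    public static Color FromHsb(float hue, float saturation, float brightness)
    {
        float c = (1 - Math.Abs(2 * brightness - 1)) * saturation;   // chroma
        float x = c * (1 - Math.Abs(hue / 60 % 2 - 1));
        float m = brightness - c / 2;

        float r, g, b;
        if      (hue <  60) { r = c; g = x; b = 0; }
        else if (hue < 120) { r = x; g = c; b = 0; }
        else if (hue < 180) { r = 0; g = c; b = x; }
        else if (hue < 240) { r = 0; g = x; b = c; }
        else if (hue < 300) { r = x; g = 0; b = c; }
        else                { r = c; g = 0; b = x; }

        return Color.FromArgb(
            (int)Math.Round((r + m) * 255),
            (int)Math.Round((g + m) * 255),
            (int)Math.Round((b + m) * 255));
    }
}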

Avalon Colors
Based on the information on WinFX247.com, colors, represented by System.Windows.Media.Color, are going to be even heavier objects in Avalon/Longhorn, consuming at least 20 bytes, with color components represented as higher-fidelity single-precision floating-point numbers. There will also be support for scRGB, a more accurate color space. Avalon also appears to support more operations, like color arithmetic.

Empty Arrays

Microsoft guidelines recommend that developers avoid returning nulls for arrays or strings in managed code. Instead, the empty string should be returned for strings, and an empty array should be returned for arrays.

Benefits
There are several benefits to following the guidelines.

  1. Reliability. Avoiding nulls eliminates the possibility of a NullReferenceException being raised and potentially bringing an application to a halt. Most code will continue to work correctly if presented with an empty array or string, since a length of zero follows a continuous programming model for dealing with exceptional cases; null references, however, must always be explicitly handled.
  2. Simplicity and conciseness. Using two different values, null and an empty value, for exceptional results often requires that multiple checks be performed and that code be written to handle both cases the same way. It may also result in inconsistent return values from similar functions, or even from the same function over time. If one of the two checks is missed (hopefully not the wrong one), the two return values may follow different code paths, which can be problematic for testing.
  3. Consistency. The two different values can also lead to confusion as to which value a method actually returns. For example, the .NET framework's Type.GetProperties() returns a PropertyInfo[], but the documentation does not indicate what is returned when there are no properties. Standardizing on empty arrays, as the framework has done, allows the developer to accurately predict the result of a method call in the absence of adequate documentation, especially when the cases in which a method would return an exceptional result are rare (thus preventing the experimental determination of the exceptional return value).

Reducing the Memory Hit
Standardizing on empty arrays and strings offers more reliable and concise code. The tradeoff, however, is additional memory allocations and more frequent garbage collection. The tradeoff is actually minimal, since temporary allocations are almost free.

For strings, a memory allocation is easily avoided by using String.Empty or even the literal empty string "", which is guaranteed to be interned. For empty arrays, the solution to the memory allocation issue is surprisingly simple: empty arrays are not modifiable, since they have no elements. As a result, they can be shared freely! A static variable can thus be declared, pointing to an empty array of a given type, such as the following:

public static object [] EmptyArray = new object[0];
public static string [] EmptyStringArray = new string[0];

Note that an empty array still needs to be declared for each element type. Through the magic of array covariance, you can get away with returning an array of a derived type from a function that is declared to return an array of a base type. For example, EmptyStringArray, the empty string array above, can be returned wherever an object array is expected.
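For example (a hypothetical method, reusing the static fields declared above):

// Array covariance lets the shared empty string[] stand in for an empty object[].
public static object[] FindMatches(bool anyFound)
{
    if (!anyFound)
        return EmptyStringArray;   // no allocation; a string[] is a valid object[]
    return new object[] { "example" };
}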

One last point: the Whidbey string class supports a new method, IsNullOrEmpty, which checks for both null and the empty string; so while Microsoft recommends strictly relying on the empty string, it is being pragmatic by providing support for a common but deprecated usage pattern.