
January 27, 2004

Are Objects Cheap?

I still don't have a handle on what objects cost. One .NET author, Jan Gray, calculated that the GC collects short-lived objects at a rate of 50 million objects per second. I am suspicious of that number; I think that rate can only be sustained with a small generation 0 heap, which the runtime sizes to fit in the cache. In another benchmark, another author found that in high-allocation scenarios C# code often outperforms unmanaged C++; allocating an object in managed code is almost as fast as a stack allocation, involving in most cases a simple pointer increment.
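To make the "simple pointer increment" concrete, here is a toy model of bump allocation. It is only a sketch of the idea, not how the CLR is implemented; the BumpArena class and its members are names made up for illustration.

```csharp
using System;

// Toy bump allocator: a hypothetical illustration, not the runtime's real code.
public sealed class BumpArena
{
    private readonly int _capacity;
    private int _next; // offset of the first free byte

    public BumpArena(int capacity)
    {
        if (capacity < 0) throw new ArgumentOutOfRangeException("capacity");
        _capacity = capacity;
    }

    // Returns the offset of the new block, or -1 when the arena is full.
    // In a real collector, running out of room is what triggers a collection.
    public int Allocate(int size)
    {
        if (size < 0 || size > _capacity - _next) return -1;
        int offset = _next;
        _next += size; // the whole cost of the fast path: one add
        return offset;
    }

    // Stand-in for a collection in which nothing survives.
    public void Reset()
    {
        _next = 0;
    }
}
```

The fast path is a bounds check and an addition, which is why allocation can be nearly as cheap as a stack allocation. The cost that this toy leaves out is the collection itself.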

I use SharpReader, an RSS reader that is written in managed code and consumes a large amount of memory, almost 200 MB in a typical session. It's usually fairly responsive. Occasionally, after being active for several hours, the application becomes sluggish. I have wondered whether this is due to the garbage collector, or whether the cause is related to threading or network issues. The Performance Monitor (perfmon.exe) does not indicate any unusual GC activity.

Still, I am wary of using objects when a struct will do. The NLP group at Microsoft shares the same concern. When porting their natural language libraries from unmanaged code to managed code, they deliberately distorted their API to favor structs and minimize the number of allocations, on the view that avoiding garbage collection will always be faster. For example, the TextFragment class encapsulates a stream of text that will undergo natural language analysis. The class has a Sentences property that returns a collection. When the collection is enumerated through foreach, the enumerator reuses a single object: each call to IEnumerator.MoveNext() updates it, and IEnumerator.Current returns that same instance every time.
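A minimal sketch of that reuse pattern, written in current C#. This is not the NLP group's actual API: SentenceView, SentenceCollection, and the naive split on periods are all assumptions made for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for a sentence object; the real type is not shown in this post.
public sealed class SentenceView
{
    public int Start { get; internal set; }
    public int Length { get; internal set; }
}

public sealed class SentenceCollection : IEnumerable<SentenceView>
{
    private readonly string _text;

    public SentenceCollection(string text)
    {
        _text = text;
    }

    public IEnumerator<SentenceView> GetEnumerator()
    {
        // One allocation per enumeration; the same instance is mutated
        // and handed back on every step.
        var view = new SentenceView();
        int start = 0;
        while (start < _text.Length)
        {
            // Naive sentence boundary: the next period, or the end of the text.
            int end = _text.IndexOf('.', start);
            if (end < 0) end = _text.Length - 1;
            view.Start = start;
            view.Length = end - start + 1;
            yield return view;
            start = end + 1;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

The price of saving those allocations is visible to callers: anyone who stores the value of Current, for example by copying the collection into a List, ends up with several references to one object that holds only the last sentence.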

However, the Avalon team makes extremely liberal use of objects everywhere, as if objects were free. Some developers even hold the philosophy that structs have their own set of problems and should almost never be used. Then again, Avalon did rewrite the codebase to improve performance, but I doubt it had much to do with the number of objects created.

I have started to rely more on reference types in place of value types, based on the Avalon example. My new thinking is that reference types are easier to work with. Later, I can look at the performance impact of using that reference type, and revert to a struct (or an interface) if need be.

To obtain the best of both worlds, I thought about writing a Perl script that takes a struct and produces a class or interface wrapper, which is more useful for referring to the value type on the heap. Of course, value types can be boxed too, but then the contents of the object must be copied back to a value type for access to member functions.
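Here is a hand-written sketch of what such a generated wrapper might look like, next to the boxing alternative. Counter, CounterRef, and BoxingDemo are hypothetical names; no such script exists yet.

```csharp
using System;

public struct Counter
{
    public int Value;

    public void Increment()
    {
        Value++;
    }
}

// Hypothetical output of the wrapper generator: a class holding the struct as a field.
public sealed class CounterRef
{
    public Counter Inner; // the struct's data lives inside this heap object

    public CounterRef(Counter initial)
    {
        Inner = initial;
    }

    public int Value
    {
        get { return Inner.Value; }
    }

    public void Increment()
    {
        Inner.Increment(); // operates on the field in place, no copy
    }
}

public static class BoxingDemo
{
    // Boxing: the struct must be copied out before its methods can be called,
    // so the change never reaches the boxed object.
    public static int ViaBox()
    {
        object boxed = new Counter();
        Counter copy = (Counter)boxed;
        copy.Increment();
        return ((Counter)boxed).Value;
    }

    // Wrapper: every reference sees the same underlying struct.
    public static int ViaWrapper()
    {
        CounterRef original = new CounterRef(new Counter());
        CounterRef alias = original;
        alias.Increment();
        return original.Value;
    }
}
```

ViaBox returns 0 because the increment lands on the unboxed copy, while ViaWrapper returns 1. The wrapper costs one heap allocation, the same as a box, but keeps the member functions usable through the reference.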

Comments

Even in Smalltalk implementations, where they've had 20+ years to optimize object creation and garbage collection, objects are not "free" but then neither are "structs". Copying references to objects is cheaper than copying the data of a struct (assuming the struct is larger than a 32-bit pointer).

However, it is premature optimization to worry about object creation and deletion without profiling to tell you where the slow spots are. Designing "value" objects that are immutable and "entity" objects that are not duplicated does a lot to ensure good performance. (See the book Domain Driven Design by Eric Evans.) In the few situations where you might have to create and manipulate thousands of objects per second, you may need to do something with arrays of structs instead of lists of objects.

See http://c2.com/cgi/wiki?FirstRuleOfOptimization and http://c2.com/cgi/wiki?OptimizeLater

NeXT dealt with exactly this issue (modulo garbage collection) 15 years ago in NEXTSTEP and 10 years ago in OpenStep. And now Apple's dealing with it again as they enhance Cocoa, a derivative of OpenStep.

Their frameworks have gotten much more class-centric over the years. In 1989 (25MHz 030), the overhead of objects versus structs and raw C strings was pretty high, so NEXTSTEP didn't even have a string class. In 1994 (66+MHz 486), the overhead was much less, so aside from points, rectangles, and ranges, most everything in an OpenStep application is an object including strings. And in Cocoa, the only major new struct I can think of is a transformation matrix (which was handled in Display PostScript by OpenStep).

Of course, Microsoft will have to rediscover all of this because they don't seem to be too observant of the happenings on other platforms.
