Kristofer Goss wrote me yesterday, asking for my thoughts on the performance tradeoffs between .NET and native code.
I would love to hear your thoughts on the performance and runtime overhead of Windows Forms, specifically with regards to some comments Nick Bradbury had on why he chose Delphi for implementing FeedDemon, posted here:
I'm considering writing a Windows app to run on lower end machines where high end processors and lots of RAM are not the norm. One concern I have is the performance and memory usage of WinForms clients vs. Win32 applications built with something like Delphi. Firing up FeedDemon and other tools like SharpReader and RSSBandit, the contrasts are pretty striking in terms of memory usage (on the same blog roll.)
Although I prefer C# much more since I've been working with it for a while now, I'm really trying to weigh out what is best for my potential end-users.
I'm considering a Delphi/Win32 client side EXE at the moment. I'd appreciate hearing your comments on this since you're leveraging Windows Forms for your future product. Perhaps you feel this is worth blogging about.
Here's what I believe:
I don't think that .NET applications necessarily perform worse than native applications. On the other hand, even when idle, WinForms applications are clearly doing a lot more work than their native cousins, constructing numerous kinds of event objects for mouse movement and idle events. Still, these idle costs are fairly small, because temporary objects are essentially free in .NET.
Performance is primarily determined by the algorithms and data structures a program uses; object allocations matter far less than the overall design of the application. Heap-based objects are definitely slower than stack-based objects, but not by much in the .NET world. Some people at Microsoft actually believe that .NET applications can potentially be faster, because dynamically compiled code can offer machine-specific optimizations and eliminate indirections through addresses known only at runtime, and because GC-based heap allocations can approach the performance of stack allocations. The GC does perform poorly when objects experience a mid-life crisis or when very large temporary objects (over 85,000 bytes) are created.
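The 85,000-byte figure is the threshold at which the CLR routes an allocation to the large object heap, which is swept only with full (generation 2) collections. A minimal sketch of how to observe this from code (the array sizes here are just illustrative):

```csharp
using System;

class LohDemo
{
    static void Main()
    {
        // A small, fresh allocation lands in generation 0 of the normal heap.
        var small = new byte[1_000];

        // Anything over roughly 85,000 bytes goes straight to the large
        // object heap, which is reported as (and collected with) generation 2.
        var large = new byte[100_000];

        Console.WriteLine(GC.GetGeneration(small)); // typically 0
        Console.WriteLine(GC.GetGeneration(large)); // typically 2
    }
}
```

This is why very large temporaries are expensive: each one forces the GC to deal with the costliest part of the heap instead of the cheap generation-0 nursery.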
The perception that managed applications are slow may be due to self-selection bias: the programmers most sensitive to performance, and most adept at writing performant code, are also the least likely to migrate to managed code. The end result is that managed applications tend to be written by less performance-savvy programmers, who are drawn to the managed environment for other reasons, such as enhanced productivity.
That said, managed DirectX is 5% slower than the native API. That's not to say the performance could not have been improved by a less clean, non-object-oriented port, but managed DX carries both the unavoidable WinForms overhead mentioned above and the cost of managed-to-native transitions.
I use SharpReader regularly and am aware of its performance issues. When it performed poorly a few times, I examined it under the Performance Monitor microscope and discovered that the garbage collector was hardly running at all.
When I imported a large OPML file (normally a lengthy operation, so it doesn't qualify as a performance problem), I did notice that SharpReader was allocating over 30 million bytes per second, yet spending only 3% of its time in the garbage collector--a good demonstration of how efficient the GC is at reclaiming temporary objects.
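To see the same effect in miniature, the sketch below churns through millions of short-lived strings; the counts are illustrative, not a reproduction of the SharpReader measurement:

```csharp
using System;

class TempAllocDemo
{
    static void Main()
    {
        // Allocate millions of short-lived strings. The generation-0
        // collector reclaims them almost for free, so the live heap
        // stays small no matter how many temporaries we churn through.
        for (int i = 0; i < 5_000_000; i++)
        {
            string temp = "item " + i; // becomes garbage immediately
        }

        // Many cheap gen-0 collections happened along the way...
        Console.WriteLine(GC.CollectionCount(0));

        // ...but almost none of those temporaries survive.
        Console.WriteLine(GC.GetTotalMemory(forceFullCollection: true));
    }
}
```

The gen-0 collection count climbs into the hundreds while the surviving heap stays tiny, which is exactly the profile SharpReader showed during the OPML import.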
Some of the real performance issues can be attributed to the simple fact that SharpReader runs in DEBUG mode. Also, FurryGoat had a post (since removed), in which he looked at SharpReader through the CLRProfiler and determined that XML serialization was probably a major cause of the slowdowns.
As for SharpReader's large memory consumption, I discovered using the CLR Profiler that the primary culprit is the large number of strings allocated to store feed text (28MB for 34,401 strings--72% of all memory allocated). Most of the strings belong to objects of type Model.RSSItem. Luke (the author of SharpReader) could instantly cut memory consumption in half by storing the strings as UTF-8-encoded byte arrays, using the System.Text.Encoding APIs. This is not a native-versus-.NET issue: if FeedDemon stored feed text the same way, it would take the same memory hit; more than likely, FeedDemon isn't using Unicode strings in the first place.
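The arithmetic behind the "cut in half" claim: feed text is mostly ASCII, so UTF-8 needs one byte per character where a .NET string stores two (UTF-16). A hedged sketch of the fix; the RssItemText wrapper and sample text below are hypothetical, not SharpReader's actual types:

```csharp
using System;
using System.Text;

// Hypothetical stand-in for the string fields on Model.RSSItem:
// keep the text as UTF-8 bytes, decode only when it is displayed.
class RssItemText
{
    private readonly byte[] _utf8;

    public RssItemText(string text) => _utf8 = Encoding.UTF8.GetBytes(text);

    public override string ToString() => Encoding.UTF8.GetString(_utf8);
}

class Demo
{
    static void Main()
    {
        string original = "<p>Mostly-ASCII feed text takes half the space as UTF-8.</p>";
        var packed = new RssItemText(original);

        // For pure ASCII: one UTF-8 byte per char vs. two bytes per char as a string.
        Console.WriteLine(Encoding.UTF8.GetByteCount(original)); // size as UTF-8
        Console.WriteLine(original.Length * 2);                  // size as UTF-16 chars
        Console.WriteLine(packed.ToString() == original);        // True
    }
}
```

The tradeoff is a decode cost on every read, which is why this only pays off for text that is stored in bulk but displayed rarely, as feed items are.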
I haven't previously noticed any performance issues with RSS Bandit; then again, I haven't used RSS Bandit on a regular basis. A quick Google search reveals some gripes about performance, which either still persist and I haven't encountered them, or have since been fixed by regular updates. Even simple fixes can remove performance bottlenecks: it seems that after each new version of SharpReader, Luke discovers another fix that improves performance by 25%.
In their initial configuration on my machine, the working set of SharpReader is 30MB, FeedDemon's is 23MB, and RSS Bandit's is 4MB. (For comparison, the working sets of MS Word and MS Excel are about 18MB each.) So, in their bare configurations, RSS Bandit is actually the tightest of them all, even though it uses the .NET runtime. However, the working set of a .NET application has significantly higher variance than that of a native application. While RSS Bandit was idle, I watched its working set initially climb to 13MB, then fall in an instant to 6.5MB as, it appears, a collection occurred. The working set then oscillated in an ever-narrowing range (eventually between under 3MB and 6MB) that apparently reflected dynamic tuning by the CLR. Native applications, in contrast, normally show zero variance in working set while idle.
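For anyone who wants to watch this oscillation without Task Manager, the working set of the current process can be sampled from managed code. A sketch; the sample count and interval are arbitrary:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class WorkingSetWatch
{
    static void Main()
    {
        // Sample this process's working set a few times. In a managed
        // process the number drifts as the GC and CLR tune themselves;
        // a native app's idle working set is essentially flat.
        using var self = Process.GetCurrentProcess();
        for (int i = 0; i < 3; i++)
        {
            self.Refresh(); // re-read the OS counters
            Console.WriteLine($"{self.WorkingSet64 / (1024 * 1024)} MB");
            Thread.Sleep(500);
        }
    }
}
```

Logging these samples while an app sits idle makes the CLR's collections and working-set trimming visible as sudden drops between otherwise steady readings.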
The contrast between SharpReader and FeedDemon is more a reflection of the difference between a free application written as a hobby and a professionally written commercial application, and less an indicator of any inherent performance advantage of Delphi over C#. The performance issues with NewsGator, an Outlook-based reader that I believe is managed, are likely due to the very high overhead and poor performance of OLE automation in general.