Kristofer Goss wrote to me yesterday, asking for my thoughts on the performance tradeoffs between .NET and native code:
I would love to hear your thoughts on the performance and runtime overhead of Windows Forms, specifically with regards to some comments Nick Bradbury had on why he chose Delphi for implementing FeedDemon, posted here:
http://nick.typepad.com/blog/2004/06/how_microsoft_l.html
I'm considering writing a Windows app to run on lower end machines where high end processors and lots of RAM are not the norm. One concern I have is the performance and memory usage of WinForms clients vs. Win32 applications built with something like Delphi. Firing up FeedDemon and other tools like SharpReader and RSSBandit, the contrasts are pretty striking in terms of memory usage (on the same blog roll.)
Although I prefer C# much more since I've been working with it for a while now, I'm really trying to weigh out what is best for my potential end-users.
I'm considering a Delphi/Win32 client side EXE at the moment. I'd appreciate hearing your comments on this since you're leveraging Windows Forms for your future product. Perhaps you feel this is worth blogging about.
Here's what I believe:
I don't think that .NET applications necessarily perform worse than native applications. On the other hand, even when idle, WinForms applications clearly do a lot more work than their native cousins, constructing numerous kinds of event objects for mouse movement and idle events. These idle costs are still fairly small, as temporary objects are essentially free in .NET.
Performance is primarily affected by the algorithms and data structures used by the programs. Object allocations do not figure in as much as the overall design of the application. Heap-based objects are definitely slower than stack-based objects, but not by much in the .NET world. Some at Microsoft actually believe that .NET applications can potentially be faster, because dynamically compiled code can offer machine-specific optimizations and eliminate indirections to addresses known only at runtime, and GC-based heap allocations can approach the performance of stack allocations. The GC does perform poorly when objects experience a mid-life crisis (surviving long enough to be promoted to an older generation, then dying soon afterward) or when very large temporary objects (>85K) are created.
The perception that managed applications are slow may be due to self-selection bias. That is, the programmers most sensitive to performance and most adept at writing performant code are also the least likely to migrate to managed code. The end result is that managed applications tend to be written by less performance-savvy programmers, who are more interested in the managed environment for other reasons, such as enhanced productivity.
That said, managed DirectX is 5% slower than the native API. That is not to say the performance could not have been improved by a less clean, non-object-oriented port, but managed DX incurs the unavoidable WinForms overhead mentioned above as well as managed-to-native transition costs.
I use SharpReader regularly and am aware of its performance issues. When it performed poorly a few times, I examined it under the Performance Monitor microscope and discovered that the garbage collector was hardly operating at all.
When I imported a large OPML file (normally a lengthy operation, so it doesn't qualify as a performance problem), I did notice that SharpReader was allocating over 30 million bytes per second, yet spending only 3% of its time in the garbage collector, a good demonstration of how efficient the GC is at reclaiming temporary objects.
Some of the real performance issues can be attributed to the simple fact that SharpReader runs in DEBUG mode. Also, FurryGoat had a post (since removed), in which he looked at SharpReader through the CLRProfiler and determined that XML serialization was probably a major cause.
As for SharpReader's large memory consumption, I discovered using the CLR Profiler that the primary culprit is the large number of strings allocated to store feed text (28 MB for 34,401 strings, or 72% of all memory allocated). Most of the strings are held in objects of type Model.RSSItem. Luke (the author of SharpReader) could roughly halve the memory those strings consume simply by storing them as UTF-8 encoded byte arrays, using the System.Text.Encoding APIs. This is not a native versus .NET issue: if FeedDemon were storing feed text in the same manner, it would take the same memory hit; more than likely, FeedDemon isn't using Unicode strings, for one thing.
I haven't previously noticed any performance issues with RSS Bandit; on the other hand, I haven't used RSS Bandit on a regular basis. A quick Google search reveals some gripes about performance, which either still persist and I haven't seen them, or have been fixed with regular updates. Simple fixes can often remove performance bottlenecks. It seems that with each new version of SharpReader, Luke discovers another fix that improves performance by 25%.
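To make the arithmetic behind the UTF-8 suggestion concrete, here is a minimal sketch. It is written in Python purely as an illustration; it is not SharpReader's code, and the `CompactText` class and the sample strings are hypothetical names made up for this example. The assumption is that .NET holds string data as UTF-16 (two bytes per character for most text), so plain ASCII feed text takes half as many bytes when kept as UTF-8.

```python
# Illustration only: Python stands in for the .NET encoding APIs here.
# CompactText and the sample strings are hypothetical, not taken from
# SharpReader or from the .NET framework.


def utf16_size(text: str) -> int:
    """Bytes of character data when the text is held as UTF-16, as in .NET."""
    return len(text.encode("utf-16-le"))


def utf8_size(text: str) -> int:
    """Bytes of character data when the same text is held as UTF-8."""
    return len(text.encode("utf-8"))


class CompactText:
    """Keeps text as UTF-8 bytes and decodes a temporary string on demand."""

    def __init__(self, text: str) -> None:
        self._data = text.encode("utf-8")

    @property
    def text(self) -> str:
        # Each access builds a short-lived string from the stored bytes.
        return self._data.decode("utf-8")

    @property
    def stored_bytes(self) -> int:
        return len(self._data)


ascii_item = "SharpReader imports an OPML file and downloads each feed."
cjk_item = "日本語のフィード"

for label, item in (("ascii", ascii_item), ("cjk", cjk_item)):
    print(label, "utf-16:", utf16_size(item), "utf-8:", utf8_size(item))
```

The halving only holds for text that is mostly ASCII; for the CJK sample, UTF-8 is actually larger than UTF-16. The sketch also counts character data only, not per-object overhead, and decoding on every access trades some CPU time and temporary allocations for the smaller resident footprint.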
The working set for SharpReader is 30 MB, for FeedDemon 23 MB, and for RSS Bandit 4 MB in their initial configurations on my machine. (In comparison, the working sets for MS Word and MS Excel are about 18 MB.) So, in their bare configurations, RSS Bandit is actually the tightest of them all, even though it also uses the .NET runtime. However, the working set of a .NET application has a significantly higher variance than that of a native application. While RSS Bandit was idle, I watched the working set figures initially climb to 13 MB, then in an instant fall to 6.5 MB, apparently because a collection had occurred. The working set then oscillated in an ever-narrowing range (eventually from under 3 MB to 6 MB) that apparently reflected dynamic tuning by the CLR. Native applications, in contrast, normally have zero variance in working set during idle.
The contrast between SharpReader and FeedDemon is more a reflection of the difference between a free application written as a hobby and a professionally written commercial application, and less an indicator of Delphi's inherent performance advantage over C#. Performance issues with NewsGator, an Outlook-based reader which I believe is managed, are likely due to the very high overhead and poor performance of OLE automation in general.
The link doesn't seem to work :(
Interesting insights though ..
Posted by: | September 08, 2004 at 01:38 AM
Link should be http://nick.typepad.com/blog/2004/06/how_microsoft_l.html (case sensitive for how_[mM]icrosoft_l.html :)
Posted by: | September 08, 2004 at 01:43 AM
Hi,
How do you get that 28MB value? I've run CLR Profiler (2.0) myself and it only shows that SharpReader uses about 1MB of strings.
I wonder what settings do you use in CLR Profiler to get this figure?
Victor
Posted by: Victor | September 08, 2004 at 03:09 AM
You won't be able to replicate my experiment exactly. The amount of memory used depends on your subscriptions, unread and locked posts, etc. My SharpReader sometimes has a working set of 200 MB, and you can bet that a similarly large percentage of that is strings.
Try examining SharpReader when its working set is high, and you will see that a large proportion of it is in-memory strings. Now, if that's not the case with FeedDemon, the content may either be compressed or dynamically retrieved from disk.
Posted by: Wesner Moise | September 08, 2004 at 03:25 AM
Hi Wes, great post and exactly why I read your blog so avidly in the first place! Thanks for the valuable feedback, you raise some excellent points that warrant careful consideration before dismissing WinForms out of performance concerns. Careful goals, metrics, and analysis are always part of the process in addressing these issues.
-Kris
Posted by: Kris | September 08, 2004 at 04:19 AM
When RssBandit first loads with the default set of feeds, it may be at that. But on my machine, I get 240M with somewhere around 150 feeds...
I agree that this could be handled better by the developers, but I didn't want anyone to think that RssBandit is actually that light!
I use RssBandit because it is the best aggregator that I have found. I only have 2 items on my wish list for it: 1. Better Speed. 2. Less Memory.
Posted by: Jeff Lewis | September 08, 2004 at 10:47 AM
You know, you've touched on a few things that I've been curious about.
For example, when email messages are parsed into a rich object model, does one take each of the headers, split them into X number of strings, and present them as Header.Value properties? Or, a better solution in my opinion: just leave the header stream string as it is, but have the Header object remember the index and length, so that the Value property returns string.Substring(index, length), giving you a very temporary string.
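The index-and-length idea can be sketched roughly as follows. This is an illustration in Python rather than C#, and the names `HeaderSlice` and `index_headers`, the sample message, and the assumption of simple single-line "Name: value" headers separated by CRLF are all made up for the example.

```python
# Hypothetical sketch: each header remembers where its value lives inside
# the one shared raw string and only materializes it when asked.


class HeaderSlice:
    def __init__(self, source: str, index: int, length: int) -> None:
        self._source = source  # shared reference to the raw text, not a copy
        self.index = index
        self.length = length

    @property
    def value(self) -> str:
        # Builds a temporary string, like string.Substring(index, length).
        return self._source[self.index:self.index + self.length]


def index_headers(raw: str) -> dict:
    """Map header names to slices of raw (no folded header lines handled)."""
    headers = {}
    pos = 0
    while pos < len(raw):
        end = raw.find("\r\n", pos)
        if end == -1:
            end = len(raw)
        colon = raw.find(":", pos, end)
        if colon != -1:
            name = raw[pos:colon]
            start = colon + 1
            # Skip the spaces that follow the colon.
            while start < end and raw[start] == " ":
                start += 1
            headers[name] = HeaderSlice(raw, start, end - start)
        pos = end + 2  # step over the CRLF
    return headers


raw_headers = "From: wes@example.com\r\nSubject: Hello world\r\n"
parsed = index_headers(raw_headers)
print(parsed["Subject"].value)
```

Only the header names are stored as separate strings here; each value costs two integers plus a reference to the shared text until someone actually reads it.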
You mentioned storing it in a UTF8 byte array and, I assume, calling UTF8.GetString() to return the item strings. I really wonder if that's better than just storing the feed (smartly, either as one string per feed item or as the entire feed, depending on the actual size) and calling Substring on the string from the feed...
Of course, most of them are using the XmlSerializer, which incidentally seems to use a hybrid approach of storing one string per node, and calling the Value property Substrings that string... [I wonder if I'm explaining enough to follow]
Another thing, this DELPHI solution, is he actually parsing the Xml into an XMLDOM? or just doing a parsing routine that garnered some ridicule on DailyWTF.com once ;-)
http://TheDailyWTF.com/archive/2004/09/13/1739.aspx
Posted by: Eric Newton | September 28, 2004 at 01:31 PM