Algorithmic Complexity
In PDC session, “Five Things Every Win32 Developer Should Know,” Raymond Chen makes a dangerous comment in his session abstract:
The top two factors that control performance are I/O and memory. After exploring the relationship between physical memory, virtual memory, and the virtual address space with popular Win32 blogger and Microsoft developer Raymond Chen, we mix in some knowledge of how CPUs work and see how an O(n) algorithm can outperform an O(log n) algorithm: Why much of what you learned in school simply doesn't matter.
I understand that his comment is meant to be provocative and intended to drum up his interest in his talk. He wants to stress the importance of being aware of multi-level caching hierarchies as well as the notion that “memory is speed.” The more memory used, the slower the software runs because caches are cleaned out more frequently. Some background information on the memory hierarchy is in ars technica.
While binary trees can deliver logarithmic-time performance over vector representations of data, there are some costs:
-
Poor locality. Trees nodes are connected via links, which defeat the caching benefits of locality and also leads to more frequent and costly paging operations.
-
Overhead. Tree nodes typically introduce additional memory overhead per item for bookkeeping such as pointers to parent and children nodes, CLR object header information, and tree balancing information. Arrays, in constrast, typically have no overhead on top of data.
He ignores the possibility of hybrid data structures. Tree nodes, for example, can contain multiple items to increase the caching benefits of locality, thereby reducing the overall constant factor of algorithms applied to the tree. This also reduces per object overhead. Because of hybridization, my own tree-based list uses a comparable amount of overhead to a simple arraylist with average overhead less than 50% per element.
Also, The .NET garbage collector mitigates the impact of locality. Nodes created together are adjacent in memory. Over multiple collections, the locality benefits increases because temporary objects allocations between the two nodes are freed and the two are compacted together.
With these disadvantages out of the way, the logarithmic data structures also offers these advantages:
-
Scalability to large number of elements. Trees touch only a fraction of the data contained.
-
Better composability. Operations on linear-time data structures can quickly explode to quadratic performance.
Trees are also more flexible data-structure, supporting inheritance and composition, and allowing for a lot of new interesting inexpensive features. I have been able to use my tree-based list and set data structures universally without resorting to special-purpose data-structures; that would not have been the case with vector-based lists.
Update: Coetzee has a related post on unrolled linked list.
Hi Wesner. Thanks for the link. I agree with you - the comments made by Chen are misleading at best. It's certainly true that linear search on an array will easily outperform walking a binary tree up into a few thousand elements, but it simply doesn't scale up to very large lists. It's important to consider the effect of cache, but also important to achieve good asymptotic bounds on the number of cache misses. One of the most efficient map data structures I know of is the Judy array (see http://judy.sourceforge.net/), which is based on 256-tries and is tuned for good cache behaviour.
Posted by: Derrick Coetzee | August 23, 2005 at 02:35 PM
What Chen said is that an O(n) algorithm CAN
outperform an O(logn) algorithm considering cache and memory hierarchy.
I can not see anything you said that contradict that.
Posted by: | October 06, 2005 at 06:59 AM
My post predated his lecture, but his initial descriptions of his sessions were even more incriminating.
Posted by: Wesner Moise | October 06, 2005 at 07:03 AM