Whidbey Hashtables
The new generic hashtable collection in Whidbey, Dictionary<K,V>, has a number of major differences from v1.1. There are also changes to the existing non-generic hashtable collection.
Chaining versus Probing
Dictionary uses chaining instead of probing to insert items into the hashtable. In chaining, every bucket points to a linked list and no other bucket is search on a collision; in probing, if a collision occurs in one bucket, another bucket is search in the bucket table by generating a second hash value and using it as a offset value for examining other buckets in the table. A Microsoft blogger reported several months that hashtable performance doubled by using the new algorithm. There are a few consequences of this behaviors as I'll mention in the next few sections.
Memory
Dictionary uses an array implementation of a linked list (ie, the next variable is an integer index to another node in the array), which keeps memory usage down. The Dictionary implementation uses less buckets than hashtable (since probing requires extra buckets allocated); however, the advantage of using fewer buckets is offset by the additional 8 bytes allocated per bucket to store linked list information.
The memory used by both Hashtable and Dictionary are comparable. When storing both keys and value as reference types, Dictionary appears to be a less than 20% larger than the non-generic Hashtable. However, when either one or both key and values are value types, Dictionary uses substantially less memory than Hashtables, up to 30% less when one of key or value are value type and up to 50% when both are value types.
Ordered Keys in Dictionary
In contrast to Hashtables, Dictionary maintains the keys in the order that in which they were originally inserted (except if items have been deleted in which case a bucket may be freed in the middle of the list and available to a new item).
Empty Dictionary Optimization
Following the generic List type, the generic Dictionary class doesn't allocate much memory when empty. This keeps collections relatively lightweight (although still consuming 40 bytes, not including the object header) when they are included in objects as property or event bags, when most of the time there are no properties or events set. This reduces the need for something like HybridDictionary, which wasn't that lightweight anyway.
Fast Enumerators
Also following the generic List type, the Dictionary enumerators are struct-based and thus allocation-free, which helps to reduce memory pressure in a GC world.
Load Factor
Loadfactor is no longer available in Dictionary, as it makes little sense with the new chaining scheme. You can still reduce collisions by specifying an initial higher capacity for the Dictionary, if you know the size of the table in advanced.
http://www.zen13120.zen.co.uk/Blog/2004/08/blog-link-of-week-33-as-in-week-ending.html
Posted by: Daniel Moth | August 25, 2004 at 11:29 AM
It is quite strange that the class Dictionary K,V doesn't have any ctor that takes smthing like a IHashCodeProvider K. Instead, some of its ctor take some IComparer K, that makes me think that it relies internaly on a 'sorted like' algo. Could you clarify this point?
Posted by: Patrick Smacchia | November 12, 2004 at 01:33 AM
Hashtables use to take both IHashCodeProvider and IComparer. HashCodeProvider has limited utility and also isn't generic, so it introduces boxing costs.
IComparer merges the functions of HashCodeProvider with IComparer, so it supports GetHashCode(o), CompareTo(a,b), and Equals(a,b). The benefit is that a hashtable only needs to store one value rather than two values internally.
Posted by: Wesner Moise | November 12, 2004 at 06:41 AM