About

I am a software developer in Seattle, building a new AI software company.

Ads

April 2009

Sun Mon Tue Wed Thu Fri Sat
      1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30    

Categories

Ads


« Static Performance | Main | Waiting for Longhorn »

August 13, 2004

Whidbey Hashtables

The new generic hashtable collection in Whidbey, Dictionary<K,V>, has a number of major differences from v1.1. There are also changes to the existing non-generic hashtable collection.

Chaining versus Probing

Dictionary uses chaining instead of probing to insert items into the hashtable. In chaining, every bucket points to a linked list and no other bucket is search on a collision; in probing, if a collision occurs in one bucket, another bucket is search in the bucket table by generating a second hash value and using it as a offset value for examining other buckets in the table. A Microsoft blogger reported several months that hashtable performance doubled by using the new algorithm. There are a few consequences of this behaviors as I'll mention in the next few sections.

Memory

Dictionary uses an array implementation of a linked list (ie, the next variable is an integer index to another node in the array), which keeps memory usage down. The Dictionary implementation uses less buckets than hashtable (since probing requires extra buckets allocated); however, the advantage of using fewer buckets is offset by the additional 8 bytes allocated per bucket to store linked list information.

The memory used by both Hashtable and Dictionary are comparable. When storing both keys and value as reference types, Dictionary appears to be a less than 20% larger than the non-generic Hashtable. However, when either one or both key and values are value types, Dictionary uses substantially less memory than Hashtables, up to 30% less when one of key or value are value type and up to 50% when both are value types.

Ordered Keys in Dictionary

In contrast to Hashtables, Dictionary maintains the keys in the order that in which they were originally inserted (except if items have been deleted in which case a bucket may be freed in the middle of the list and available to a new item).

Empty Dictionary Optimization

Following the generic List type, the generic Dictionary class doesn't allocate much memory when empty. This keeps collections relatively lightweight (although still consuming 40 bytes, not including the object header) when they are included in objects as property or event bags, when most of the time there are no properties or events set. This reduces the need for something like HybridDictionary, which wasn't that lightweight anyway.

Fast Enumerators

Also following the generic List type, the Dictionary enumerators are struct-based and thus allocation-free, which helps to reduce memory pressure in a GC world.

Load Factor

Loadfactor is no longer available in Dictionary, as it makes little sense with the new chaining scheme. You can still reduce collisions by specifying an initial higher capacity for the Dictionary, if you know the size of the table in advanced. 

 

Comments

It is quite strange that the class Dictionary K,V doesn't have any ctor that takes smthing like a IHashCodeProvider K. Instead, some of its ctor take some IComparer K, that makes me think that it relies internaly on a 'sorted like' algo. Could you clarify this point?

Hashtables use to take both IHashCodeProvider and IComparer. HashCodeProvider has limited utility and also isn't generic, so it introduces boxing costs.

IComparer merges the functions of HashCodeProvider with IComparer, so it supports GetHashCode(o), CompareTo(a,b), and Equals(a,b). The benefit is that a hashtable only needs to store one value rather than two values internally.

The comments to this entry are closed.