Dictionary Collections
The Base Class Library team silently introduced a new tree-based collection called SortedDictionary<K,V> into the framework. This collection is based on Red-Black trees, with guaranteed O(log n) operations for nearly all operations.
For non-ordered dictionaries, Dictionary<K,V>, based on a hash table, is still a better bet, because most operations average constant time. In addition, it has much better locality and less overhead. However, constant-time performance is not guaranteed; although unlikely, worst case time is still linear for objects with poorly distributed hash functions. This is especially true when hashing heterogeneous objects, since most objects have a default instance that returns a hash value of zero and, somewhat less often, these objects may return consecutive integer values starting from zero or one.
For ordered collections, Dictionary<K,V> may also be a better bet if insertion order and sort order are the same. Last time I checked during Beta 1, insertion ordering was preserved, provided no values were previously removed.
For nicely distributed hash values, I use the following function below adapted from Cormen’s famous book, Introduction to Algorithms, published by MIT Press.
public static int CreateHashCode(int value, long range)
{
const long GoldenRatioBits = 2654435769;
Debug.Assert(range>0 && range <= 1L << 32);
uint hash = (uint)(value * GoldenRatioBits);
return (int)((hash * range) >> 32);
}
public static int CreateHashCode(int value)
{
return CreateHashCode(value, 1L << 31);
}
The idea here is to take a pre-existing value and multiply it by the golden ratio, (1+sqrt(5))/2, then multiply the resulting fraction by the desired range of the new hash value--in this case, the range of positive 32-bit integer values. Other floating point values may do, but the golden ratio has some optimal properties. (In this case, GoldenRatioBits is the fractional part of the golden ratio, multiplied by 2^32. Using floating-point multiplication might have made the code more understandable but slightly slower.)
Since your multiplier is less than 2^32, you should probably shift down by 31 rather than 32.
heres why:
any 32-bit number, multiplied by 2^32 * (2^32/1.6....) is going to result in a 64 bit number less than 2^64 * (1/1.6.....)
shifting down by 32, your hash values will all be less than 2^32 * (1/1.6....)
shifting down by 31 and truncating to 32 bits your hash values will fill out the full range of 32 bits.
Posted by: damien morton | September 25, 2005 at 02:26 AM
I am aware of that issue...
Simply taking
long hash = (uint)(value * GoldenRatioBits)
returns a random 32-bit integer. This is effectively the same as range of 2^32L. This produces a better solution than performing any additional shift operations.
CreateHashCode returns a value from 0 to range-1. So, my example above generates a number from 0 to int.MaxValue.
If I have a hash table of size n and decide to improve the hash value returned from GetHashCode(), I would set the range to n to get a value from 0 to n-1.
The high-order bits are more "distributed" or "random" than the lower order bits. A mod operation (by a power of 2) would tend to ignore the more valuable high order bits, which is why I perform a multiplication at the end. (A multiplication by a power of 2 is simply a bit shift, that gives me just the high-order bits).
Posted by: Wesner Moise | September 25, 2005 at 02:52 AM
Oh I see what youre doing with your multiply-and-shift. My appologies... I think I was a little drunk when I posted that last message.
You might appreciate a hash function I have been working on - its fairly good at ensuring that any bits are random, not just the top or bottom bits.
static int MarsagliaHash(T[] array)
{
const ushort mz = 30963;
const ushort mw = 18273;
uint w = 0, z = 0;
foreach (var item in array)
{
uint h = (uint)item.GetHashCode();
z = unchecked(mz * (((ushort)z) + (z >> 16) + ((ushort)h)));
w = unchecked(mw * (((ushort)w) + (w >> 16) + (h>>16)));
}
return (int)((w << 16) | ((ushort)z));
}
Posted by: damien morton | September 25, 2005 at 07:32 AM