« Mobile Innovations | Main | Looking Forward »

March 14, 2013


TrackBack URL for this entry:

Listed below are links to weblogs that reference Immutable Collections Critique:


Andrew Arnott

Thanks for sharing. I'm curious whether your statement about the lack of fast set operations is referring to how the currently implementation implements the set comparison methods (IsSupersetOf, etc.). Can you comment?

Also, I hope that when you do release your version, that you can implement the interfaces we define in this one so that it maximizes interoperability across implementations.

Wesner Moise

I'll post more about this, but yes I am referring to operations such

Union, Intersect, ExclusiveUnion, Difference, IsSuperset with another ImmutableSet.

These are currently O(n) when performing the operations on two ImmutableSets. This can actually be a significant cheaper operation, given the representation.

Andrew Arnott

I don't think that you can achieve "significantly cheaper" than O(n) time in general for the set operations you mentioned. It takes at least that time to even enumerate a set, so how could you possibly establish whether one is a superset of another *in general* without enumerating it?

Now, if the sets were sorted, I can imagine some optimizations that could be very effective in many cases, but even then, I *think* the general case is still O(n) at best. Am I wrong?

Wesner Moise

About the cheaper set operations, the representation matters.

See Fast Set Operations on Treaps. http://www.cs.cmu.edu/~scandal/papers/treaps-spaa98.pdf

With an ienumerable, it will always be Omicron(n). But if you perform set operations on treaps or some other tree-like datastructure, the set operation be written recursively, so that if two immutable nodes from both sets compare equal, then the operation can be aborted early. You don't need to look at all the nodes; only the ones that are different.

The set representation that I use, treaps, produces a unique stable tree structure for the set of values it holds. With hash consing of nodes and equality canonicalization (ie, if 2 separate nodes are equal, change the reference to the younger node to point to the older node -- similar to union-find algorithm, which is effectively constant time), equality tests can effectively be made constant time on average.

Wesner Moise

Pardon my sloppiness - O(n log(m)) not O(n) from my first comment.

Also, the general case, actually does involve looking at every node... but if you the two trees were related to each other in some way, for example modified from a common older version of the tree. Some of the nodes in the tree will probably have been shared.

If the trees are related in other ways. The set values in each tree were compiled from similar methods or datasets, then the equivalent nodes wouldn't had the same reference, but equality normalization would make sure a second equality test would succeed immediately though the first may not.

Andrew Arnott

Regarding your optimizations, I still hold that in the general case you can't be better than O(n). Optimizations can certainly bring down the average time, and in fact one earlier version of our immutable collections did in fact use the technique you mentioned to reduce equality checks between two similarly structured sets by recognizing reference equality of the tree nodes.
We removed this functionality (for now) because value equality is hard to define, which we have blog posts discussing.

Verify your Comment

Previewing your Comment

This is only a preview. Your comment has not yet been posted.

Your comment could not be posted. Error type:
Your comment has been posted. Post another comment

The letters and numbers you entered did not match the image. Please try again.

As a final step before posting your comment, enter the letters and numbers you see in the image below. This prevents automated programs from posting comments.

Having trouble reading this image? View an alternate.


Post a comment

My name is Wesner Moise. I am a software entrepreneur developing revolutionary AI desktop applications. I worked as a software engineer in Microsoft Excel group for six years during the 1990s. I worked on PivotTables and wrote the most lines of code in Excel 97-- about 10 times the median developer. I have a Harvard BA in applied math/computer science and a UCLA MBA in technology entrepreneurship. I am a member of the Triple Nine Society, a 99.9 percentile high-IQ society.

December 2013

Sun Mon Tue Wed Thu Fri Sat
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31