Recursive Iterators Performance
I found an paper on nested/recursive iterators by Bart Jacobs, Erik Meijer, et al.
Erik Meijer was involved in the COmega project, which directly led to LINQ-based extensions in C# 3.0 and VB 9.0.
COmega had sequences, which were flattened versions of IEnumerable collections. Sequences introduced a new notion of cardinality into the type systems. A type T could by nullable ( T? ), nonnullable (T!), singleton (T), sequence ( T* ) or nonnullable sequence ( T+ ). A type of one cardinality could be implicitly converted to a type of another cardinality, provided it makes sense, so a singleton was automatically a sequence.
Sequences are fascinating, but would have introduced additional complexity such as major changes to the type system and to syntax would be required. The only real features that sequences added over enumerable collections were flattening and implicit conversions from non-sequences to sequences.
I wasn’t convinced of the rationale for flattening. (Sequence flattening is a major concept in XQuery and XPath 2.0.) They seemed to be useful for type lifting, but the compiler could always special-case type-lifting of collections, so flattening wasn’t strict necessary. Flattening corresponds to the use of SelectMany versus Select operator in LINQ.
The paper did reveal another potential advantage of sequences in their ability to offer linear time enumerations of recursive iterators. The paper remarked that the existing implementation of C# iterators tend to lead to quadratic worst-case running time for recursive iteration of trees. The authors provided an implementation of sequences in the paper that could eliminate that cost.
I was concerned about that paper because I use recursive iterators rather frequently. I noted in an earlier post that recursive iterators can run in linearithmic (n log n) time for traversal of binary trees. I thought the authors of the paper were mistaken, until I noticed a chart clearly demonstrating empirically quadratic time behavior.
Initially, I couldn’t explain why recursive iterators must lead to quadratic time behavior. Traversal time should be proportional to the size of the collection times the average call depth; for a balanced binary tree, the depth would be a logarithic function of n. Then, I realized that the authors used an n-ary tree and more complicated iterator for traversing their tree.
So, I ran a few of my own tests on the recursive iterator in my earlier post, worked some numbers in Excel and produced a chart:
The X axis refers to the collection time N and the Y axis refers to total time spent recursively iterating through the binary-tree collection. The data points are fitted with a cubic polynomial; the straight line regression curve is given in white. I did some analysis. The curve fits the function y = 583 * n * log(n) very neatly. Correlation was 99.9% for a linearithmic curve versus 95.4% for a quadratic curve — about 50 times better.