Iterators, Not Just for Iteration

« C# Behind the Scenes | Main | Nested Blocks »

August 05, 2004

Iterators, Not Just for Iteration

Iterators are a new feature in C# 2.0, significant easing the pain of creating enumerators and enumerable collections to be used by foreach. They also have counterparts in other languages, especially scripting languages, like Ruby and Python. 

I'll assume that you have some familiarity with the basic operation of iterators in C#. The following example in from the initial specification for the v2.0 features, which enumerates numbers incrementally from a range.

static IEnumerable<int> FromTo(int from, int to) {
      while (from <= to) yield return from++;
}

This allows you to write a foreach loop "foreach (int i from FromTo(1, 100))" to iterate over a loop for each value of i in the range from 1 to 100. What the C# compiler does it convert the iterator function to return a class, which implements a state-machine to perform the equivalent operation, as shown below. The compiler also reproduces code from any finally clauses into Dispose method of the IDisposable interface, to handle cases where enumeration is terminated early such as by calling break with a foreach loop. The specification proposes the following transformation for the iterator method FromTo.

   static IEnumerable<int> FromTo(int from, int to) {
      return new __Enumerable1(from, to);
   }

   class __Enumerable1:
      IEnumerable<int>, IEnumerable,
      IEnumerator<int>, IEnumerator
   {
      int __state;
      int __current;
      int __from;
      int from;
      int to;
      int i;

      public __Enumerable1(int __from, int to) {
         this.__from = __from;
         this.to = to;
      }

      public IEnumerator<int> GetEnumerator() {
         __Enumerable1 result = this;
         if (Interlocked.CompareExchange(ref __state, 1, 0) != 0) {
            result = new __Enumerable1(__from, to);
            result.__state = 1;
         }
         result.from = result.__from;
         return result;
      }

      IEnumerator IEnumerable.GetEnumerator() {
         return (IEnumerator)GetEnumerator();
      }

      public int Current {
         get { return __current; }
      }

      object IEnumerator.Current {
         get { return __current; }
      }

      public bool MoveNext() {
         switch (__state) {
         case 1:
            if (from > to) goto case 2;
            __current = from++;
            __state = 1;
            return true;
         case 2:
            __state = 2;
            return false;
         default:
            throw new InvalidOperationException();
         }
      }

      public void Dispose() {
         __state = 2;
      }

      void IEnumerator.Reset() {
         throw new NotSupportedException();
      }
   }

While the specification indicates that this is a possible implementation of the iterator feature, a quick look at reflector indicates this is the actual implementation, or close to it.

Some Advantages of Iterators

  • Iterators support a lazy evaluation by only evaluating and returning values that are needed.
  • Iterators can return an infinite stream of values.
  • Since iterators are state-based, they can return values back to the consumer as they are computer, rather than consuming memory by storing each value in a list.
  • Iterators can alleviate the need for threads when using the Producer-Consumer pattern

Pipeline Pattern

I think that iterators will enable additional patterns of programming. For one thing, we could see more use of a Pipeline pattern (similar to the Producer-Consumer pattern in multi-threading) of programming similar to that used in the command shell, where one iterator takes another enumerable collection as input and outputs another enumerable.

public IEnumerable<int> Times2(IEnumerable<int> input)
{
    foreach( int i in input )
        yield return i * 2;
}

This Pipeline pattern also enables a number of tricks beyond the command-line paradigm, in that the input can be enumerated over multiple times, and does not need to be cached.

It might be worthwhile to create generic iterators that performing filter, sorting, and combining on other iterators such as the following.

public IEnumerable<T> Join<T>(IEnumerable<T> list1, IEnumerable<T> list2)
{
   foreach( T t in list1 )
      yield return t;
   foreach( T t in list2 )
      yield return t;
}

Beyond Iteration

Iterators introduce some novel concepts such as allowing suspended and resumable execution via a state-machine model, and heap-based locals. These two concepts enable other new programming techniques besides enumeration.

The IEnumerator object return by an iterator in essence represents an instruction address (in combination with local state) and a call to MoveNext() represents a subrountine call to that instruction execution.

When writing iterators in some cases, the actual value yielded may be of little value. What we are interested in keeping track of is the current instruction address and state of the iterator. In other words, we care about MoveNext() but not really about Current.

Iterators are in fact an example of a more general class of functions, called coroutines, as opposed to subrountines. Subroutines can return only one value, whereas co-routines can be return multiple values; each invocation of a co-routine, begins where it last returned. Calls between co-routines are more like "gotos" across functions, since coroutines behave like peers. Subrountines, on the other hand, are subordinate to the calling function. Since co-routines require either a separate stack per function or or heap-based storaged, they typically had to be written using threads or fibers.

Simulating Continuations

While I am not clear where the boundaries of co-rountine and more general continuations, as I have only read but not experimented with continuations. I think that the approach that follows might actually provide a suboptimal means of implementing continuations, though much better than Continuation Passing Style.

We start by using a dispatcher and rewriting functions that can yield as iterators that return the IEnumerator interface.

     public void Dispatch(IEnumerator enumerator)
     {
           while (enumerator.MoveNext())
           {
                enumerator = (IEnumerator) enumerator.Current;
           }

     } 

The following illustrates how this would word for a chess game, where two functions White and Black are peers and jump back to one another in the course of the game at the exact location where they previously yielded.

public class Game()
{
      White white;
      Black black;
      bool gameOver;

      public void Go(IEnumerator enumerator)
      {
            gameOver = false;
            white = White();
            black = Black();
             
             Dispatch(white);

      }

     public void Dispatch(IEnumerator enumerator)
     {
           while (enumerator.MoveNext())
                enumerator = (IEnumerator) enumerator.Current;

     } 

     public IEnumerator White()
     {
          while (true)
          {
              MakeWhiteMove();
              if (gameOver) yield break;
              yield return black;
          }
     }

     public IEnumerator Black()
     {
          while (true)
          {
              MakeBlackMove();
              if (gameOver) yield break;
              yield return white;
          }

     }

}

Continuations allow functions deep within the call stack to transfer out control. Calling a function that could yield, from with an iterator, would require mapping a call to Function(args) to the following syntax:

foreach (IEnumerator result in Function(args))
     yield return result;

Note we can't use generic enumerators here, because the generic syntax does not support infinite recursion. That is, IEnumerator<IEnumerator<IEnumerator<....>>>

Actually, the above could be simplified and made more efficient if we allowed Function to take an extra parameter, representing the current enumerator to return back when done. The simplification looks something like this "yield return Function(thisEnumerator, args)." We can't use the "this" parameter to refer to the enumerator, because "this" refers to the object containing the iterator method. The dispatcher could save the previous enumerator within the dispatch loop and Function would need to save that previous enumerator at the beginning of its body to use as its final yield value.

One missing feature is the ability to clone (except through calling the protected method MemberwiseClone() via reflection) the enumerator object returned by the iterator, in which case it would be possible to invoke code multiple times in the same address with exact state.

 

Comments

© 2015 - Wesner P. Moise, LLC. All rights reserved.

free web stats