« C# Behind the Scenes | Main | Nested Blocks »

August 05, 2004

Iterators, Not Just for Iteration

Iterators are a new feature in C# 2.0, significant easing the pain of creating enumerators and enumerable collections to be used by foreach. They also have counterparts in other languages, especially scripting languages, like Ruby and Python. 

I'll assume that you have some familiarity with the basic operation of iterators in C#. The following example in from the initial specification for the v2.0 features, which enumerates numbers incrementally from a range.

static IEnumerable<int> FromTo(int from, int to) {
      while (from <= to) yield return from++;
}

This allows you to write a foreach loop "foreach (int i from FromTo(1, 100))" to iterate over a loop for each value of i in the range from 1 to 100. What the C# compiler does it convert the iterator function to return a class, which implements a state-machine to perform the equivalent operation, as shown below. The compiler also reproduces code from any finally clauses into Dispose method of the IDisposable interface, to handle cases where enumeration is terminated early such as by calling break with a foreach loop. The specification proposes the following transformation for the iterator method FromTo.

   static IEnumerable<int> FromTo(int from, int to) {
      return new __Enumerable1(from, to);
   }

   class __Enumerable1:
      IEnumerable<int>, IEnumerable,
      IEnumerator<int>, IEnumerator
   {
      int __state;
      int __current;
      int __from;
      int from;
      int to;
      int i;

      public __Enumerable1(int __from, int to) {
         this.__from = __from;
         this.to = to;
      }

      public IEnumerator<int> GetEnumerator() {
         __Enumerable1 result = this;
         if (Interlocked.CompareExchange(ref __state, 1, 0) != 0) {
            result = new __Enumerable1(__from, to);
            result.__state = 1;
         }
         result.from = result.__from;
         return result;
      }

      IEnumerator IEnumerable.GetEnumerator() {
         return (IEnumerator)GetEnumerator();
      }

      public int Current {
         get { return __current; }
      }

      object IEnumerator.Current {
         get { return __current; }
      }

      public bool MoveNext() {
         switch (__state) {
         case 1:
            if (from > to) goto case 2;
            __current = from++;
            __state = 1;
            return true;
         case 2:
            __state = 2;
            return false;
         default:
            throw new InvalidOperationException();
         }
      }

      public void Dispose() {
         __state = 2;
      }

      void IEnumerator.Reset() {
         throw new NotSupportedException();
      }
   }

While the specification indicates that this is a possible implementation of the iterator feature, a quick look at reflector indicates this is the actual implementation, or close to it.

Some Advantages of Iterators

  • Iterators support a lazy evaluation by only evaluating and returning values that are needed.
  • Iterators can return an infinite stream of values.
  • Since iterators are state-based, they can return values back to the consumer as they are computer, rather than consuming memory by storing each value in a list.
  • Iterators can alleviate the need for threads when using the Producer-Consumer pattern

Pipeline Pattern

I think that iterators will enable additional patterns of programming. For one thing, we could see more use of a Pipeline pattern (similar to the Producer-Consumer pattern in multi-threading) of programming similar to that used in the command shell, where one iterator takes another enumerable collection as input and outputs another enumerable.

public IEnumerable<int> Times2(IEnumerable<int> input)
{
    foreach( int i in input )
        yield return i * 2;
}

This Pipeline pattern also enables a number of tricks beyond the command-line paradigm, in that the input can be enumerated over multiple times, and does not need to be cached.

It might be worthwhile to create generic iterators that performing filter, sorting, and combining on other iterators such as the following.

public IEnumerable<T> Join<T>(IEnumerable<T> list1, IEnumerable<T> list2)
{
   foreach( T t in list1 )
      yield return t;
   foreach( T t in list2 )
      yield return t;
}

Beyond Iteration

Iterators introduce some novel concepts such as allowing suspended and resumable execution via a state-machine model, and heap-based locals. These two concepts enable other new programming techniques besides enumeration.

The IEnumerator object return by an iterator in essence represents an instruction address (in combination with local state) and a call to MoveNext() represents a subrountine call to that instruction execution.

When writing iterators in some cases, the actual value yielded may be of little value. What we are interested in keeping track of is the current instruction address and state of the iterator. In other words, we care about MoveNext() but not really about Current.

Iterators are in fact an example of a more general class of functions, called coroutines, as opposed to subrountines. Subroutines can return only one value, whereas co-routines can be return multiple values; each invocation of a co-routine, begins where it last returned. Calls between co-routines are more like "gotos" across functions, since coroutines behave like peers. Subrountines, on the other hand, are subordinate to the calling function. Since co-routines require either a separate stack per function or or heap-based storaged, they typically had to be written using threads or fibers.

Simulating Continuations

While I am not clear where the boundaries of co-rountine and more general continuations, as I have only read but not experimented with continuations. I think that the approach that follows might actually provide a suboptimal means of implementing continuations, though much better than Continuation Passing Style.

We start by using a dispatcher and rewriting functions that can yield as iterators that return the IEnumerator interface.

     public void Dispatch(IEnumerator enumerator)
     {
           while (enumerator.MoveNext())
           {
                enumerator = (IEnumerator) enumerator.Current;
           }

     } 

The following illustrates how this would word for a chess game, where two functions White and Black are peers and jump back to one another in the course of the game at the exact location where they previously yielded.

public class Game()
{
      White white;
      Black black;
      bool gameOver;

      public void Go(IEnumerator enumerator)
      {
            gameOver = false;
            white = White();
            black = Black();
             
             Dispatch(white);

      }

     public void Dispatch(IEnumerator enumerator)
     {
           while (enumerator.MoveNext())
                enumerator = (IEnumerator) enumerator.Current;

     } 

     public IEnumerator White()
     {
          while (true)
          {
              MakeWhiteMove();
              if (gameOver) yield break;
              yield return black;
          }
     }

     public IEnumerator Black()
     {
          while (true)
          {
              MakeBlackMove();
              if (gameOver) yield break;
              yield return white;
          }

     }

}

Continuations allow functions deep within the call stack to transfer out control. Calling a function that could yield, from with an iterator, would require mapping a call to Function(args) to the following syntax:

foreach (IEnumerator result in Function(args))
     yield return result;

Note we can't use generic enumerators here, because the generic syntax does not support infinite recursion. That is, IEnumerator<IEnumerator<IEnumerator<....>>>

Actually, the above could be simplified and made more efficient if we allowed Function to take an extra parameter, representing the current enumerator to return back when done. The simplification looks something like this "yield return Function(thisEnumerator, args)." We can't use the "this" parameter to refer to the enumerator, because "this" refers to the object containing the iterator method. The dispatcher could save the previous enumerator within the dispatch loop and Function would need to save that previous enumerator at the beginning of its body to use as its final yield value.

One missing feature is the ability to clone (except through calling the protected method MemberwiseClone() via reflection) the enumerator object returned by the iterator, in which case it would be possible to invoke code multiple times in the same address with exact state.

 

Comments

Great blog entry. Also, it is weird that I am currently writing a general purpose m-stage pipeline with the same behavior.

Very nice article. Thanks. I was planning to start something along that line myself but didn't find the time until now. About continuations, there is quite a number of interesting articles on Christian Queinnec's pages (and of course, but you probably know that, the chapters in "Structure and Interpretation of Computer Programs" by Abelson and Sussman). Christian has written some nice articles showing how continuations can be useful in a web context. And finally, googling for "Luc Moreau" + scheme and continuations should also bring you a lot of interesting material.

nice entry, thanks.
Actually, IIUC iterators in c# are semicoroutine in that tranfer of data is allowed just in one sense (i.e. you have a yield on one side and not on the other. )

About continuation the main difference is that I don't think you /can/ simulate conts with coroutines, cause the formers are more general than the latter.
It's like simulating an if with a while :)

See http://www.c2.com/cgi/wiki?CoRoutine

Hmmm... I was just reading a posting where someone was looking for a tokenizer in the .net framework rather than split.

It seems that an iterator would make a great tokenizer with the abilty to return each token and retain state to find the next token.

Slightly off topic... Any idea why the C# folks decided to go with 'yield return' instead of just 'yield' for the usual call? It seems so ugly, so I'd be curious of the rationale behind it.

It allows yield to be a context sensitive keyword(I think yield return is a spaced keyword, but the effects are the same), basically. By using yield return the compiler doesn't have to add any new keywords, so these lines
int yield;
yield crop;
yield+=10;
crop.Whatever()
are still legal because yield isn't immediatly followed by return. None of the keywords introduced in C# 2.0 are breaking. global has to be postfixed with ::, alias only works in the using directive section, where only works in the constraints section, partial is only valid as a type modifier, etc.

The syntax was just "yield" in earlier builds. However the change to "yield return" also made possible "yield break", which is handy because it stops iteration (MoveNext returns false). This lets you bail out of iteration if you decide you have nothing left to give. For example, the function below will only return 1 (and in fact the compiler will complain that unreachable code is detected).

IEnumerableint Sequence()
{
yield return 1; //return 1
yield break; //abort! (return false from MoveNext)
yield return 2; //never executes

}

Actually, according to Joval Lowy, "yield" conflicted with existing variable names in financial applications.

Yield was a reserved word in C# 1.0. Mr. Whittington's statement is correct.

The comments to this entry are closed.