Why do iterators have deferred execution semantics?

After my post about deferred execution semantics and re-entrancy I’ve had a couple of people ask why iterator code has deferred execution semantics rather than just executing immediately. To understand this we need to see how iterators are actually implemented. Consider the following very simple iterator method which produces a range of integers:

public static IEnumerable<int> Range(int from, int to)
{
    for (int i = from; i <= to; i++)
    {
        yield return i;
    }
}

This code clearly can’t execute ‘as is’ because there are multiple return points from the function, and it doesn’t actually return an enumerable object yet. The compiler makes it work by transforming the code into a private enumerator class which is implemented as a finite state machine. The following class is based on the code retrieved using Reflector although I have changed the naming (because the names generated by the compiler are legal in MSIL but not in C# so it wouldn’t compile otherwise) and converted the state field from an integer to an enumeration for readability:

private sealed class RangeIterator : IEnumerable<int>, IEnumerator<int>
{
    public int iterateFrom;
    public int iterateTo;

    private State state;
    private int current;
    private int initialThreadId;
    private int next;
    private int from;
    private int to;

    public RangeIterator(State state)
    {
        this.state = state;
        this.initialThreadId = Thread.CurrentThread.ManagedThreadId;
    }

    public enum State
    {
        Uninitialized,
        Unstarted,
        Iterating,
        Finished,
    }

    public int Current
    {
        get { return this.current; }
    }

    object IEnumerator.Current
    {
        get { return this.current; }
    }

    public IEnumerator<int> GetEnumerator()
    {
        Program.RangeIterator enumerator;
        if (Thread.CurrentThread.ManagedThreadId == this.initialThreadId && 
            this.state == State.Uninitialized)
        {
            this.state = State.Unstarted;
            enumerator = this;
        }
        else
        {
            enumerator = new RangeIterator(State.Unstarted);
        }

        enumerator.from = this.iterateFrom;
        enumerator.to = this.iterateTo;
        return enumerator;
    }

    public bool MoveNext()
    {
        switch (this.state)
        {
            case State.Unstarted:
                this.state = State.Finished;
                this.next = this.from;
                break;

            case State.Iterating:
                this.state = State.Finished;
                this.next++;
                break;

            default:
                return false;
        }

        if (this.next <= this.to)
        {
            this.current = this.next;
            this.state = State.Iterating;
            return true;
        }

        return false;
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }

    void IEnumerator.Reset()
    {
        throw new NotSupportedException();
    }

    void IDisposable.Dispose()
    {
    }
}

The content of the GetEnumerator method are then translated to return a new iterator in the uninitialized state, with the to/from values set to the parameters:

public static IEnumerable<int> Range(int from, int to)
{
    var iterator = new RangeIterator(RangeIterator.State.Uninitialized);
    iterator.iterateFrom = from;
    iterator.iterateTo = to;
    return iterator;
}

So that’s why the code doesn’t execute when you call an iterator method - because it simply returns an object. When you actually enumerate the returned RangeIterator (typically using a foreach loop) the GetEnumerator method is called. The first time this returns the same RangeIterator instance, and on subsequent calls it returns new instances. Within the returned enumerator it is the MoveNext method that implements the logic of your iterator method, so the execution is deferred until that is called.

This is the same reason that Linq-to-Objects has deferred execution semantics; most of its methods are implemented using iterators.

blog comments powered by Disqus
Fork me on GitHub