Sunday, 9 September 2012

C#: IEnumerable

I've been asked a few times by my apprentice groups to explain IEnumerable, something we use every day in C#. IEnumerable is an interface which, when implemented, supports iteration. There are two interfaces: the non-generic one (for looping over non-generic collections) and the generic one, IEnumerable<T>.

Firstly, let's look at the definitions of the interfaces.
namespace System.Collections
{
   public interface IEnumerable
   {
      IEnumerator GetEnumerator();
   }
}
namespace System.Collections.Generic
{
   public interface IEnumerable<out T> : IEnumerable
   {
      new IEnumerator<T> GetEnumerator();
   }
}
Now if we have a List<T> and look at the first line of its definition, we will see that it implements these interfaces:
public class List<T> : IList<T>, ICollection<T>, IEnumerable<T>, 
                       IList, ICollection, IEnumerable
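Because List<T> implements both interfaces, a list can be handed to any code that only needs to iterate over it. The `out` on T in the generic definition also makes IEnumerable<T> covariant, so a sequence of strings can be used where a sequence of objects is expected. A minimal sketch (CovarianceDemo and CountItems are made-up names for illustration):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CovarianceDemo
{
    // Accepts anything that can be iterated, generic or not.
    public static int CountItems(IEnumerable source)
    {
        int count = 0;
        foreach (object item in source)
            count++;
        return count;
    }

    public static void Main()
    {
        List<string> names = new List<string> { "Barra", "Vatersay" };

        // List<T> implements IEnumerable<T>, so no cast is needed.
        IEnumerable<string> generic = names;

        // Covariance (out T): an IEnumerable<string> is also an IEnumerable<object>.
        IEnumerable<object> objects = generic;

        Console.WriteLine(CountItems(objects)); // prints 2
    }
}
```

Covariance only applies to reference types, so an IEnumerable<int> cannot be treated as an IEnumerable<object> in the same way.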
Let's create an example to play with. We will have a class (called Place) with some properties.
public class Place
{
    public string PlaceName { get; set; }
    public string GaelicName { get; set; }
    public int Population { get; set; }

    public override string ToString()
    {
        return String.Format("Place {0} ({1}) pop {2}",
            PlaceName, GaelicName, Population);
    }
}
In a console program, let's populate a list of places (List<Place>).
List<Place> places = new List<Place>
{
    new Place { PlaceName = "Lewis and Harris", GaelicName = "Leòdhas agus na Hearadh", Population = 21031 },
    new Place { PlaceName = "South Uist", GaelicName = "Uibhist a Deas", Population = 1754 },
    new Place { PlaceName = "North Uist", GaelicName = "Uibhist a Tuath", Population = 1254 },
    new Place { PlaceName = "Benbecula", GaelicName = "Beinn nam Fadhla", Population = 1303 },
    new Place { PlaceName = "Barra", GaelicName = "Barraigh", Population = 1174 },
    new Place { PlaceName = "Scalpay", GaelicName = "Sgalpaigh", Population = 291 },
    new Place { PlaceName = "Great Bernera", GaelicName = "Beàrnaraigh Mòr", Population = 252 },
    new Place { PlaceName = "Grimsay", GaelicName = "Griomasaigh", Population = 169 },
    new Place { PlaceName = "Berneray", GaelicName = "Beàrnaraigh", Population = 138 },
    new Place { PlaceName = "Eriskay", GaelicName = "Èirisgeigh", Population = 143 },
    new Place { PlaceName = "Vatersay", GaelicName = "Bhatarsaigh", Population = 90 },
    new Place { PlaceName = "Baleshare", GaelicName = "Baile Sear", Population = 58 }
};
If we want to output them, we can use a foreach loop:
foreach (Place place in places)
    Console.WriteLine(place);
When you run this, Console.WriteLine uses our overridden ToString() method to print each place.

So what exactly is happening? Let's look at this code:
var enumerator = places.GetEnumerator();

while (enumerator.MoveNext())
{
    Place p = enumerator.Current;
    Console.WriteLine(p);
}
Here we call the GetEnumerator() method to get an enumerator. Our loop then calls MoveNext(), which returns true if there are (more) elements to process. If there are, we get the current element from the Current property and print it out. This is essentially what happens in our original foreach loop: the foreach statement in C# hides this complexity from us. It works with any class that implements IEnumerable (strictly speaking, with any type that exposes a suitable public GetEnumerator() method).
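Strictly, foreach does a little more than the hand-written loop: if the enumerator implements IDisposable (as IEnumerator<T> does), it is disposed when the loop ends, even if an exception is thrown. A rough equivalent, using a List<string> for brevity (ForeachExpansion and VisitAll are illustrative names, and the compiler's real expansion differs in details):

```csharp
using System;
using System.Collections.Generic;

public static class ForeachExpansion
{
    // Roughly what "foreach (string s in items) { ... }" turns into.
    // Prints each item and also collects them so the result can be checked.
    public static List<string> VisitAll(IEnumerable<string> items)
    {
        var visited = new List<string>();
        IEnumerator<string> enumerator = items.GetEnumerator();
        try
        {
            while (enumerator.MoveNext())
            {
                string s = enumerator.Current;
                Console.WriteLine(s);
                visited.Add(s);
            }
        }
        finally
        {
            // foreach always disposes the enumerator, even on exceptions.
            enumerator.Dispose();
        }
        return visited;
    }

    public static void Main()
    {
        VisitAll(new List<string> { "Eriskay", "Grimsay" });
    }
}
```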

So let's extend our example with another class (IslandGroup), which we will use to encapsulate details about a group of islands; in this case, the islands above make up the Outer Hebrides. The class has some properties, including a Dictionary containing the islands and one that returns the total population. (It won't compile until we implement the interface, which we do below.) Apologies for the lack of comments - my apprentice groups would crucify me!
using System.Collections.Generic;
using System.Linq; // needed for Sum()

public class IslandGroup : IEnumerable<Place>
{
    public string IslandGroupName { get; private set; }
    public Dictionary<string,Place> Islands { get; private set; }

    public IslandGroup(string islandGroupName)
    {
        this.IslandGroupName = islandGroupName;
        Islands = new Dictionary<string,Place>();
    }

    public void AddIsland(Place island)
    {
        // Add the island if it isn't in the Islands already
        if (!Islands.ContainsKey(island.PlaceName))
            Islands.Add(island.PlaceName, island);
    }

    public int TotalPopulation
    {
        get
        {
            return Islands.Sum(v => v.Value.Population);
        }
    }
}
We can use this class to populate the dictionary and return the total population of all the islands with something like this:
IslandGroup outerHebrides = new IslandGroup("Outer Hebrides");

// Add each island to the class
foreach (var place in places)
    outerHebrides.AddIsland(place);

Console.WriteLine("Population {0}", outerHebrides.TotalPopulation);
Now, in Visual Studio, right-click on IEnumerable<Place> in the class definition and choose Implement Interface. This will create two methods, as below:
public IEnumerator<Place> GetEnumerator()
{
    throw new NotImplementedException();
}

System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
    throw new NotImplementedException();
}
Why two methods? Look at the definition of IEnumerable<T>: it inherits from IEnumerable, so you need to implement both. In the end we will make one method (the non-generic GetEnumerator()) call the other.
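Here is a minimal sketch of the non-generic method calling the generic one. It uses a cut-down class (NameList is a hypothetical stand-in for IslandGroup) so that it compiles on its own. The explicit interface implementation simply returns the generic enumerator, which works because IEnumerator<T> itself inherits from IEnumerator:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for IslandGroup, cut down so it compiles on its own.
public class NameList : IEnumerable<string>
{
    private readonly List<string> names = new List<string>();

    // An Add method plus IEnumerable lets us use a collection initializer.
    public void Add(string name)
    {
        names.Add(name);
    }

    // The generic version does the real work.
    public IEnumerator<string> GetEnumerator()
    {
        foreach (string name in names)
            yield return name;
    }

    // The non-generic version just forwards to the generic one;
    // an IEnumerator<string> is also an IEnumerator, so no conversion is needed.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class Program
{
    public static void Main()
    {
        var list = new NameList { "Scalpay", "Baleshare" };

        // Casting to IEnumerable forces the non-generic GetEnumerator() to be used.
        foreach (object name in (IEnumerable)list)
            Console.WriteLine(name);
    }
}
```

Inside the explicit implementation, the simple name GetEnumerator() resolves to the public generic method, so there is no accidental recursion.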

Have you heard of the yield keyword? It is used in an iterator method to hand a value back to the caller (here, the foreach loop) and pause until the next element is requested. When you run a foreach loop over our class, its GetEnumerator() method performs the iteration, so that is where we put the yield statement. Here is some code:
public IEnumerator<Place> GetEnumerator()
{
    foreach (var place in Islands)
        yield return place.Value;
}
Our class encapsulates data for the islands, which we store in a Dictionary. Looping over a Dictionary gives us KeyValuePair items, but we want to return each Place, so in our loop we return the .Value property. With yield return <expression>, each time the caller asks for the next element, execution resumes after the previous yield return and runs to the next one, whose expression becomes that element. To use this, we loop over the instance of the class itself:
foreach (var item in outerHebrides)
    Console.WriteLine(item);
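One consequence of yield return is that the iterator body runs lazily: nothing executes until the loop asks for the first element, and the method pauses after each yield return until the next MoveNext() call. A minimal sketch that records when each step runs (LazyDemo, Numbers, Run and Log are illustrative names):

```csharp
using System;
using System.Collections.Generic;

public static class LazyDemo
{
    // Records the order in which things happen.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Numbers()
    {
        Log.Add("start");
        yield return 1;
        Log.Add("after 1");
        yield return 2;
        Log.Add("end");
    }

    public static string Run()
    {
        Log.Clear();
        IEnumerable<int> numbers = Numbers();
        Log.Add("created");          // the iterator body has not run yet

        foreach (int n in numbers)
            Log.Add("got " + n);

        return string.Join(", ", Log);
    }

    public static void Main()
    {
        // Prints: created, start, got 1, after 1, got 2, end
        Console.WriteLine(Run());
    }
}
```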
But you may be asking: why don't we just loop through the Dictionary property using something like this?
foreach (var item in outerHebrides.Islands)
    Console.WriteLine(item.Value);
It depends on what data you want to make available and what behaviour you want to provide. Let's say we want to return the islands with the lowest population first. Since we have written our own iterator, we can do this:
public IEnumerator<Place> GetEnumerator()
{
    var inOrder = from i in Islands.Values
                  orderby i.Population ascending
                  select i;

    foreach (var place in inOrder)
        yield return place;
}
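Now when we run this, the islands are output in ascending order of population. Making the order explicit is worthwhile, since Dictionary<TKey, TValue> does not guarantee any particular enumeration order.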
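As an alternative to the query-plus-yield version, LINQ's OrderBy already returns an IEnumerable<Place>, so GetEnumerator() can hand back that sequence's own enumerator. A minimal sketch, assuming a cut-down Place (only PlaceName and Population) and a hypothetical SortedIslands class in place of IslandGroup:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class Place
{
    public string PlaceName { get; set; }
    public int Population { get; set; }
}

// Cut-down version of IslandGroup that only keeps what this example needs.
public class SortedIslands : IEnumerable<Place>
{
    private readonly Dictionary<string, Place> islands = new Dictionary<string, Place>();

    public void AddIsland(Place island)
    {
        if (!islands.ContainsKey(island.PlaceName))
            islands.Add(island.PlaceName, island);
    }

    // Method syntax instead of a query plus yield: return the enumerator
    // of the sorted sequence directly.
    public IEnumerator<Place> GetEnumerator()
    {
        return islands.Values.OrderBy(p => p.Population).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class Program
{
    public static void Main()
    {
        var group = new SortedIslands();
        group.AddIsland(new Place { PlaceName = "Barra", Population = 1174 });
        group.AddIsland(new Place { PlaceName = "Vatersay", Population = 90 });
        group.AddIsland(new Place { PlaceName = "Berneray", Population = 138 });

        // Prints Vatersay, Berneray, Barra (smallest population first)
        foreach (Place p in group)
            Console.WriteLine(p.PlaceName);
    }
}
```

For the caller the two versions behave the same; the yield version is handy when you want extra logic inside the loop.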
Finally, you don't need to implement the IEnumerable<T> interface at all - you can write methods that return that type. So you could write a method like this:
public IEnumerable<Place> GaelicAlphabeticalOrder()
{
    var inOrder = from i in Islands.Values
                  orderby i.GaelicName ascending
                  select i;

    foreach (var place in inOrder)
        yield return place;
}
Which could be executed with
foreach (var place in outerHebrides.GaelicAlphabeticalOrder())
    Console.WriteLine(place);
So we don't need to implement this at the class level; or, if we do, we can still provide alternative methods (and these methods could take parameters, etc.).
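As an illustration of an iterator method that takes a parameter, here is a minimal sketch. IslandGroupSketch and IslandsWithPopulationAtLeast are hypothetical names, and the islands are kept in a plain List<Place> rather than a Dictionary to keep the example short:

```csharp
using System;
using System.Collections.Generic;

public class Place
{
    public string PlaceName { get; set; }
    public int Population { get; set; }
}

public class IslandGroupSketch
{
    private readonly List<Place> islands = new List<Place>();

    public void AddIsland(Place island)
    {
        islands.Add(island);
    }

    // Hypothetical parameterised iterator: yields only islands whose
    // population is at least minimumPopulation, in the order they were added.
    public IEnumerable<Place> IslandsWithPopulationAtLeast(int minimumPopulation)
    {
        foreach (Place island in islands)
        {
            if (island.Population >= minimumPopulation)
                yield return island;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var group = new IslandGroupSketch();
        group.AddIsland(new Place { PlaceName = "Lewis and Harris", Population = 21031 });
        group.AddIsland(new Place { PlaceName = "Barra", Population = 1174 });
        group.AddIsland(new Place { PlaceName = "Vatersay", Population = 90 });

        // Prints Lewis and Harris, then Barra
        foreach (Place p in group.IslandsWithPopulationAtLeast(1000))
            Console.WriteLine(p.PlaceName);
    }
}
```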

The yield keyword can be used as above, but also as
yield break;
which ends the iteration. You can also use the yield keyword in static methods, for example:
public static IEnumerable<string> ScottishIslandGroups()
{
    yield return "Outer Hebrides";
    yield return "Inner Hebrides";
    yield return "Shetland";
    yield return "Orkney";
    yield return "Islands of the Clyde";
    yield return "Islands of the Forth";
}
Or as a property
public static IEnumerable<string> WelshIslandGroups
{
    get
    {
        yield return "Anglesey";
        yield return "Bristol Channel";
        yield return "Ceredigion";
        yield return "Gower";
        yield return "Gwynedd";
        yield return "Pembrokeshire";
        yield return "St Tudwal's Islands";
        yield return "Vale of Glamorgan";
    }
}
Used as
foreach (string islandGroup in IslandGroup.WelshIslandGroups)
    Console.WriteLine(islandGroup);
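To illustrate the yield break statement mentioned earlier, here is a minimal sketch of a static iterator that stops as soon as it meets an empty name (YieldBreakDemo and UntilBlank are hypothetical names):

```csharp
using System;
using System.Collections.Generic;

public static class YieldBreakDemo
{
    // Yields names until it meets an empty one, then ends the sequence early.
    public static IEnumerable<string> UntilBlank(IEnumerable<string> names)
    {
        foreach (string name in names)
        {
            if (string.IsNullOrEmpty(name))
                yield break;   // ends the iteration; MoveNext() returns false
            yield return name;
        }
    }

    public static void Main()
    {
        var input = new[] { "Shetland", "Orkney", "", "Islands of the Clyde" };

        // Prints Shetland and Orkney only
        foreach (string name in UntilBlank(input))
            Console.WriteLine(name);
    }
}
```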