Wednesday, 13 February 2013

C# Part 3: When are objects the same? Exploration of Equals, GetHashCode and IEquatable

The first two parts exploring equality looked at overriding Equals from System.Object and at implementing IEquatable<T>.
This part looks at what GetHashCode() does. As a quick summary, this is the class so far:
public class Employee : IEquatable<Employee>
{
    private string employeeName;
    private int employeeNumber;

    public Employee(string employeeName, int employeeNumber)
    {
        this.employeeName = employeeName;
        this.employeeNumber = employeeNumber;
    }

    public override bool Equals(object obj)
    {
        if (obj == null)
            return false;

        Employee other = obj as Employee;
        if (other == null)
            return false;
 
        if (this.employeeNumber == other.employeeNumber)
            return true;
        else
            return false;
    }

    public bool Equals(Employee obj)
    {
        if (obj == null)
            return false;

        if (this.employeeNumber == obj.employeeNumber)
            return true;
        else
            return false;
    }
}
As we have overridden the Equals method (from System.Object), we now get a compilation warning (CS0659) about GetHashCode():
"'Employee' overrides Object.Equals(object o) but does not override Object.GetHashCode()"
Any time we override Equals we should also override GetHashCode(). But what does GetHashCode() actually do? Let's create a test method. It won't assert anything; we'll just use it for some "debugging" output.
[TestMethod]
public void MyTestMethod()
{
    TestContext.WriteLine("HashCode andrew {0}", andrew.GetHashCode());
    TestContext.WriteLine("HashCode rhona {0}", rhona.GetHashCode());
    TestContext.WriteLine("HashCode rhonda {0}", rhonda.GetHashCode());
}
We are using the TestContext to get some output. Run just this test, then click on it in the Test Results window to see its output.
The entries for all three objects have a different hash code, which is simply an integer. Before looking at the GetHashCode() method, let's create another test using a HashSet. A HashSet contains a set of values with no duplicates. First, let's create a HashSet of integers and add the same number twice. We expect it not to be added to the HashSet a second time. As a test:
[TestMethod]
public void HashSetofIntegerTest()
{
    HashSet<int> numbers = new HashSet<int>();
    numbers.Add(1);
    numbers.Add(2);
    numbers.Add(3);
    numbers.Add(1);

    Assert.AreEqual(3, numbers.Count);
}
When run, it passes: despite four calls to Add, we have only three elements. Adding a duplicate doesn't throw an error; instead, the Add method returns a bool, which is true if the element was added and false if it was already in the set. Now let's do the same with a HashSet<Employee>:
[TestMethod]
public void HashSetOfEmployeeTest()
{
    HashSet<Employee> workers = new HashSet<Employee>();
    workers.Add(andrew);
    workers.Add(rhonda);
    workers.Add(rhona);

    Assert.AreEqual(2, workers.Count);
}
Running this test fails: all three employees have been added. However, we want only two people in the set (as rhonda and rhona should be the same person). If you step into the Add calls, the debugger just moves on to the next Add statement. The Add method is not calling our Equals method, so a HashSet that should only hold unique entries doesn't seem to be checking what it already contains. Or is it? Now add a GetHashCode method (type public override and then select GetHashCode()), but leave the default body that is generated:
public override int GetHashCode()
{
    return base.GetHashCode();
}
Now when you step through, the first thing called in the Employee class is GetHashCode(). In the Call Stack window, right-click and choose "Show External Code". This shows the stack while Add runs, including a call to a method called "AddIfNotPresent", which then retrieves the hash code.

So each time Add is called, GetHashCode is executed, but our Equals methods are not. A hash code is a numeric value that can be used to determine that two objects are not the same, but it can't be used to tell that two objects are the same. Hash codes are used to build indexes, such as the one inside a HashSet. The rules are:
  • Two objects that are equal must return the same hash code
  • Two objects that return the same hash code are not necessarily equal
  • GetHashCode() should be consistent, returning the same hash code for the same data
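To make the first two rules concrete, here is a sketch using a hypothetical helper, HashRules.EqualImpliesSameHash, that checks rule 1 for one pair of objects. BadHashEmployee and ConstantHashEmployee are also hypothetical: the first breaks rule 1 on purpose by hashing a field that Equals ignores, while the second returns the same value for every object, which is legal (rule 2 allows unequal objects to share a hash code) but, as the next step shows, not ideal.

```csharp
using System;

public static class HashRules
{
    // Hypothetical helper: checks rule 1 for a single pair of objects.
    public static bool EqualImpliesSameHash(object a, object b)
    {
        if (!a.Equals(b))
            return true;   // rule 1 makes no demand on unequal objects (see rule 2)
        return a.GetHashCode() == b.GetHashCode();
    }
}

// Deliberately broken, for illustration only: Equals compares employeeNumber,
// but GetHashCode uses the length of the name, a field Equals ignores.
public class BadHashEmployee
{
    private readonly string employeeName;
    private readonly int employeeNumber;

    public BadHashEmployee(string employeeName, int employeeNumber)
    {
        this.employeeName = employeeName;
        this.employeeNumber = employeeNumber;
    }

    public override bool Equals(object obj)
    {
        BadHashEmployee other = obj as BadHashEmployee;
        return other != null && this.employeeNumber == other.employeeNumber;
    }

    public override int GetHashCode()
    {
        return employeeName.Length;   // "rhona" gives 5, "rhonda" gives 6
    }
}

// Legal but slow: every object gets the same hash code, so rule 1 always holds.
public class ConstantHashEmployee
{
    private readonly int employeeNumber;

    public ConstantHashEmployee(int employeeNumber)
    {
        this.employeeNumber = employeeNumber;
    }

    public override bool Equals(object obj)
    {
        ConstantHashEmployee other = obj as ConstantHashEmployee;
        return other != null && this.employeeNumber == other.employeeNumber;
    }

    public override int GetHashCode()
    {
        return 1;
    }
}
```

With these, EqualImpliesSameHash returns false for BadHashEmployee("rhona", 2791) and BadHashEmployee("rhonda", 2791): they are equal but hash differently, so a HashSet could store both. ConstantHashEmployee(2521) and ConstantHashEmployee(2791) share a hash code without being equal, which rule 2 permits.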
So in this example, rhona and rhonda must return the same hash code. andrew could still return the same value as rhona and rhonda, but that doesn't make it equal to them. To see what happens, let's change GetHashCode to return the same value for every object:
public override int GetHashCode()
{   
    return 1;
}
Now run the HashSet test again: it passes, and there are only two objects in the set. If you step through the code you will see:
  • GetHashCode() is called for the first object (andrew). The object is added to the HashSet (the Count property increments to 1).
  • GetHashCode() is called for the second object (rhonda). Then the Equals method is called on the andrew object, with the "other" object being rhonda. The two objects don't match, so rhonda is added to the HashSet (the Count property increments to 2).
  • GetHashCode() is called for the third object (rhona), and then Equals is called on the rhonda object, with the "other" object being rhona. Equals returns true, so rhona isn't added.
Here GetHashCode() returns 1 for every object. This is legal but not ideal: with lots of objects in the HashSet, Equals may have to be run against every one of them. Each time an object is added, GetHashCode() is called for it and the hash code is recorded in an index (of some sort). Any object whose hash code isn't in this index can be added to the HashSet without running Equals at all: if the hash code isn't there, the object must be different. So we need to return something based on the employee number, the same field that Equals uses, such as:
public override int GetHashCode()
{
    return this.employeeNumber.GetHashCode();
}
Check that the test still passes. If we go back to the test that outputs the hash codes, we now see that rhona and rhonda both return the same value.
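The reasoning about the index can be sketched as a toy set. ToyHashSet<T> below is hypothetical and deliberately simplified, not the real HashSet<T> source: it groups items by hash code in a Dictionary and only calls Equals for items whose hash code matches, counting those calls in EqualsCalls. ConstantHashEmployee is a hypothetical subclass that returns 1 from GetHashCode, as in the experiment above.

```csharp
using System;
using System.Collections.Generic;

// Simplified, hypothetical model of a hash-based set (NOT the real HashSet<T>):
// items are grouped by hash code, and Equals is only called within a group.
public class ToyHashSet<T>
{
    private readonly Dictionary<int, List<T>> buckets = new Dictionary<int, List<T>>();
    private readonly IEqualityComparer<T> comparer = EqualityComparer<T>.Default;

    public int Count { get; private set; }
    public int EqualsCalls { get; private set; }   // how often Add needed Equals

    public bool Add(T item)
    {
        int hash = comparer.GetHashCode(item);
        List<T> bucket;
        if (!buckets.TryGetValue(hash, out bucket))
        {
            // No stored item has this hash code, so this cannot be a duplicate.
            buckets[hash] = new List<T> { item };
            Count++;
            return true;
        }

        // Hash code already seen: only Equals can tell whether it is a duplicate.
        foreach (T existing in bucket)
        {
            EqualsCalls++;
            if (comparer.Equals(existing, item))
                return false;
        }

        bucket.Add(item);
        Count++;
        return true;
    }
}

public class Employee : IEquatable<Employee>
{
    private readonly string employeeName;
    private readonly int employeeNumber;

    public Employee(string employeeName, int employeeNumber)
    {
        this.employeeName = employeeName;
        this.employeeNumber = employeeNumber;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Employee);   // resolves to Equals(Employee) below
    }

    public bool Equals(Employee obj)
    {
        return obj != null && this.employeeNumber == obj.employeeNumber;
    }

    public override int GetHashCode()
    {
        return this.employeeNumber.GetHashCode();
    }

    public override string ToString()
    {
        return employeeName;
    }
}

// Hypothetical variant: the same equality, but every object hashes to 1.
public class ConstantHashEmployee : Employee
{
    public ConstantHashEmployee(string employeeName, int employeeNumber)
        : base(employeeName, employeeNumber)
    {
    }

    public override int GetHashCode()
    {
        return 1;
    }
}
```

Adding andrew, rhonda and rhona (in that order) leaves two items either way, but in this toy the constant hash needs three Equals calls while the employeeNumber hash needs only one (rhona against rhonda). The real HashSet<T> differs in detail (the step-through above suggests it compared rhona with rhonda first), but the principle is the same: the better the hash codes separate objects, the less often Equals has to run.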

Sunday, 10 February 2013

C# Part 2: When are objects the same? Exploration of Equals, GetHashCode and IEquatable

Following on from the previous post, which looked at overriding the Equals method from System.Object, I will now look at the alternative version of Equals. So far the Employee class looks like this:
public class Employee 
{
    private string employeeName;
    private int employeeNumber;

    public Employee(string employeeName, int employeeNumber)
    {
        this.employeeName = employeeName;
        this.employeeNumber = employeeNumber;
    }

    public override bool Equals(object obj)
    {
        if (obj == null)
            return false;

        Employee other = obj as Employee;
        if (other == null)
            return false;

        if (this.employeeNumber == other.employeeNumber)
            return true;
        else
            return false;
    }

    public bool Equals(Employee obj)
    {
        if (obj == null)
            return false;

        if (this.employeeNumber == obj.employeeNumber)
            return true;
        else
            return false;
    }
}
We wrote some tests; the ones we are interested in for this post are:
[TestMethod]
public void RhonaIsRhondaTest()
{
    Assert.AreEqual(rhona, rhonda);
}

[TestMethod]
public void RhonaIsRhondaUsingEqualsTest()
{
    Assert.IsTrue(rhona.Equals(rhonda));
}
Both tests pass. However, the first test (using Assert.AreEqual) executes the Equals method that takes an object as its parameter, the one that overrides the virtual method in System.Object, while the second calls Equals(Employee).
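Which Equals runs is decided at compile time by the declared type of the argument, not by the object it refers to at run time. A sketch, assuming a hypothetical static LastCalled field added only to record which overload ran, and a hypothetical OverloadDemo.Run method that makes three calls:

```csharp
using System;

public class Employee
{
    private readonly string employeeName;
    private readonly int employeeNumber;

    // Hypothetical, for illustration only: records which Equals overload ran last.
    public static string LastCalled;

    public Employee(string employeeName, int employeeNumber)
    {
        this.employeeName = employeeName;
        this.employeeNumber = employeeNumber;
    }

    public override bool Equals(object obj)
    {
        LastCalled = "Equals(object)";
        Employee other = obj as Employee;
        return other != null && this.employeeNumber == other.employeeNumber;
    }

    public bool Equals(Employee obj)
    {
        LastCalled = "Equals(Employee)";
        return obj != null && this.employeeNumber == obj.employeeNumber;
    }

    // Included only so the class compiles without warning CS0659.
    public override int GetHashCode()
    {
        return employeeNumber.GetHashCode();
    }

    public override string ToString()
    {
        return employeeName;
    }
}

public static class OverloadDemo
{
    // Returns the overload that ran for each of three calls, in order.
    public static string[] Run()
    {
        Employee rhona = new Employee("rhona", 2791);
        Employee rhonda = new Employee("rhonda", 2791);
        string[] calls = new string[3];

        rhona.Equals(rhonda);            // argument declared as Employee
        calls[0] = Employee.LastCalled;

        object asObject = rhonda;
        rhona.Equals(asObject);          // same object, but declared as object
        calls[1] = Employee.LastCalled;

        Object.Equals(rhona, rhonda);    // the static helper only sees two objects
        calls[2] = Employee.LastCalled;

        return calls;
    }
}
```

Run returns "Equals(Employee)", "Equals(object)", "Equals(object)": only the first call, where the argument is declared as Employee, picks the Employee overload.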

In the next test we put two of our employees (andrew and rhonda) into a List and then check whether rhona is in the list, which it should be, since rhona and rhonda are equal.
[TestMethod]
public void ContainsInListTest()
{
    List<Employee> workers = new List<Employee>();
    workers.Add(andrew);
    workers.Add(rhonda);

    Assert.IsTrue(workers.Contains(rhona));
}
The test passes, but if you step through it in the debugger you will see that Contains calls our override of Equals(object), not Equals(Employee). Now let's have the Employee class implement the interface IEquatable<T> (here IEquatable<Employee>). This interface is defined as:
public interface IEquatable<T>
{
    bool Equals(T other);
}
The Equals(Employee obj) method we have already written matches this signature, so the class only needs to declare that it implements the interface (public class Employee : IEquatable<Employee>). Check that the three tests above still pass. The only test whose behaviour has changed is the call to Contains on the list, which now calls the method defined by IEquatable<Employee>. This interface is used by generic collections (such as List<T>).
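A sketch that makes the switch visible, assuming a hypothetical static LastCalled field in each class that records which overload ran. PlainEmployee has both Equals methods but no interface; EquatableEmployee implements IEquatable<EquatableEmployee>. The name field is left out to keep it short. List<T>.Contains is documented as using the default equality comparer, EqualityComparer<T>.Default, which calls IEquatable<T>.Equals when T implements it and falls back to Equals(object) otherwise.

```csharp
using System;

// No interface: the default comparer can only call Equals(object).
public class PlainEmployee
{
    private readonly int employeeNumber;

    // Hypothetical, for illustration only: which Equals overload ran last.
    public static string LastCalled;

    public PlainEmployee(int employeeNumber)
    {
        this.employeeNumber = employeeNumber;
    }

    public override bool Equals(object obj)
    {
        LastCalled = "Equals(object)";
        PlainEmployee other = obj as PlainEmployee;
        return other != null && this.employeeNumber == other.employeeNumber;
    }

    public bool Equals(PlainEmployee obj)
    {
        LastCalled = "Equals(PlainEmployee)";
        return obj != null && this.employeeNumber == obj.employeeNumber;
    }

    public override int GetHashCode()
    {
        return employeeNumber.GetHashCode();
    }
}

// Implements IEquatable<T>: the default comparer calls Equals(T) directly.
public class EquatableEmployee : IEquatable<EquatableEmployee>
{
    private readonly int employeeNumber;

    // Hypothetical, for illustration only: which Equals overload ran last.
    public static string LastCalled;

    public EquatableEmployee(int employeeNumber)
    {
        this.employeeNumber = employeeNumber;
    }

    public override bool Equals(object obj)
    {
        LastCalled = "Equals(object)";
        EquatableEmployee other = obj as EquatableEmployee;
        return other != null && this.employeeNumber == other.employeeNumber;
    }

    public bool Equals(EquatableEmployee obj)
    {
        LastCalled = "Equals(EquatableEmployee)";
        return obj != null && this.employeeNumber == obj.employeeNumber;
    }

    public override int GetHashCode()
    {
        return employeeNumber.GetHashCode();
    }
}
```

Calling Contains on a List<PlainEmployee> leaves PlainEmployee.LastCalled as "Equals(object)", even though an Equals(PlainEmployee) method exists. The same call on a List<EquatableEmployee> leaves EquatableEmployee.LastCalled as "Equals(EquatableEmployee)". Declaring the interface is what tells the collection to use the typed method.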

The next post will look at GetHashCode() and using a HashSet<T>.

Saturday, 2 February 2013

C# Part 1: When are objects the same? Exploration of Equals, GetHashCode and IEquatable

When I teach inheritance to an apprentice group, we talk about overriding the ToString() method from System.Object. But I also get asked about overriding Equals (and the subsequent need to override GetHashCode). This is a little out of scope for the teaching: understanding what happens when you override ToString() is difficult enough. Discussing what equality means is another step, which hopefully I can address here.

I am also going to use unit tests to prove/disprove what we think is going on.

We tend to think of things as equal when they contain the same values, or have something within them that makes them equal. However, for reference types, equality by default means being the same object. The string class is different: two strings are equal if they have the same value, so although string is a reference type it generally behaves like a value type.
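A small sketch of that difference, using the Employee class exactly as first defined below (no Equals override yet) and a hypothetical DefaultEqualityDemo.Run method that collects the results. new string(char[]) is used to make sure the second string is a separate object with the same characters.

```csharp
using System;

// The Employee class as first defined in this post: Equals is not overridden.
public class Employee
{
    private string employeeName;
    private int employeeNumber;

    public Employee(string employeeName, int employeeNumber)
    {
        this.employeeName = employeeName;
        this.employeeNumber = employeeNumber;
    }
}

public static class DefaultEqualityDemo
{
    public static bool[] Run()
    {
        Employee first = new Employee("rhona", 2791);
        Employee second = new Employee("rhona", 2791);

        string s1 = "rhona";
        string s2 = new string(s1.ToCharArray());   // separate object, same characters

        return new bool[]
        {
            first.Equals(second),             // false: same values, different objects
            first.Equals(first),              // true: the same object
            s1.Equals(s2),                    // true: string compares characters
            s1 == s2,                         // true: string also overloads ==
            Object.ReferenceEquals(s1, s2)    // false: still two separate objects
        };
    }
}
```

So two Employee objects holding identical values are not equal until we say otherwise, while two distinct string objects are equal because string overrides Equals (and overloads ==) to compare values.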

First, let's create a class definition for an employee that has two fields: the employee's name and the employee's number (which, for the sake of this example, uniquely identifies the employee).
public class Employee
{
    private string employeeName;
    private int employeeNumber;

    public Employee(string employeeName, int employeeNumber)
    {
        this.employeeName = employeeName;
        this.employeeNumber = employeeNumber;
    }
}
Now create a unit test (right-click on the class, choose Create Unit Tests and follow the wizard).
Let's declare some fields and initialise them in the MyTestInitialize method (in the "Additional test attributes" section, which is collapsed and commented out).
private Employee andrew;
private Employee rhona;
private Employee rhonda;

[TestInitialize()]
public void MyTestInitialize()
{
    andrew = new Employee("andrew", 2521);
    rhona = new Employee("rhona", 2791);
    rhonda = new Employee("rhonda", 2791);
}
Employees rhona and rhonda are the same employee: Rhona's name has just been misspelt, but the employee numbers match. Let's write some tests to check that andrew isn't rhona or rhonda, but that rhona is the same as rhonda.
[TestMethod]
public void AndrewIsntRhonaOrRhondaTest()
{
    Assert.AreNotEqual(andrew, rhona);
    Assert.AreNotEqual(andrew, rhonda);
}
This first test passes, as expected. Before we write tests to check rhona against rhonda, let's confirm that the references are as expected: rhona and rhonda are separate objects, while a copy of a reference refers to the same object. Here we use the System.Object.ReferenceEquals method.
[TestMethod]
public void CheckReferencesAreAsExpectedTest()
{
    Employee e = andrew;
    Assert.AreEqual(e, andrew);
    Assert.IsTrue(Object.ReferenceEquals(e, andrew));
    Assert.IsFalse(Object.ReferenceEquals(rhona, rhonda));
}
Run this test and it passes. Now what about Rhona and Rhonda? We want them to be the same person, so let's write a test for this:
[TestMethod]
public void RhonaIsRhondaTest()
{
    Assert.AreEqual(rhona, rhonda);
}
But this test fails, as expected: rhona and rhonda are separate objects, and we want them to be treated as equal. There are a couple of ways of comparing them: the Equals method and the "==" operator (which we won't worry about here). So let's write a test using Equals; we expect it to fail too. (It may well be that Assert.AreEqual, as above, calls Equals, but I'm not sure...)
[TestMethod]
public void RhonaIsRhondaUsingEqualsTest()
{
    Assert.IsTrue(rhona.Equals(rhonda));
}
Both tests for Rhona being Rhonda now fail, as expected, so we need to write some code. Any class that we want to use to represent a value should override Equals. Go back to the class and add a method to override Equals: if you type public override, IntelliSense will show a list of methods you can override. By default the generated code looks like this:
public override bool Equals(object obj)
{
    return base.Equals(obj);
}
If you compile, you will receive a warning:
"'Employee' overrides Object.Equals(object o) but does not override Object.GetHashCode()".
For the moment we are going to ignore this (and explain it later). The Equals method takes one parameter (obj) and returns a bool. When overriding Equals, one thing you must ensure is that comparing the current instance to null returns false. So first, let's write a test:
[TestMethod]
public void EqualsComparedToNullTest()
{
    Assert.IsFalse(rhona.Equals(null));
}
This passes even before we change anything. Before we write any code, let's examine the method signature of Equals. What type is the parameter? It is object, not Employee, so we need to check that the object passed in is an Employee. Again, a test for this: let's compare our Employee to a different type (in this case I'll use the EmployeeTest class):
[TestMethod]
public void EqualsComparedToAnotherType()
{
    Assert.IsFalse(rhona.Equals(new EmployeeTest()));
}
With that test passing, let's finally write some code for Equals:
public override bool Equals(object obj)
{
    if (obj == null)
        return false;

    Employee other = obj as Employee;
    if (other == null)
        return false;

    if (this.employeeNumber == other.employeeNumber)
        return true;
    else
        return false;
}
Let's add another Equals method, this one taking an Employee as its parameter:
public bool Equals(Employee obj)
{
    if (obj == null)
        return false;

    if (this.employeeNumber == obj.employeeNumber)
        return true;
    else
        return false;
}
Wouldn't it be better if this method were called? There is no casting here, and in all of the tests (bar the one comparing against a completely different type) we are comparing against an Employee. If you run the tests in the debugger and step through them, you will see that Assert.AreEqual(rhona, rhonda) calls the Equals method with object as its parameter, but Assert.IsTrue(rhona.Equals(rhonda)) calls the Equals method with an Employee as its parameter. The former calls a static method on object:
public static bool Equals(object objA, object objB);
The latter calls our Equals method directly. More on that later; it is useful! We also need to look at how we might use our Employee (e.g. in Lists).
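The static method's documented behaviour is: true if both arguments are the same reference (including both null), false if only one is null, and otherwise the result of objA.Equals(objB). Below is a sketch of those steps under a hypothetical name, StaticEqualsSketch.EqualsLike; in real code you would call object.Equals itself. Because the last step is a virtual call through an argument declared as object, it runs the Equals(object) override, which is why stepping through Assert.AreEqual lands there rather than in Equals(Employee).

```csharp
using System;

public static class StaticEqualsSketch
{
    // Hypothetical copy of the steps object.Equals(object, object) takes.
    public static bool EqualsLike(object objA, object objB)
    {
        if (Object.ReferenceEquals(objA, objB))
            return true;            // same reference (this also covers both null)
        if (objA == null || objB == null)
            return false;           // exactly one of them is null
        return objA.Equals(objB);   // virtual call: runs any Equals(object) override
    }
}
```

A side benefit of the static form is that it is null-safe: object.Equals(null, rhona) simply returns false, whereas calling Equals on a null reference would throw a NullReferenceException.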

So there are a few things to deal with, which will be covered in subsequent posts.