Language Integrated Query (LINQ) is a Microsoft .NET Framework component that adds native data querying capabilities to .NET languages. One of the more powerful extension methods is .Distinct(), which returns the unique elements of a sequence, where "unique" is decided by whichever equality comparison is in use. Which comparison that actually is has caused some confusion, and much third-party advice simply repeats Microsoft's guidance.
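For built-in types the comparison is already defined, and supplying a different comparer changes what counts as unique. Here is a minimal sketch with strings. The helper class DistinctComparerDemo is hypothetical and exists only for illustration:

```csharp
using System;
using System.Linq;

// Hypothetical helper used only for this illustration.
internal static class DistinctComparerDemo
{
    // No comparer: Distinct() falls back to EqualityComparer<string>.Default,
    // which compares strings ordinally and case-sensitively.
    public static int CountDefault(string[] words) => words.Distinct().Count();

    // Same call, but with a case-insensitive comparer supplied explicitly.
    public static int CountIgnoreCase(string[] words) =>
        words.Distinct(StringComparer.OrdinalIgnoreCase).Count();

    static void Main()
    {
        var words = new[] { "apple", "Apple", "APPLE", "pear" };
        Console.WriteLine("Default comparer:  " + CountDefault(words));    // 4
        Console.WriteLine("OrdinalIgnoreCase: " + CountIgnoreCase(words)); // 2
    }
}
```

The second call uses the Distinct(IEqualityComparer<T>) overload. The rest of this post is about what happens when no comparer is passed and the element type is your own class.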
To illustrate the concept, consider the following class:
namespace Testing
{
    class TestClass
    {
        public int A = 0;
        public int B = 0;
    }
}
Now consider the following experiment...
using System;
using System.Collections.Generic;
using System.Linq;
namespace Testing
{
    internal class Program
    {
        static void Main(string[] args)
        {
            // 50 separate instances, all with A == 0 and B == 0
            var list = new List<TestClass>();
            for (int i = 0; i < 50; i++)
            {
                list.Add(new TestClass());
            }
            Console.WriteLine("Found " + list.Distinct().Count());
            Console.ReadKey();
        }
    }
}
When we run the test with this plain class, it prints:
Found 50
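No surprise there: TestClass does not override any equality members, so each instance is equal only to itself (reference equality). Let's see whether Microsoft's documentation can tell us how to get .Distinct() to return a single result.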
The documentation for .Distinct() says: "If you want to return distinct elements from sequences of objects of some custom data type, you have to implement the IEquatable<T> generic interface in the class."
So let's try it:
using System;
class TestClass : IEquatable<TestClass>
{
    public int A = 0;
    public int B = 0;
    #region IEquatable<TestClass> implementation
    /// <summary>
    /// Checks whether another TestClass instance is equal to this instance.
    /// </summary>
    /// <param name="other">The other TestClass instance.</param>
    /// <returns><c>true</c> if equal, otherwise <c>false</c>.</returns>
    public bool Equals(TestClass other)
    {
        if (Object.ReferenceEquals(other, null)) { return false; }
        if (Object.ReferenceEquals(other, this)) { return true; }
        return A == other.A && B == other.B;
    }
    /// <summary>
    /// Returns a hash code for this instance.
    /// </summary>
    /// <remarks>
    /// Suitable for use in hashing algorithms and data structures like a hash table.
    /// Instances that are equal must return equal hash codes.
    /// </remarks>
    /// <returns>A hash code (integer value) for this instance.</returns>
    public override int GetHashCode()
    {
        return A.GetHashCode() ^ B.GetHashCode();
    }
    #endregion
}
The results:
Found 1
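That works! Implementing IEquatable<TestClass> (together with a matching GetHashCode()) gives the distinctness we wanted. But is it the interface that matters, or just the Equals(TestClass) method? Let's remove the interface and keep everything else: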
using System;
class TestClass
{
    public int A = 0;
    public int B = 0;
    /// <summary>
    /// Checks whether another TestClass instance is equal to this instance.
    /// </summary>
    /// <param name="other">The other TestClass instance.</param>
    /// <returns><c>true</c> if equal, otherwise <c>false</c>.</returns>
    public bool Equals(TestClass other)
    {
        if (Object.ReferenceEquals(other, null)) { return false; }
        if (Object.ReferenceEquals(other, this)) { return true; }
        return A == other.A && B == other.B;
    }
    /// <summary>
    /// Returns a hash code for this instance.
    /// </summary>
    /// <remarks>
    /// Suitable for use in hashing algorithms and data structures like a hash table.
    /// </remarks>
    /// <returns>A hash code (integer value) for this instance.</returns>
    public override int GetHashCode()
    {
        return A.GetHashCode() ^ B.GetHashCode();
    }
}
The results:
Found 50
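Why 50? A direct call such as x.Equals(y) binds to the Equals(TestClass) overload at compile time because y is typed TestClass. The default comparer, however, is generic code that, for a type without IEquatable<TestClass>, can only call the virtual Equals(object obj), and this class does not override it. The sketch below reproduces the class above and shows the two calls disagreeing. The helper class OverloadDemo is hypothetical:

```csharp
using System;
using System.Collections.Generic;

// Same shape as the TestClass above: a public Equals(TestClass) overload and an
// overridden GetHashCode(), but no IEquatable<TestClass> and no Equals(object) override.
class TestClass
{
    public int A = 0;
    public int B = 0;

    public bool Equals(TestClass other)
    {
        if (Object.ReferenceEquals(other, null)) { return false; }
        if (Object.ReferenceEquals(other, this)) { return true; }
        return A == other.A && B == other.B;
    }

    public override int GetHashCode()
    {
        return A.GetHashCode() ^ B.GetHashCode();
    }
}

// Hypothetical helper used only for this illustration.
internal static class OverloadDemo
{
    // The argument is typed TestClass, so the compiler binds this to Equals(TestClass).
    public static bool DirectCall(TestClass x, TestClass y) => x.Equals(y);

    // Without IEquatable<TestClass>, the default comparer calls the virtual Equals(object),
    // which is not overridden here and therefore still compares references.
    public static bool ViaDefaultComparer(TestClass x, TestClass y) =>
        EqualityComparer<TestClass>.Default.Equals(x, y);

    static void Main()
    {
        var x = new TestClass();
        var y = new TestClass();
        Console.WriteLine("x.Equals(y):             " + DirectCall(x, y));         // True
        Console.WriteLine("Default comparer Equals: " + ViaDefaultComparer(x, y)); // False
    }
}
```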
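So a public Equals(TestClass) method on its own is not enough. Next, let's leave the interface off, make Equals(TestClass) private, and override Equals(object obj) so that it delegates to it: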
using System;
class TestClass
{
    public int A = 0;
    public int B = 0;
    /// <summary>
    /// Checks whether another TestClass instance is equal to this instance.
    /// </summary>
    /// <param name="other">The other TestClass instance.</param>
    /// <returns><c>true</c> if equal, otherwise <c>false</c>.</returns>
    private bool Equals(TestClass other)
    {
        if (Object.ReferenceEquals(other, null)) { return false; }
        if (Object.ReferenceEquals(other, this)) { return true; }
        return A == other.A && B == other.B;
    }
    /// <summary>
    /// Returns a hash code for this instance.
    /// </summary>
    /// <remarks>
    /// Suitable for use in hashing algorithms and data structures like a hash table.
    /// </remarks>
    /// <returns>A hash code (integer value) for this instance.</returns>
    public override int GetHashCode()
    {
        return A.GetHashCode() ^ B.GetHashCode();
    }
    /// <summary>
    /// Determines whether the specified <see cref="System.Object" />
    /// is equal to this instance.
    /// </summary>
    /// <param name="obj">The <see cref="System.Object" /> to compare with this instance.</param>
    /// <returns>
    /// <c>true</c> if the specified <see cref="System.Object" /> is equal to this instance;
    /// otherwise, <c>false</c>.
    /// </returns>
    public override bool Equals(object obj)
    {
        // Anything that is not a TestClass (including null) is never equal;
        // otherwise delegate to the private Equals(TestClass) overload.
        return obj is TestClass other && this.Equals(other);
    }
}
The results:
Found 1
So the IEquatable<TestClass> interface is not strictly required: overriding Equals(object obj), together with GetHashCode(), is enough for .Distinct() to find a single result.
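But what about the equality operators? This time we keep Equals(TestClass) private, drop the Equals(object obj) override, and overload == and != instead: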
using System;
// Note: the compiler warns (CS0660) because operator == is defined
// without overriding Equals(object).
class TestClass
{
    public int A = 0;
    public int B = 0;
    /// <summary>
    /// Checks whether another TestClass instance is equal to this instance.
    /// </summary>
    /// <param name="other">The other TestClass instance.</param>
    /// <returns><c>true</c> if equal, otherwise <c>false</c>.</returns>
    private bool Equals(TestClass other)
    {
        if (Object.ReferenceEquals(other, null)) { return false; }
        if (Object.ReferenceEquals(other, this)) { return true; }
        return A == other.A && B == other.B;
    }
    /// <summary>
    /// Returns a hash code for this instance.
    /// </summary>
    /// <remarks>
    /// Suitable for use in hashing algorithms and data structures like a hash table.
    /// </remarks>
    /// <returns>A hash code (integer value) for this instance.</returns>
    public override int GetHashCode()
    {
        return A.GetHashCode() ^ B.GetHashCode();
    }
    /// <summary>
    /// Implements the operator == (equals).
    /// </summary>
    /// <param name="lhs">The left-hand side.</param>
    /// <param name="rhs">The right-hand side.</param>
    /// <returns><c>true</c> if equal, <c>false</c> if not equal.</returns>
    public static bool operator ==(TestClass lhs, TestClass rhs)
        => Object.ReferenceEquals(lhs, null) ?
            Object.ReferenceEquals(rhs, null) : lhs.Equals(rhs);
    /// <summary>
    /// Implements the operator != (not equals).
    /// </summary>
    /// <param name="lhs">The left-hand side.</param>
    /// <param name="rhs">The right-hand side.</param>
    /// <returns><c>true</c> if not equal, <c>false</c> if equal.</returns>
    public static bool operator !=(TestClass lhs, TestClass rhs)
        => Object.ReferenceEquals(lhs, null) ?
            !Object.ReferenceEquals(rhs, null) : !lhs.Equals(rhs);
}
The results:
Found 50
So overloading == and != has no effect on distinctness: .Distinct() never calls the operators.
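To see this directly, the sketch below reproduces the operator-only class in condensed form and compares two fresh instances with the overloaded operator and with EqualityComparer<TestClass>.Default, the comparer .Distinct() uses when none is given. The helper class OperatorDemo is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Condensed version of the class above: == and != are overloaded, but Equals(object)
// is not overridden and IEquatable<TestClass> is not implemented (compiler warning CS0660).
class TestClass
{
    public int A = 0;
    public int B = 0;

    private bool Equals(TestClass other) =>
        !Object.ReferenceEquals(other, null) && A == other.A && B == other.B;

    public override int GetHashCode() => A.GetHashCode() ^ B.GetHashCode();

    public static bool operator ==(TestClass lhs, TestClass rhs) =>
        Object.ReferenceEquals(lhs, null) ? Object.ReferenceEquals(rhs, null) : lhs.Equals(rhs);

    public static bool operator !=(TestClass lhs, TestClass rhs) => !(lhs == rhs);
}

// Hypothetical helper used only for this illustration.
internal static class OperatorDemo
{
    // The overloaded operator compares field values.
    public static bool ViaOperator(TestClass x, TestClass y) => x == y;

    // The default comparer calls Equals(object), never operator ==.
    public static bool ViaDefaultComparer(TestClass x, TestClass y) =>
        EqualityComparer<TestClass>.Default.Equals(x, y);

    // Distinct() over n fresh instances.
    public static int DistinctCount(int n) =>
        Enumerable.Range(0, n).Select(_ => new TestClass()).Distinct().Count();

    static void Main()
    {
        var x = new TestClass();
        var y = new TestClass();
        Console.WriteLine("x == y:                  " + ViaOperator(x, y));        // True
        Console.WriteLine("Default comparer Equals: " + ViaDefaultComparer(x, y)); // False
        Console.WriteLine("Distinct count of 50:    " + DistinctCount(50));        // 50
    }
}
```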
So why all this testing?
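As it turns out, System.Collections.Generic.EqualityComparer<TestClass>.Default checks whether TestClass implements IEquatable<TestClass>. If it does, the comparer calls .Equals(TestClass obj); if it does not, it falls back to the virtual .Equals(object obj).
This is important because, under the hood, .Distinct() uses System.Collections.Generic.EqualityComparer<TestClass>.Default to decide uniqueness when no comparer is specified. It also hashes each element with GetHashCode() and only calls Equals on elements whose hash codes match, so GetHashCode() must return the same value for any two instances that compare equal.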
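When no comparer is supplied, .Distinct() uses the default one, but there is also an overload that accepts an IEqualityComparer<T>, which defines equality outside the class entirely. A minimal sketch, assuming a hypothetical TestClassComparer and the original plain TestClass with no equality members (ComparerDemo is also a hypothetical helper):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The original plain class: no equality members at all.
class TestClass
{
    public int A = 0;
    public int B = 0;
}

// Hypothetical comparer: equality is defined here rather than in TestClass.
class TestClassComparer : IEqualityComparer<TestClass>
{
    public bool Equals(TestClass x, TestClass y)
    {
        if (ReferenceEquals(x, y)) { return true; }
        if (x is null || y is null) { return false; }
        return x.A == y.A && x.B == y.B;
    }

    // Must agree with Equals: equal instances get equal hash codes.
    public int GetHashCode(TestClass obj) =>
        obj is null ? 0 : obj.A.GetHashCode() ^ obj.B.GetHashCode();
}

// Hypothetical helper used only for this illustration.
internal static class ComparerDemo
{
    public static List<TestClass> MakeList(int n) =>
        Enumerable.Range(0, n).Select(_ => new TestClass()).ToList();

    // No comparer: EqualityComparer<TestClass>.Default, i.e. reference equality here.
    public static int CountWithDefault(List<TestClass> list) => list.Distinct().Count();

    // Explicit comparer: field-by-field equality.
    public static int CountWithComparer(List<TestClass> list) =>
        list.Distinct(new TestClassComparer()).Count();

    static void Main()
    {
        var list = MakeList(50);
        Console.WriteLine("Default comparer:  " + CountWithDefault(list));  // 50
        Console.WriteLine("TestClassComparer: " + CountWithComparer(list)); // 1
    }
}
```

This is useful when you cannot change the class itself, or when you need more than one notion of equality for the same type.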
Let's try an experiment that calls the default comparer directly:
// Inside Main: compare two fresh instances using the default comparer
if (System.Collections.Generic.EqualityComparer<TestClass>.Default.Equals(new TestClass(), new TestClass()))
{
    Console.WriteLine("Found Equals");
}
else
{
    Console.WriteLine("Not Equals");
}
We'll make Equals(object obj) always return false, while Equals(TestClass other) compares the fields as before. If the comparer used Equals(object obj), the test would print "Not Equals".
using System;
class TestClass : IEquatable<TestClass>
{
    public int A = 0;
    public int B = 0;
    #region IEquatable<TestClass> implementation
    /// <summary>
    /// Checks whether another TestClass instance is equal to this instance.
    /// </summary>
    /// <param name="other">The other TestClass instance.</param>
    /// <returns><c>true</c> if equal, otherwise <c>false</c>.</returns>
    public bool Equals(TestClass other)
    {
        if (Object.ReferenceEquals(other, null)) { return false; }
        if (Object.ReferenceEquals(other, this)) { return true; }
        return A == other.A && B == other.B;
    }
    /// <summary>
    /// Returns a hash code for this instance.
    /// </summary>
    /// <remarks>
    /// Suitable for use in hashing algorithms and data structures like a hash table.
    /// </remarks>
    /// <returns>A hash code (integer value) for this instance.</returns>
    public override int GetHashCode()
    {
        return A.GetHashCode() ^ B.GetHashCode();
    }
    #endregion
    /// <summary>
    /// Deliberately wrong for this experiment: never reports equality.
    /// </summary>
    /// <param name="obj">The <see cref="System.Object" /> to compare with this instance.</param>
    /// <returns>Always <c>false</c>.</returns>
    public override bool Equals(object obj)
    {
        return false;
    }
}
The results:
Found Equals
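The default comparer ignored the broken Equals(object obj) entirely. The same class still gives the wrong answer anywhere an instance is compared as a plain object, which is easy to miss. A minimal sketch of that split, reproducing the class above in condensed form (the helper class PreferenceDemo is hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Condensed version of the class above: a correct IEquatable<TestClass>,
// but Equals(object) deliberately always returns false.
class TestClass : IEquatable<TestClass>
{
    public int A = 0;
    public int B = 0;

    public bool Equals(TestClass other) =>
        !Object.ReferenceEquals(other, null) && A == other.A && B == other.B;

    public override int GetHashCode() => A.GetHashCode() ^ B.GetHashCode();

    public override bool Equals(object obj) => false;
}

// Hypothetical helper used only for this illustration.
internal static class PreferenceDemo
{
    // Distinct() goes through the default comparer, which picks IEquatable<TestClass>.
    public static int DistinctCount(int n) =>
        Enumerable.Range(0, n).Select(_ => new TestClass()).Distinct().Count();

    public static bool ViaDefaultComparer(TestClass x, TestClass y) =>
        EqualityComparer<TestClass>.Default.Equals(x, y);

    // Code that only sees an object goes through the broken override instead.
    public static bool ViaObjectEquals(TestClass x, TestClass y) => x.Equals((object)y);

    static void Main()
    {
        var x = new TestClass();
        var y = new TestClass();
        Console.WriteLine("Distinct count of 50:    " + DistinctCount(50));        // 1
        Console.WriteLine("Default comparer Equals: " + ViaDefaultComparer(x, y)); // True
        Console.WriteLine("Equals((object)y):       " + ViaObjectEquals(x, y));    // False
    }
}
```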
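In conclusion, follow the Microsoft recommendation:
"If you want to return distinct elements from sequences of objects of some custom data type, you have to implement the IEquatable<T> generic interface in the class."
The IEquatable<T> documentation also recommends overriding Equals(object obj) and GetHashCode() whenever you implement the interface, so that every equality check on the type gives the same answer.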
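Putting those recommendations together, a TestClass whose equality members all agree might look like the sketch below. It is one reasonable layout rather than the only one. The operators are optional as far as .Distinct() is concerned, and the helper class ConsistencyDemo is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch of a TestClass whose equality members all agree with each other.
class TestClass : IEquatable<TestClass>
{
    public int A = 0;
    public int B = 0;

    // Used by EqualityComparer<TestClass>.Default, and therefore by Distinct().
    public bool Equals(TestClass other)
    {
        if (Object.ReferenceEquals(other, null)) { return false; }
        if (Object.ReferenceEquals(other, this)) { return true; }
        return A == other.A && B == other.B;
    }

    // Used by code that only sees an object; delegates to the typed overload
    // (a non-TestClass argument becomes null and compares unequal).
    public override bool Equals(object obj) => Equals(obj as TestClass);

    // Must agree with Equals: equal instances return equal hash codes.
    public override int GetHashCode() => A.GetHashCode() ^ B.GetHashCode();

    public static bool operator ==(TestClass lhs, TestClass rhs) =>
        Object.ReferenceEquals(lhs, null) ? Object.ReferenceEquals(rhs, null) : lhs.Equals(rhs);

    public static bool operator !=(TestClass lhs, TestClass rhs) => !(lhs == rhs);
}

// Hypothetical helper used only for this illustration.
internal static class ConsistencyDemo
{
    public static int DistinctCount(int n) =>
        Enumerable.Range(0, n).Select(_ => new TestClass()).Distinct().Count();

    static void Main()
    {
        var x = new TestClass { A = 1, B = 2 };
        var y = new TestClass { A = 1, B = 2 };
        Console.WriteLine(x.Equals(y));                                      // True
        Console.WriteLine(x.Equals((object)y));                              // True
        Console.WriteLine(x == y);                                           // True
        Console.WriteLine(EqualityComparer<TestClass>.Default.Equals(x, y)); // True
        Console.WriteLine(DistinctCount(50));                                // 1
    }
}
```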