What is Hashset in C#?
In this article we will explore what is Hashset in C#, when and how to use them with examples.
Note: You can try examples shown in this articles using your favorite online C# compiler like ShartLab.io.
What is Hashset?
Hashset allows us to create a collection (just like List, array) with a difference that it automatically makes sure that duplicates values are not allowed in the collection. It also provides high-performance set operations (like union
, intersection
, and difference
). It is part of the System.Collections.Generic
namespace. The HashSet<T>
class implements the ISet<T>
interface, which is a generic collection of unique elements.
Let’s see with an example:
using System;
using System.Collections.Generic;
class Program
{
static void Main()
{
// Create a new HashSet of strings
HashSet<string> fruits = new HashSet<string>();
// Add elements to the HashSet
fruits.Add("Apple");
fruits.Add("Banana");
fruits.Add("Mango");
// Attempt to add a duplicate element
bool wasAdded = fruits.Add("Apple");
Console.WriteLine("Was 'Apple' added again? " + wasAdded);
// Output: False
}
}
In above example, we created a simple Hashset which will store strings. Later, we added few names of fruits in it. However, the interesting thing happens when we try to add Apple
again. The value of the wasAdded
will be false
. Note that this is built-in functionalities and doesn’t need any additional configuration.
Now, let’s iterate through the fruits
and see what Hashset contains.
foreach (string fruit in fruits)
{
Console.WriteLine(fruit);
}
Output:
Apple
Banana
Mango
Note that, the uniqueness is case-sensitive. Meaning, mango
and Mango
are different. So, below line will add a new element.
fruits.Add("mango"); // This is not duplicate
How it works behind the scene?
Behind the scenes, HashSet<T> in C# uses a hash table to manage its elements. Here’s a detailed look at how it works:
Hash Function
When you add an element to a HashSet, it uses a hash function to compute an integer hash code for the element. The hash code is used to determine where to store the element in the hash table. Ideally, a good hash function distributes elements uniformly across the table to minimize collisions. That’s the reason why a duplicate won’t be added — as it has same hash code.
Buckets
The hash table is essentially an array of “buckets,” where each bucket can store multiple elements. The position of each bucket is determined by the hash code of the elements it contains.
Handling Collisions
Collisions occur when two different elements have the same hash code. HashSet handles collisions using chaining, where each bucket contains a list (or another structure) of elements that have the same hash code. When a collision occurs, the new element is added to the list in the appropriate bucket.
Operations
Here’s how various operations are performed:
- Add (adds an element in the HashSet):
- Compute the hash code of the element.
- Determine the bucket index using the hash code.
- Check if the element already exists in the bucket (to ensure uniqueness).
- If not, add the element to the bucket.
2. Contains (Returns true or false based the existence of the element):
- Compute the hash code of the element.
- Determine the bucket index using the hash code.
- Search the bucket for the element.
- Return true if the element is found; otherwise, return false.
- Here is an example:
// Check if an element exists in the HashSet
bool containsBanana = fruits.Contains("Banana");
Console.WriteLine("Contains 'Banana': " + containsBanana);
// Output: True
3. Remove:
- Compute the hash code of the element.
- Determine the bucket index using the hash code.
- Search the bucket for the element.
- If found, remove the element from the bucket.
- Here is an example:
// Remove an element from the HashSet
fruits.Remove("Banana");
containsBanana = fruits.Contains("Banana");
Console.WriteLine("Contains 'Banana' after removal: " + containsBanana);
// Output: False
Automatic Resizing
If the load factor (number of elements / number of buckets) exceeds a certain threshold, the hash table will automatically resize to maintain performance. When resizing, a new, larger array of buckets is created, and all existing elements are rehashed and placed into the new buckets.
Now, let’s look at the high-performanct set
operations supported by Hashset.
UnionWith
The UnionWith
method modifies the current HashSet<T> to contain all elements that are present in either the current set or the specified collection. Here is an example:
using System;
using System.Collections.Generic;
class Program
{
static void Main()
{
HashSet<int> set1 = new HashSet<int> { 1, 2, 3 };
HashSet<int> set2 = new HashSet<int> { 3, 4, 5 };
// Perform Union operation
set1.UnionWith(set2);
// Output: 1
// 2
// 3
// 4
// 5
foreach (var item in set1)
{
Console.WriteLine(item);
}
}
}
After set1.UnionWith(set2)
, set1
is modified to {1, 2, 3, 4, 5}.
Output: 1 2 3 4 5
Note that now Set1
contains elements of Set2
, with making sure to not to repeat element 3
.
IntersectWith
The IntersectWith
method modifies the current HashSet<T> to contain only elements that are present in both the current set and the specified collection.
Here is an example:
using System;
using System.Collections.Generic;
class Program
{
static void Main()
{
HashSet<int> set1 = new HashSet<int> { 1, 2, 3, 4 };
HashSet<int> set2 = new HashSet<int> { 3, 4, 5 };
// Perform Intersection operation
set1.IntersectWith(set2);
// Output: 3
// 4
foreach (var item in set1)
{
Console.WriteLine(item);
}
}
}
As you can see, the common elements between set1 and set2 is 3 and 4.
ExceptWith
The ExceptWith
method modifies the current HashSet<T> to remove all elements that are also in the specified collection, effectively performing a difference operation.
using System;
using System.Collections.Generic;
class Program
{
static void Main()
{
HashSet<int> set1 = new HashSet<int> { 1, 2, 3, 4 };
HashSet<int> set2 = new HashSet<int> { 3, 4, 5 };
// Perform Difference operation
set1.ExceptWith(set2);
// Output: 1
// 2
foreach (var item in set1)
{
Console.WriteLine(item);
}
}
}
As you can see, the element 3 and 4 are common and they removed from set1. This is contradict to our previous method IntersectWith
.
Let’s see another real-life example on how to remove duplicates from a List using HashSet.
using System;
using System.Collections.Generic;
class Program
{
static void Main()
{
// Create a list with duplicate elements
List<int> numbersWithDuplicates = new List<int> { 1, 2, 2, 3, 4, 4, 5 };
// Use a HashSet to remove duplicates
HashSet<int> uniqueNumbers = new HashSet<int>(numbersWithDuplicates);
// Output: 1
// 2
// 3
// 4
// 5
foreach (var number in uniqueNumbers)
{
Console.WriteLine(number);
}
}
}
Note that this is considered the fastest way to eliminate duplicates from a list. If we need we can create List from HashSet like:
// Convert the HashSet back to a List if needed
List<int> numbersWithoutDuplicates = new List<int>(uniqueNumbers);
Conclusion
HashSet<T>
in C# is a highly efficient collection for managing unique elements and performing set operations like union
, intersection
, and difference
. It leverages a hash table to ensure O(1) performance for adding, checking, and removing elements, handling collisions through chaining.
Key methods include UnionWith
for combining sets, IntersectWith
for finding common elements, and ExceptWith
for removing specific elements. Additionally, HashSet is an excellent tool for removing duplicates from lists by utilizing its uniqueness property.
In summary, often ignored but HashSet<T>
is an essential and versatile tool for developers needing efficient and straightforward set operations and uniqueness enforcement in their collections.