What is Hashset in C#?

5 min readAug 18, 2024

In this article we will explore what is Hashset in C#, when and how to use them with examples.

Note: You can try examples shown in this articles using your favorite online C# compiler like ShartLab.io.

What is Hashset?

Hashset allows us to create a collection (just like List, array) with a difference that it automatically makes sure that duplicates values are not allowed in the collection. It also provides high-performance set operations (like union, intersection, and difference). It is part of the System.Collections.Generic namespace. The HashSet<T> class implements the ISet<T> interface, which is a generic collection of unique elements.

Let’s see with an example:

using System;
using System.Collections.Generic;

class Program  
{
    static void Main()
    {
        // Create a new HashSet of strings  
        HashSet<string> fruits = new HashSet<string>();
        
        // Add elements to the HashSet  
        fruits.Add("Apple");
        fruits.Add("Banana");
        fruits.Add("Mango");
        // Attempt to add a duplicate element  
        bool wasAdded = fruits.Add("Apple");

        Console.WriteLine("Was 'Apple' added again? " + wasAdded); 
        // Output: False
        
    }
}

In above example, we created a simple Hashset which will store strings. Later, we added few names of fruits in it. However, the interesting thing happens when we try to add Apple again. The value of the wasAdded will be false. Note that this is built-in functionalities and doesn’t need any additional configuration.

Now, let’s iterate through the fruits and see what Hashset contains.

foreach (string fruit in fruits)
{
    Console.WriteLine(fruit);
}

Output:

Apple
Banana
Mango

Note that, the uniqueness is case-sensitive. Meaning, mango and Mango are different. So, below line will add a new element.

fruits.Add("mango"); // This is not duplicate

How it works behind the scene?

Behind the scenes, HashSet<T> in C# uses a hash table to manage its elements. Here’s a detailed look at how it works:

Hash Function

When you add an element to a HashSet, it uses a hash function to compute an integer hash code for the element. The hash code is used to determine where to store the element in the hash table. Ideally, a good hash function distributes elements uniformly across the table to minimize collisions. That’s the reason why a duplicate won’t be added — as it has same hash code.

Buckets

The hash table is essentially an array of “buckets,” where each bucket can store multiple elements. The position of each bucket is determined by the hash code of the elements it contains.

Handling Collisions

Collisions occur when two different elements have the same hash code. HashSet handles collisions using chaining, where each bucket contains a list (or another structure) of elements that have the same hash code. When a collision occurs, the new element is added to the list in the appropriate bucket.

Operations

Here’s how various operations are performed:

Add (adds an element in the HashSet):

Compute the hash code of the element.
Determine the bucket index using the hash code.
Check if the element already exists in the bucket (to ensure uniqueness).
If not, add the element to the bucket.

2. Contains (Returns true or false based the existence of the element):

Compute the hash code of the element.
Determine the bucket index using the hash code.
Search the bucket for the element.
Return true if the element is found; otherwise, return false.
Here is an example:

// Check if an element exists in the HashSet  
bool containsBanana = fruits.Contains("Banana");
Console.WriteLine("Contains 'Banana': " + containsBanana); 
// Output: True

3. Remove:

Compute the hash code of the element.
Determine the bucket index using the hash code.
Search the bucket for the element.
If found, remove the element from the bucket.
Here is an example:

// Remove an element from the HashSet  
fruits.Remove("Banana");
containsBanana = fruits.Contains("Banana");
Console.WriteLine("Contains 'Banana' after removal: " + containsBanana); 
// Output: False

Automatic Resizing

If the load factor (number of elements / number of buckets) exceeds a certain threshold, the hash table will automatically resize to maintain performance. When resizing, a new, larger array of buckets is created, and all existing elements are rehashed and placed into the new buckets.

Now, let’s look at the high-performanct set operations supported by Hashset.

UnionWith

The UnionWith method modifies the current HashSet<T> to contain all elements that are present in either the current set or the specified collection. Here is an example:

using System;
using System.Collections.Generic;

class Program  
{
    static void Main()
    {
        HashSet<int> set1 = new HashSet<int> { 1, 2, 3 };
        HashSet<int> set2 = new HashSet<int> { 3, 4, 5 };

        // Perform Union operation  
        set1.UnionWith(set2);

        // Output: 1 
        // 2
        // 3 
        // 4 
        // 5  
        foreach (var item in set1)
        {
            Console.WriteLine(item);
        }
    }
}

After set1.UnionWith(set2), set1 is modified to {1, 2, 3, 4, 5}.

Output: 1 2 3 4 5

Note that now Set1 contains elements of Set2, with making sure to not to repeat element 3.

IntersectWith

The IntersectWith method modifies the current HashSet<T> to contain only elements that are present in both the current set and the specified collection.

Here is an example:

using System;
using System.Collections.Generic;

class Program  
{
    static void Main()
    {
        HashSet<int> set1 = new HashSet<int> { 1, 2, 3, 4 };
        HashSet<int> set2 = new HashSet<int> { 3, 4, 5 };

        // Perform Intersection operation  
        set1.IntersectWith(set2);

        // Output: 3
        //         4 
        foreach (var item in set1)
        {
            Console.WriteLine(item);
        }

    }
}

As you can see, the common elements between set1 and set2 is 3 and 4.

ExceptWith

The ExceptWith method modifies the current HashSet<T> to remove all elements that are also in the specified collection, effectively performing a difference operation.

using System;
using System.Collections.Generic;

class Program  
{
    static void Main()
    {
        HashSet<int> set1 = new HashSet<int> { 1, 2, 3, 4 };
        HashSet<int> set2 = new HashSet<int> { 3, 4, 5 };

        // Perform Difference operation  
        set1.ExceptWith(set2);

        // Output: 1
        //         2
        foreach (var item in set1)
        {
            Console.WriteLine(item);
        }
    }
}

As you can see, the element 3 and 4 are common and they removed from set1. This is contradict to our previous method IntersectWith .

Let’s see another real-life example on how to remove duplicates from a List using HashSet.

using System;
using System.Collections.Generic;

class Program  
{
    static void Main()
    {
        // Create a list with duplicate elements  
        List<int> numbersWithDuplicates = new List<int> { 1, 2, 2, 3, 4, 4, 5 };

        // Use a HashSet to remove duplicates  
        HashSet<int> uniqueNumbers = new HashSet<int>(numbersWithDuplicates);

        // Output: 1 
        // 2 
        // 3 
        // 4 
        // 5  
        foreach (var number in uniqueNumbers)
        {
            Console.WriteLine(number);
        }
    }
}

Note that this is considered the fastest way to eliminate duplicates from a list. If we need we can create List from HashSet like:

// Convert the HashSet back to a List if needed  
List<int> numbersWithoutDuplicates = new List<int>(uniqueNumbers);

Conclusion

HashSet<T> in C# is a highly efficient collection for managing unique elements and performing set operations like union, intersection, and difference. It leverages a hash table to ensure O(1) performance for adding, checking, and removing elements, handling collisions through chaining.

Key methods include UnionWith for combining sets, IntersectWith for finding common elements, and ExceptWith for removing specific elements. Additionally, HashSet is an excellent tool for removing duplicates from lists by utilizing its uniqueness property.
In summary, often ignored but HashSet<T> is an essential and versatile tool for developers needing efficient and straightforward set operations and uniqueness enforcement in their collections.

What is Hashset in C#?

What is Hashset?

How it works behind the scene?

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Written by Rupen Anjaria

No responses yet