Explain the working of disjoint-set data structure and efficiently implement it.

**Problem:** We have some number of items. We are allowed to merge any two items to consider them equal. At any point, we are allowed to ask whether two items are considered equal or not.

**What is a disjoint-set?**

A disjoint-set is a data structure that keeps track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. In other words, **disjoint set is a group of sets where no item can be in more than one set**. It is also called a union–find data structure as it supports union & find operations on subsets. Lets begin by defining them –

** Find:** It determines in which subset a particular element is in and returns the representative of that particular set. An item from this set typically acts as a “representative” of the set.

** Union:** It merges two different subsets into a single subset and representative of one set becomes representative of other.

The disjoint-set also supports one other important operation called **MakeSet** which creates a set containing only a given element in it.

**How does union–find work?**

We can determine whether two elements are in the same subset or not by comparing the result of two *Find* operations. If the two elements are in the same set, they have same representative else they belong to different sets. If the union is called on two elements, we merge the two subsets which the two elements were belonging to.

**How to implement Disjoint sets?**

**Disjoint-set forests **are data structures where each set is represented by a tree data in which each node holds a reference to its parent node and the representative of each set is the root of that set’s tree.

follows parent nodes until it reaches the root.*Find*combines two trees into one by attaching the root of one tree into the root of the other.*Union*

For example, consider five disjoint sets S1, S2, S3, S4 and S5 that are represented by a tree as shown in below diagram. Each set initially contains only one element each so their parent pointer is pointing to itself or to NULL.

S1 = {1}, S2 ={2}, S3 = {3}, S4 ={4} and S5 = {5}

The *Find* operation on element i will return representative of S^{i} where 1 <= i <= 5. i.e. *Find*(i) = i

If we do *Union* (S3, S4), S3 and S4 will be merged into one disjoint set S3. Now

S1 = {1}, S2 ={2}, S3 = {3, 4} and S5 = {5}.

*Find*(4) will return representative of set S3. i.e. *Find*(4) = 3

If we do *Union* (S1, S2), S1 and S2 will be merged into one disjoint S1. Now

S1 = {1, 2}, S3 = {3, 4} and S5 = {5}.

*Find*(2) or *Find*(1) will return representative of set S1. i.e. *Find*(2) = *Find*(1) = 1

If we do *Union* (S3, S1), S3 and S1 will be merged into one disjoint set S3. Now

S3 = {1, 2, 3, 4} and S5 = {5}.

One way of implementing these might be:

**function** *MakeSet*(x)

x.parent = x

**function** *Find*(x)

if x.parent == x

return x

else

return *Find*(x.parent)

**function** *Union*(x, y)

xRoot = *Find*(x)

yRoot = *Find*(y)

xRoot.parent = yRoot

Below Implementation of Union-Find uses a Hashtable to implement Disjoint set.

**C++ implementation –**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 |
#include <bits/stdc++.h> using namespace std; // class to represent a disjoint set class DisjointSet { unordered_map<int, int> parent; public: // perform MakeSet operation void makeSet(vector<int> universe) { // create n disjoint sets (one for each item) for (int i : universe) parent[i] = i; } // Find the root of the set in which element k belongs int Find(int k) { // if k is root if (parent[k] == k) return k; // recurse for parent until we find root return Find(parent[k]); } // Perform Union of two subsets void Union(int a, int b) { // find root of the sets in which elements // x and y belongs int x = Find(a); int y = Find(b); parent[x] = y; } }; void printSets(vector<int> universe, DisjointSet &ds) { for (int i : universe) cout << ds.Find(i) << " "; cout << endl; } // main function int main() { // universe of items vector<int> universe = { 1, 2, 3, 4, 5 }; // initalize DisjointSet class DisjointSet ds; // create singleton set for each element of universe ds.makeSet(universe); printSets(universe, ds); ds.Union(4, 3); // 4 and 3 are in same set printSets(universe, ds); ds.Union(2, 1); // 1 and 2 are in same set printSets(universe, ds); ds.Union(1, 3); // 1, 2, 3, 4 are in same set printSets(universe, ds); return 0; } |

**Output: **

1 2 3 4 5

1 2 3 3 5

1 1 3 3 5

3 3 3 3 5

The above approach is no better than the linked-list approach, because the tree it creates can be highly unbalanced; however, it can be enhanced in two ways.

The first way, called ** union by rank**, is to always

**attach the smaller tree to the root of the larger tree**. Since it is the depth of the tree that affects the running time, the tree with smaller depth gets added under the root of the deeper tree, which only increases the depth if the depths were equal. Single element tree are defined to have a rank of zero, and whenever two trees of the same rank

*r*are united, the rank of the result is

*r*+1. The worst-case running-time improves to

*O*(log

*n*) for the

*Union*or

*Find*operation.

Pseudocode for the improved MakeSet and *Union*:

The second improvement, called **path compression**, is a way of flattening the structure of the tree whenever *Find* is used on it. The idea is that each node visited on the way to a root node may as well be attached directly to the root node; they all share the same representative. To effect this, **as Find recursively traverses up the tree, it changes each node’s parent reference to point to the root that it found.** The resulting tree is much flatter, speeding up future operations not only on these elements but on those referencing them, directly or indirectly. Here is the improved

*Find*:

**function** *MakeSet*(x)

x.parent = x

x.rank = 0

**function** *Union*(x, y)

xRoot = *Find*(x)

yRoot = *Find*(y)

if xRoot == yRoot

return

__// x and y are not already in same set. Merge them.__

if xRoot.rank < yRoot.rank

xRoot.parent = yRoot

else if xRoot.rank > yRoot.rank

yRoot.parent = xRoot

else

yRoot.parent = xRoot

xRoot.rank = xRoot.rank + 1

These two techniques complement each other and running time per operation is effectively a small constant.

**C++ implementation –**

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 |
#include <bits/stdc++.h> using namespace std; // class to represent a disjoint set class DisjointSet { unordered_map<int, int> parent; // stores the depth of trees unordered_map<int, int> rank; public: // perform MakeSet operation void makeSet(vector<int> universe) { // create n disjoint sets (one for each item) for (int i : universe) { parent[i] = i; rank[i] = 0; } } // Find the root of the set in which element k belongs int Find(int k) { // if k is not root if (parent[k] != k) // path compression parent[k] = Find(parent[k]); return parent[k]; } // Perform Union of two subsets void Union(int a, int b) { // find root of the sets in which elements // x and y belongs int x = Find(a); int y = Find(b); // if x and y are present in same set if (x == y) return; // Always attach smaller depth tree under the // root of the deeper tree. if (rank[x] > rank[y]) parent[y] = x; else if (rank[x] < rank[y]) parent[x] = y; else { parent[x] = y; rank[y]++; } } }; void printSets(vector<int> universe, DisjointSet &ds) { for (int i : universe) cout << ds.Find(i) << " "; cout << endl; } // main function int main() { // universe of items vector<int> universe = { 1, 2, 3, 4, 5 }; // initalize DisjointSet class DisjointSet ds; // create singleton set for each element of universe ds.makeSet(universe); printSets(universe, ds); ds.Union(4, 3); // 4 and 3 are in same set printSets(universe, ds); ds.Union(2, 1); // 1 and 2 are in same set printSets(universe, ds); ds.Union(1, 3); // 1, 2, 3, 4 are in same set printSets(universe, ds); return 0; } |

**Output: **

1 2 3 4 5

1 2 3 3 5

1 1 3 3 5

3 3 3 3 5

**Applications of Union Find Algorithm**:

1. Implementing Kruskal’s Algorithm to find the minimum spanning tree of a graph.

2. Detecting Cycle in Undirected graph

**References:**

https://en.wikipedia.org/wiki/Disjoint-set_data_structure

**Thanks for reading.**

Please use ideone or C++ Shell or any other online compiler link to post code in comments.

Like us? Please spread the word and help us grow. Happy coding 🙂