
Kruskal's Algorithm in C++

Trees are essential in computer science and data structures for organizing and managing data effectively. In real-world applications, trees are hierarchical structures used to represent a wide variety of relationships and hierarchies, and they are central to many algorithms and data-processing tasks. Because Kruskal's algorithm builds a minimum spanning tree, this post first reviews the fundamental ideas, common types, and practical uses of trees before turning to the algorithm itself.

History

Kruskal's algorithm is a fundamental algorithm in graph theory and computer science for finding a minimum spanning tree in a connected, undirected graph. A minimum spanning tree (MST) is a subset of the graph's edges that forms a tree connecting all the vertices with the minimum possible total edge weight. The algorithm was published by Joseph Kruskal in 1956.

The algorithm's history can be traced back to the mid-20th century, when the field of computer science was in its infancy. Kruskal was interested in optimizing network design, and this interest led to the development of the algorithm. His work was influenced by earlier research on minimum spanning trees, particularly that of Czech mathematician Otakar Borůvka.

Kruskal's algorithm is a greedy approach to finding the minimum spanning tree. It starts with an empty set of edges and iteratively adds edges to the set while ensuring that no cycles are formed. The edges are added in ascending order of their weights, and only those edges that do not create a cycle in the growing forest are included. This process continues until all vertices are connected, forming a minimum spanning tree.

The algorithm's simplicity and efficiency have made it a cornerstone in the field of network design and optimization. It has a time complexity of O(E log E), where E is the number of edges in the graph, which makes it efficient for a wide range of practical applications. Kruskal's algorithm is widely used in computer networking, transportation planning, and various other domains where the efficient construction of minimum spanning trees is essential.

Kruskal's algorithm is a pivotal development in the history of computer science and graph theory, with roots in the mid-20th century. Its elegant and efficient approach to finding minimum spanning trees has made it a foundational tool for solving a wide array of real-world problems, and its legacy continues to influence algorithm design and optimization in various fields.

Understanding the Basics

Let's establish a few basic ideas before delving into the details of trees:

  • Nodes: Nodes are the basic building blocks of trees. Each node holds data and has zero or more child nodes. The top node is known as the "root," while nodes without children are known as "leaves."
  • Edges: Edges connect the nodes in a tree. They indicate which nodes are parents and which are children, defining the relationships between them.
  • Hierarchy: Trees are hierarchical structures, which means they adhere to a predetermined hierarchy. Nodes are arranged in levels, with the root node at the top and layers of child nodes below it.

Types of Trees

Different types of trees exist, each designed to serve a particular purpose or solve a particular problem. Common types include:

  • Binary Tree: A binary tree allows each node to have at most two children, known as the left child and the right child. Binary search trees and expression trees are just two of the many uses for binary trees.
  • Binary Search Tree (BST): A binary search tree is a kind of binary tree in which the nodes are ordered so that searching, insertion, and deletion can be carried out quickly. Nodes in the left subtree of a parent hold values smaller than the parent's, while nodes in the right subtree hold larger values (see the sketch after this list).
  • AVL Tree: An AVL tree is a self-balancing binary search tree. It keeps the structure balanced by guaranteeing that the heights of every node's left and right subtrees differ by at most one. This balance keeps search operations efficient.
  • Red-Black Tree: The red-black tree is another self-balancing binary search tree. It maintains a balanced structure using a set of coloring rules, making it appropriate for a variety of applications.
  • B-Tree: B-trees are multi-way trees frequently used in file systems and databases. By maintaining a balanced structure with a variable number of child nodes, they offer efficient data storage and retrieval.
  • Trie: A trie (pronounced "try") is a data structure that resembles a tree and is used to store and search a dynamic set of strings, such as words in a dictionary or IP addresses in a router.
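As a concrete illustration of the binary search tree ordering described above, here is a minimal C++ sketch; the keys and the helper functions are a made-up example, not code from this article:

#include <bits/stdc++.h>
using namespace std;

// Minimal binary search tree sketch: smaller keys go left, larger
// keys go right, so a lookup discards half the tree at each step.
struct Node {
    int key;
    Node *left = nullptr, *right = nullptr;
    Node(int k) : key(k) {}
};

Node *insert(Node *root, int key) {
    if (!root) return new Node(key);
    if (key < root->key) root->left = insert(root->left, key);
    else if (key > root->key) root->right = insert(root->right, key);
    return root; // duplicate keys are ignored in this sketch
}

bool search(Node *root, int key) {
    while (root) {
        if (key == root->key) return true;
        root = key < root->key ? root->left : root->right;
    }
    return false;
}

int main() {
    Node *root = nullptr;
    for (int k : {8, 3, 10, 1, 6}) root = insert(root, k);
    cout << boolalpha << search(root, 6) << " " << search(root, 7) << "\n";
    // prints: true false
    return 0;
}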

Practical Applications

Now that we have covered the fundamental concepts and types of trees, let's look at some of their real-world applications:

  • File Systems: File systems, like the one on your computer, frequently use tree structures to organize files and directories. Each directory can contain files and subdirectories, forming a hierarchical structure.
  • Database Systems: Many database systems use B-trees and related tree structures to store and retrieve data efficiently. These structures provide fast access even for very large datasets.
  • Network Routing: Routers in computer networks use tree-based algorithms to choose the best route for data packets from their source to their destination.
  • Compilers: Abstract Syntax Trees (ASTs) are used by compilers to represent the structure of program code. ASTs are essential for parsing and code generation.
  • Organizational Hierarchies: In a firm, organizational hierarchies can be represented as trees, with each node denoting a department or employee and the edges their hierarchical connections.
  • Game Development: Trees are used to organize game elements, their interactions, and their behaviors in video game development. Behavior trees are a typical example of this use.

Understanding Greedy Algorithms: Optimizing Choices Step by Step

In the realm of computer science and optimization, one powerful and intuitive technique that often comes to the forefront is the Greedy Algorithm. Greedy algorithms are versatile problem-solving strategies that make decisions at each step, aiming to maximize or minimize a specific objective function. These algorithms are simple in concept but can be highly effective in a wide range of applications. In this article, we will delve into the world of greedy algorithms, exploring their core principles, advantages, and limitations.

What is a Greedy Algorithm?

At its core, a greedy algorithm makes a series of locally optimal choices to reach a globally optimal solution. It operates by selecting the best option at each step without considering the overall consequences. This myopic approach can be likened to a person who consistently chooses the immediate best option at each decision point, hoping to reach the best overall outcome.

Key Characteristics of Greedy Algorithms

1. Greedy Choice Property

The fundamental feature of a greedy algorithm is its "greedy choice property." At each step, the algorithm selects the option that appears to be the best choice at that particular moment, regardless of the bigger picture. The choice is made based on a specific criterion, which may involve maximizing or minimizing a certain value.

2. Optimal Substructure

Greedy algorithms rely on the concept of "optimal substructure," meaning that solving a smaller subproblem optimally contributes to solving the larger problem optimally. In other words, the problem can be divided into smaller, manageable subproblems that are themselves solvable using the same greedy approach.
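To make these two properties concrete, here is a minimal C++ sketch of the classic activity-selection problem; the interval data is a made-up example. The greedy choice is "always take the activity that finishes earliest," and each choice leaves a smaller subproblem of the same form:

#include <bits/stdc++.h>
using namespace std;

// Activity selection: pick the maximum number of non-overlapping
// activities by always taking the one that finishes earliest.
int main() {
    // {finish, start} pairs for a small hypothetical schedule
    vector<pair<int, int>> acts = {{4, 1}, {3, 0}, {5, 3}, {7, 5}, {9, 8}};
    sort(acts.begin(), acts.end()); // sort by finish time

    int count = 0, lastFinish = INT_MIN;
    for (auto &a : acts) {
        if (a.second >= lastFinish) { // starts after the last one ends
            count++;
            lastFinish = a.first;
        }
    }
    cout << "Selected " << count << " activities\n"; // prints 4
    return 0;
}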

Below is an implementation of Kruskal's algorithm in C++:
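The listing follows the structure described in the explanation further below: a Graph that stores edges as {weight, {u, v}} pairs, a DisjointSets (union-find) helper with union by rank and path compression, and a kruskalMST() method. The sample graph built in main() is an assumption chosen to be consistent with the output shown.

#include <bits/stdc++.h>
using namespace std;

// Shorthand for a pair of integers
typedef pair<int, int> iPair;

// Structure to represent a weighted, undirected graph
struct Graph {
    int V, E;
    vector<pair<int, iPair>> edges; // each entry is {weight, {u, v}}

    Graph(int V, int E) : V(V), E(E) {}

    // Add an undirected edge between u and v with weight w
    void addEdge(int u, int v, int w) { edges.push_back({w, {u, v}}); }

    int kruskalMST(); // returns the total weight of the MST
};

// Disjoint sets (union-find) with union by rank and path compression
struct DisjointSets {
    vector<int> parent, rnk;
    int n;

    DisjointSets(int n) : parent(n + 1), rnk(n + 1, 0), n(n) {
        for (int i = 0; i <= n; i++)
            parent[i] = i; // initially, every element is its own set
    }

    // Find the representative of u's set, compressing the path
    int find(int u) {
        if (u != parent[u])
            parent[u] = find(parent[u]);
        return parent[u];
    }

    // Merge the sets containing x and y (union by rank)
    void merge(int x, int y) {
        x = find(x), y = find(y);
        if (rnk[x] > rnk[y])
            parent[y] = x;
        else
            parent[x] = y;
        if (rnk[x] == rnk[y])
            rnk[y]++;
    }
};

int Graph::kruskalMST() {
    int mst_wt = 0;

    // Greedy step: consider edges in ascending order of weight
    sort(edges.begin(), edges.end());

    DisjointSets ds(V);

    cout << "Edges of MST are \n";
    for (auto &e : edges) {
        int u = e.second.first, v = e.second.second;
        // Keep the edge only if it joins two different components
        if (ds.find(u) != ds.find(v)) {
            cout << u << " - " << v << endl;
            mst_wt += e.first;
            ds.merge(u, v);
        }
    }
    return mst_wt;
}

int main() {
    // Sample graph: an assumption for illustration, chosen to be
    // consistent with the output shown below
    int V = 9, E = 14;
    Graph g(V, E);
    g.addEdge(0, 1, 4);  g.addEdge(0, 7, 8);
    g.addEdge(1, 2, 8);  g.addEdge(1, 7, 11);
    g.addEdge(2, 3, 7);  g.addEdge(2, 8, 2);
    g.addEdge(2, 5, 4);  g.addEdge(3, 4, 9);
    g.addEdge(3, 5, 14); g.addEdge(4, 5, 10);
    g.addEdge(5, 6, 2);  g.addEdge(6, 7, 1);
    g.addEdge(6, 8, 6);  g.addEdge(7, 8, 7);

    int mst_wt = g.kruskalMST();
    cout << "Weight of MST is " << mst_wt << endl;
    return 0;
}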

Output:

Edges of MST are 
6 - 7
2 - 8
5 - 6
0 - 1
2 - 5
2 - 3
0 - 7
3 - 4
Weight of MST is 37
...................................
Process executed in 0.11 seconds
Press any key to continue.

Explanation:

  1. #include <bits/stdc++.h> and using namespace std;: These lines pull in the necessary C++ standard library headers for the program.
  2. typedef pair<int, int> iPair;: Defines a shorthand iPair for a pair of integers.
  3. struct Graph: The definition of a structure that represents a graph. It has three member variables: V (the number of vertices), E (the number of edges), and edges (a vector of pairs, where each pair holds an edge's weight and its vertices).
  4. Graph(int V, int E): The constructor for the Graph structure, which initializes the number of vertices and edges.
  5. void addEdge(int u, int v, int w): A member function of the Graph structure used to add an edge to the graph. It takes the source vertex u, the target vertex v, and the weight of the edge w.
  6. int kruskalMST(): A member function of the Graph structure that implements Kruskal's algorithm to find the minimum spanning tree. It returns the weight of the MST.
  7. struct DisjointSets: This structure represents disjoint sets for use in Kruskal's algorithm. It has member variables for the parent array, the rank array, and the number of elements in the set.
  8. DisjointSets(int n): The constructor for the DisjointSets structure, which initializes the disjoint sets and ranks.
  9. int find(int u): A function that finds the representative (parent) of a node u using path compression.
  10. void merge(int x, int y): A function that merges two sets using union by rank.
  11. int Graph::kruskalMST() (method implementation): This function implements Kruskal's algorithm: it sorts the edges by weight and greedily adds each edge that does not form a cycle, accumulating the weight of the minimum spanning tree.
  12. main() function: The driver program that demonstrates Kruskal's algorithm on a specific graph. It creates a Graph instance, adds edges to it, and then calls kruskalMST to find the MST. Finally, it prints the edges of the MST and its total weight.
  13. Throughout, the code uses C++ standard library data structures and functions to implement Kruskal's algorithm for finding the MST of a graph.

Time and Space Complexity Analysis

Kruskal's algorithm is a greedy algorithm that finds the MST by iteratively selecting edges with the minimum weight while avoiding cycles. To analyze the time and space complexity of this code, we will break it down step by step.

Time Complexity:

  • Initializing the graph and adding edges takes O(E) time, where E is the number of edges.
  • Sorting the edges using the sort function takes O(E * log(E)) time. Sorting is the most time-consuming operation in the code.
  • Creating the Disjoint Sets structure takes O(V) time, where V is the number of vertices.
  • The main loop iterates through all sorted edges and performs operations that depend on the number of edges, E.

Within the loop:

  • Finding the parent of a node using path compression (Disjoint Sets) takes nearly constant amortized time, O(log*(V)), where log* is the iterated logarithm. In practice, it's almost constant, which is very close to O(1).
  • Merging two sets by rank also takes nearly constant time, O(1).
  • Hence, the overall time complexity of the code is dominated by the sorting step, which is O(E * log(E)). Since E is at most V^2, log(E) is O(log(V)), so this bound is equivalent to O(E * log(V)), the form in which Kruskal's algorithm's time complexity is more commonly stated.

Space Complexity:

  • The space complexity is primarily determined by the data structures used in the code.
  • The edges vector stores all the edges with their weights, which takes O(E) space.
  • The Disjoint Sets structure requires additional space for parent and rank arrays, which takes O(V) space.
  • The additional space used for variables, iterators, and other constants is relatively small and can be considered constant.
  • Hence, the space complexity of the code is O(E + V), where E is the number of edges and V is the number of vertices. Since a connected graph has at least V - 1 edges, E is at least on the order of V, so the edge list typically dominates and the bound behaves like O(E).
  • In summary, the time complexity of this code is dominated by the sorting step, which is O(E * log(E)), and the space complexity is O(E + V). Kruskal's algorithm is efficient for finding the MST in sparse graphs and performs well in practice.

Applications of Greedy Algorithms

Greedy algorithms are used in a variety of fields, including economics, engineering, and computer science. They are used in the following instances:

1. Shortest Path Problems

Greedy techniques, such as Dijkstra's algorithm, efficiently identify the shortest path between two nodes in graph theory and network routing. The algorithm repeatedly settles the closest unexplored node until it reaches the target.
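Here is a compact C++17 sketch of Dijkstra's algorithm along those lines; the adjacency data and node count are assumptions made up for the example:

#include <bits/stdc++.h>
using namespace std;

// Dijkstra's algorithm sketch: repeatedly settle the closest
// unexplored node, relaxing the edges out of it.
int main() {
    int V = 4, src = 0;
    vector<vector<pair<int, int>>> adj(V); // adj[u] holds {v, weight}
    auto addEdge = [&](int u, int v, int w) {
        adj[u].push_back({v, w});
        adj[v].push_back({u, w});
    };
    addEdge(0, 1, 4); addEdge(0, 2, 1);
    addEdge(2, 1, 2); addEdge(1, 3, 5); addEdge(2, 3, 8);

    vector<int> dist(V, INT_MAX);
    // min-heap ordered by tentative distance
    priority_queue<pair<int, int>, vector<pair<int, int>>, greater<>> pq;
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top();
        pq.pop();
        if (d > dist[u]) continue; // skip stale queue entries
        for (auto [v, w] : adj[u])
            if (dist[u] + w < dist[v]) { // relax the edge
                dist[v] = dist[u] + w;
                pq.push({dist[v], v});
            }
    }
    for (int v = 0; v < V; v++)
        cout << "dist(0, " << v << ") = " << dist[v] << "\n";
    return 0;
}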

2. Huffman Coding

Data compression uses Huffman coding to encode characters with variable-length codes. A greedy algorithm builds the Huffman tree by merging the two least frequent nodes at each stage.
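The following C++17 sketch shows that greedy merge step; the character frequencies are a made-up example, and nodes are intentionally leaked for brevity:

#include <bits/stdc++.h>
using namespace std;

// Huffman coding sketch: repeatedly merge the two least frequent
// nodes until a single tree remains, then read codes off the paths.
struct Node {
    int freq;
    char ch; // '\0' marks internal nodes
    Node *left, *right;
};

void printCodes(Node *n, string code) {
    if (!n) return;
    if (!n->left && !n->right) { // leaf: emit its code
        cout << n->ch << ": " << code << "\n";
        return;
    }
    printCodes(n->left, code + "0");
    printCodes(n->right, code + "1");
}

int main() {
    vector<pair<char, int>> freqs = {{'a', 45}, {'b', 13}, {'c', 12},
                                     {'d', 16}, {'e', 9}, {'f', 5}};
    auto cmp = [](Node *a, Node *b) { return a->freq > b->freq; };
    priority_queue<Node *, vector<Node *>, decltype(cmp)> pq(cmp);
    for (auto &[c, f] : freqs) pq.push(new Node{f, c, nullptr, nullptr});

    while (pq.size() > 1) { // greedy step: merge the two smallest
        Node *l = pq.top(); pq.pop();
        Node *r = pq.top(); pq.pop();
        pq.push(new Node{l->freq + r->freq, '\0', l, r});
    }
    printCodes(pq.top(), ""); // rarer characters get longer codes
    return 0;
}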

3. Fractional Knapsack Problem

In this well-known optimization problem, the goal is to maximize total value within a limited weight capacity by choosing from a set of items, each with a weight and a value. In the fractional variant, where items may be split, a greedy algorithm that repeatedly takes the item with the highest value-to-weight ratio yields an exactly optimal answer.
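A minimal C++17 sketch of that rule, under assumed item data chosen for illustration:

#include <bits/stdc++.h>
using namespace std;

// Fractional knapsack: take items in decreasing value-to-weight
// ratio, splitting the last item if it does not fully fit.
int main() {
    double capacity = 50;
    vector<pair<double, double>> items = {{60, 10}, {100, 20}, {120, 30}};
    // sort by value-to-weight ratio, highest first
    sort(items.begin(), items.end(), [](auto &a, auto &b) {
        return a.first / a.second > b.first / b.second;
    });

    double total = 0;
    for (auto &[value, weight] : items) {
        if (capacity >= weight) { // take the whole item
            total += value;
            capacity -= weight;
        } else {                  // take the fraction that fits
            total += value * (capacity / weight);
            break;
        }
    }
    cout << "Maximum value: " << total << "\n"; // prints 240 here
    return 0;
}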

Advantages of Greedy Algorithms

1. Simplicity

One of the primary advantages of greedy algorithms is their simplicity. They are easy to understand, implement, and analyze, making them a preferred choice for solving problems in a time-efficient manner.

2. Efficiency

Greedy algorithms often have excellent time and space complexity, making them suitable for real-time applications and scenarios with large datasets.

Limitations of Greedy Algorithms

While greedy algorithms are powerful, they are not suitable for all problems. They have some inherent limitations:

1. Lack of Global Optimality

Greedy algorithms make decisions based on local optimality without considering the long-term consequences. Consequently, they may not always produce globally optimal solutions.
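A small, hypothetical C++ example makes this concrete: for coin change with denominations {1, 3, 4}, the greedy rule "always take the largest coin that fits" returns a suboptimal answer for amount 6 (it uses 4 + 1 + 1, three coins, while 3 + 3, two coins, is optimal):

#include <bits/stdc++.h>
using namespace std;

// Greedy coin change: locally optimal choices, globally suboptimal.
int main() {
    vector<int> coins = {4, 3, 1}; // largest denomination first
    int amount = 6, used = 0;
    for (int c : coins)
        while (amount >= c) { amount -= c; used++; }
    cout << "Greedy uses " << used << " coins\n"; // prints 3, not 2
    return 0;
}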

2. No Backtracking

Once a choice is made in a greedy algorithm, it cannot be undone. If an early decision leads to a suboptimal solution later on, there is no mechanism to backtrack and correct it.

Greedy algorithms solve optimization problems quickly and effectively by selecting locally optimal options at each step. Despite their inherent limitations, and although they cannot always guarantee globally optimal solutions, they are vital tools in many areas of computer science and beyond. Knowing when and how to apply a greedy algorithm is a skill that can yield elegant and practical solutions in a variety of real-world situations.






