Overview
A minimum spanning tree (MST) is a subset of edges in a connected, weighted, undirected graph that connects all vertices with the minimum total edge weight and contains no cycles. The spanning tree includes all vertices from the original graph but uses exactly V-1 edges, where V represents the number of vertices. The "minimum" designation refers to the sum of edge weights being as small as possible among all possible spanning trees.
Two primary algorithms solve the MST problem: Kruskal's algorithm and Prim's algorithm. Both produce optimal solutions but differ in their approach. Kruskal's algorithm builds the MST by sorting edges and adding them in ascending weight order while avoiding cycles. Prim's algorithm grows the MST from a starting vertex, repeatedly adding the minimum-weight edge that connects a vertex in the tree to a vertex outside it.
MST algorithms apply to network design problems where connecting all nodes with minimum cost is required. These include designing telecommunications networks, planning road systems, constructing electrical grids, and creating computer network topologies. The algorithms also support clustering analysis, image segmentation, and approximation algorithms for NP-hard problems like the traveling salesman problem.
Both algorithms guarantee finding an optimal MST through greedy approaches. They make locally optimal choices at each step without backtracking, yet still achieve global optimality. This property makes MST algorithms both efficient and reliable for practical applications.
Key Principles
The fundamental principle underlying MST algorithms is the cut property of graphs. A cut divides graph vertices into two disjoint sets. The cut property states that for any cut of the graph, the minimum-weight edge crossing the cut belongs to some MST. This principle provides the theoretical foundation for both Kruskal's and Prim's algorithms, which repeatedly apply this property in different ways.
Kruskal's algorithm operates on the edge set directly. The algorithm maintains a forest of trees that gradually merges into a single tree. Initially, each vertex forms its own tree. The algorithm examines edges in order of increasing weight, adding an edge to the MST if it connects vertices in different trees. This edge selection process requires efficient detection of cycles, typically implemented using a union-find (disjoint-set) data structure. The union-find structure tracks which vertices belong to the same connected component, enabling constant-time queries about whether adding an edge would create a cycle.
The union-find data structure maintains a partition of vertices into disjoint sets. Two key operations define its interface: find determines which set contains a given element, and union merges two sets. Path compression during find operations and union by rank during merge operations provide nearly constant amortized time complexity. Without these optimizations, the structure would degrade to linear time performance.
Prim's algorithm grows a single tree from a starting vertex. The algorithm maintains two sets: vertices included in the MST and vertices not yet included. At each step, it selects the minimum-weight edge connecting a vertex in the MST to a vertex outside it. This selection process requires efficiently finding minimum-weight edges, typically implemented using a priority queue (min-heap). The priority queue stores vertices with their current minimum connection cost to the growing tree.
The greedy property ensures optimality for both algorithms. At each step, both algorithms make the choice that appears best at that moment without reconsidering previous decisions. The cut property guarantees this greedy approach produces an optimal result. When Kruskal's algorithm adds an edge, it adds the minimum-weight edge crossing some cut. When Prim's algorithm adds an edge, it adds the minimum-weight edge crossing the cut between selected and unselected vertices.
MST algorithms require connected graphs. In disconnected graphs, no spanning tree exists because some vertices cannot be reached from others. The algorithms produce a minimum spanning forest in such cases, creating a separate tree for each connected component. The total edge count equals V - C, where V represents vertices and C represents connected components.
Edge weight handling differs between variations. Standard MST algorithms assume distinct edge weights, though the algorithms work correctly with duplicate weights. When weights are equal, multiple MSTs may exist, and the algorithm's tie-breaking rules determine which one it produces. Some applications require finding all possible MSTs or counting them, which requires modified algorithms.
Implementation Approaches
Kruskal's algorithm follows an edge-centric approach. The implementation requires sorting all edges by weight, then iterating through them in order. For each edge, the algorithm checks whether its endpoints belong to different connected components. If they do, the edge joins two separate trees and gets added to the MST. The union-find structure tracks component membership and performs the merging operation.
The algorithm's structure involves three main phases: edge sorting, union-find initialization, and edge processing. Edge sorting dominates the time complexity when using comparison-based sorting, requiring O(E log E) time where E represents the number of edges. The union-find operations contribute nearly O(E α(V)) time, where α represents the inverse Ackermann function, which grows extremely slowly. For practical purposes, α(V) remains less than 5 for any reasonable graph size.
Prim's algorithm follows a vertex-centric approach. The implementation maintains a priority queue of vertices, ordered by their minimum connection cost to the current MST. Starting from an arbitrary vertex, the algorithm repeatedly extracts the minimum-cost vertex from the queue and adds its connecting edge to the MST. After adding a vertex, the algorithm updates the connection costs of its neighbors if the new vertex provides a cheaper path to the tree.
Implementation varies based on the priority queue structure. A binary heap provides O(log V) extraction and update operations, resulting in O(E log V) total time complexity. A Fibonacci heap improves this to O(E + V log V) through O(1) amortized decrease-key operations, though the constant factors make Fibonacci heaps impractical for most applications. Array-based implementations work well for dense graphs, achieving O(V²) time without the overhead of heap operations.
The choice between Kruskal's and Prim's algorithms depends on graph characteristics. Kruskal's algorithm performs better on sparse graphs where E approaches V, as the edge sorting cost remains manageable. Prim's algorithm excels on dense graphs where E approaches V², as it avoids sorting all edges and instead focuses on vertex processing. For graphs stored as adjacency lists, Prim's algorithm accesses memory more efficiently by following edges from vertices.
Parallel implementations differ significantly between algorithms. Kruskal's algorithm parallelizes poorly because the union-find structure requires synchronization to maintain consistency. Concurrent union-find implementations exist but add substantial overhead. Prim's algorithm parallelizes more naturally by maintaining multiple growing trees simultaneously, though merging them requires care to ensure optimality.
Ruby Implementation
Ruby implementations of MST algorithms use arrays, hashes, and priority queues. The standard library lacks a built-in heap structure, requiring either a custom implementation or a gem. The algorithms gem provides priority queue implementations, though manual binary heap code offers more control over the data structure.
class UnionFind
def initialize(size)
@parent = (0...size).to_a
@rank = Array.new(size, 0)
end
def find(x)
if @parent[x] != x
@parent[x] = find(@parent[x]) # Path compression
end
@parent[x]
end
def union(x, y)
root_x = find(x)
root_y = find(y)
return false if root_x == root_y
# Union by rank
if @rank[root_x] < @rank[root_y]
@parent[root_x] = root_y
elsif @rank[root_x] > @rank[root_y]
@parent[root_y] = root_x
else
@parent[root_y] = root_x
@rank[root_x] += 1
end
true
end
def connected?(x, y)
find(x) == find(y)
end
end
This union-find implementation includes path compression and union by rank optimizations. Path compression flattens the tree structure during find operations, making subsequent finds faster. Union by rank attaches the shorter tree under the taller tree's root, preventing degeneration into linear chains.
def kruskal_mst(vertices, edges)
# edges = [[u, v, weight], ...]
sorted_edges = edges.sort_by { |u, v, w| w }
uf = UnionFind.new(vertices)
mst = []
total_weight = 0
sorted_edges.each do |u, v, weight|
if uf.union(u, v)
mst << [u, v, weight]
total_weight += weight
break if mst.size == vertices - 1
end
end
[mst, total_weight]
end
# Usage
vertices = 4
edges = [
[0, 1, 10],
[0, 2, 6],
[0, 3, 5],
[1, 3, 15],
[2, 3, 4]
]
mst, weight = kruskal_mst(vertices, edges)
# => [[[2, 3, 4], [0, 3, 5], [0, 1, 10]], 19]
Kruskal's algorithm implementation sorts edges once, then processes them sequentially. The early termination when mst.size == vertices - 1 prevents unnecessary iterations after finding all MST edges. The union operation returns false when both vertices already belong to the same component, indicating the edge would create a cycle.
class PriorityQueue
def initialize
@heap = []
end
def insert(priority, item)
@heap << [priority, item]
bubble_up(@heap.size - 1)
end
def extract_min
return nil if @heap.empty?
swap(0, @heap.size - 1)
min = @heap.pop
bubble_down(0)
min[1]
end
def decrease_key(item, new_priority)
index = @heap.index { |p, i| i == item }
return unless index
@heap[index][0] = new_priority
bubble_up(index)
end
def empty?
@heap.empty?
end
private
def bubble_up(index)
return if index == 0
parent = (index - 1) / 2
if @heap[index][0] < @heap[parent][0]
swap(index, parent)
bubble_up(parent)
end
end
def bubble_down(index)
left = 2 * index + 1
right = 2 * index + 2
smallest = index
smallest = left if left < @heap.size && @heap[left][0] < @heap[smallest][0]
smallest = right if right < @heap.size && @heap[right][0] < @heap[smallest][0]
if smallest != index
swap(index, smallest)
bubble_down(smallest)
end
end
def swap(i, j)
@heap[i], @heap[j] = @heap[j], @heap[i]
end
end
This priority queue uses a binary heap stored in an array. The bubble_up and bubble_down methods maintain the heap property after insertions and extractions. The decrease_key operation searches for the item linearly, which is inefficient but simpler than maintaining a position map.
def prim_mst(vertices, adjacency_list)
# adjacency_list = {vertex => [[neighbor, weight], ...]}
visited = Array.new(vertices, false)
mst = []
total_weight = 0
pq = PriorityQueue.new
# Start from vertex 0
visited[0] = true
adjacency_list[0].each { |neighbor, weight| pq.insert(weight, [0, neighbor]) }
while !pq.empty?
edge = pq.extract_min
u, v = edge
next if visited[v]
visited[v] = true
mst << [u, v]
total_weight += adjacency_list[u].find { |n, w| n == v }[1]
adjacency_list[v].each do |neighbor, weight|
pq.insert(weight, [v, neighbor]) unless visited[neighbor]
end
end
[mst, total_weight]
end
# Usage
adjacency_list = {
0 => [[1, 10], [2, 6], [3, 5]],
1 => [[0, 10], [3, 15]],
2 => [[0, 6], [3, 4]],
3 => [[0, 5], [1, 15], [2, 4]]
}
mst, weight = prim_mst(4, adjacency_list)
# => [[[0, 3], [3, 2], [0, 1]], 19]
Prim's algorithm implementation maintains a visited array to track vertices in the MST. The priority queue stores edges with their weights as priorities. When extracting an edge, the algorithm skips it if the destination vertex was already visited. After adding a vertex, all its unvisited neighbors get added to the queue.
def prim_mst_optimized(vertices, adjacency_list)
visited = Array.new(vertices, false)
min_cost = Array.new(vertices, Float::INFINITY)
parent = Array.new(vertices, -1)
min_cost[0] = 0
total_weight = 0
vertices.times do
# Find minimum cost unvisited vertex
u = -1
(0...vertices).each do |v|
if !visited[v] && (u == -1 || min_cost[v] < min_cost[u])
u = v
end
end
visited[u] = true
total_weight += min_cost[u]
# Update costs for neighbors
adjacency_list[u].each do |v, weight|
if !visited[v] && weight < min_cost[v]
min_cost[v] = weight
parent[v] = u
end
end
end
# Reconstruct MST edges
mst = []
(1...vertices).each do |v|
mst << [parent[v], v] if parent[v] != -1
end
[mst, total_weight]
end
This array-based Prim's implementation avoids the priority queue overhead. The min_cost array tracks the minimum weight edge connecting each vertex to the current MST. The linear search for the minimum cost vertex makes this approach O(V²), which works well for dense graphs where this complexity matches or beats the heap-based version.
Performance Considerations
Kruskal's algorithm time complexity depends primarily on edge sorting. Using comparison-based sorting algorithms like mergesort or heapsort requires O(E log E) time. Since E cannot exceed V² in a simple graph, this simplifies to O(E log V). The union-find operations contribute O(E α(V)) time, where α represents the inverse Ackermann function. This function grows so slowly that α(V) remains less than 5 for any practical graph size, making these operations nearly constant time.
The space complexity of Kruskal's algorithm is O(V) for the union-find structure plus O(E) if edges need to be copied for sorting. In-place sorting reduces the edge storage requirement. The MST itself requires O(V) space to store V-1 edges. For sparse graphs where E is close to V, the total space complexity remains O(V).
Prim's algorithm time complexity varies by implementation. Using a binary heap for the priority queue results in O((V + E) log V) time. Each vertex gets extracted from the heap once (V extractions at O(log V) each), and each edge causes at most one priority update (E updates at O(log V) each). For dense graphs where E approaches V², this becomes O(V² log V).
The array-based Prim's implementation achieves O(V²) time complexity by performing a linear search for the minimum cost vertex at each iteration. Dense graphs make this approach competitive because the number of edges approaches V², matching the V² iterations. The reduced constant factors from avoiding heap maintenance often make array-based Prim faster in practice for dense graphs.
require 'benchmark'
def generate_graph(vertices, density)
edges = []
(0...vertices).each do |u|
((u+1)...vertices).each do |v|
edges << [u, v, rand(1..100)] if rand < density
end
end
edges
end
# Sparse graph comparison
vertices = 1000
sparse_edges = generate_graph(vertices, 0.01)
Benchmark.bm(20) do |x|
x.report("Kruskal (sparse):") { kruskal_mst(vertices, sparse_edges) }
x.report("Prim (sparse):") {
adj_list = Hash.new { |h, k| h[k] = [] }
sparse_edges.each { |u, v, w| adj_list[u] << [v, w]; adj_list[v] << [u, w] }
prim_mst(vertices, adj_list)
}
end
# Dense graph comparison
dense_edges = generate_graph(vertices, 0.5)
Benchmark.bm(20) do |x|
x.report("Kruskal (dense):") { kruskal_mst(vertices, dense_edges) }
x.report("Prim (dense):") {
adj_list = Hash.new { |h, k| h[k] = [] }
dense_edges.each { |u, v, w| adj_list[u] << [v, w]; adj_list[v] << [u, w] }
prim_mst_optimized(vertices, adj_list)
}
end
Benchmark results vary by graph structure. Kruskal's algorithm performs consistently across different densities because edge sorting dominates the runtime regardless of graph structure. Prim's algorithm shows greater variation, with the heap-based version excelling on sparse graphs and the array-based version performing better on dense graphs.
Memory access patterns affect performance significantly. Kruskal's algorithm accesses edges in sorted order, which provides poor cache locality if edges are scattered in memory. Prim's algorithm follows graph structure, accessing neighbors of each vertex sequentially. For graphs stored as adjacency lists, this sequential access pattern improves cache performance.
Edge representation impacts memory usage. Storing edges as arrays of three elements requires 24 bytes per edge on 64-bit Ruby (three 8-byte pointers). For graphs with millions of edges, this memory consumption becomes significant. Packed structures or integer arrays reduce memory usage but sacrifice Ruby's readability and flexibility.
The choice between algorithms depends on graph properties and constraints. Sparse graphs with E close to V favor Kruskal's algorithm, as the sorting cost remains low. Dense graphs with E approaching V² favor array-based Prim's algorithm, which avoids sorting overhead and maintains better cache locality. Real-world graphs often exhibit small-world properties with E = O(V), making Kruskal's algorithm a reliable default choice.
Practical Examples
Network design represents the canonical MST application. A telecommunications company needs to connect offices in multiple cities with fiber optic cables. Each potential cable connection has an associated cost based on distance and terrain. The MST provides the minimum-cost network connecting all offices.
# City network design
cities = {
0 => "New York",
1 => "Boston",
2 => "Philadelphia",
3 => "Washington DC",
4 => "Pittsburgh"
}
# Connections with costs in thousands of dollars
connections = [
[0, 1, 215], # NY-Boston
[0, 2, 95], # NY-Philadelphia
[0, 3, 225], # NY-DC
[0, 4, 360], # NY-Pittsburgh
[1, 2, 270], # Boston-Philadelphia
[2, 3, 140], # Philadelphia-DC
[2, 4, 305], # Philadelphia-Pittsburgh
[3, 4, 245] # DC-Pittsburgh
]
mst, cost = kruskal_mst(cities.size, connections)
puts "Minimum cost network: $#{cost}k"
mst.each do |u, v, weight|
puts "Connect #{cities[u]} to #{cities[v]}: $#{weight}k"
end
# Output:
# Minimum cost network: $615k
# Connect New York to Philadelphia: $95k
# Connect Philadelphia to Washington DC: $140k
# Connect New York to Boston: $215k
# Connect Philadelphia to Pittsburgh: $305k
The MST reduces total network cost by eliminating redundant connections while maintaining connectivity. The solution uses only four cables to connect five cities, saving hundreds of thousands compared to connecting every city pair.
Clustering analysis uses MST algorithms to identify natural groupings in data. The approach builds an MST on data points using distance as edge weights, then removes the longest edges to create separate clusters. This single-linkage clustering method assumes clusters connect through their closest points.
def euclidean_distance(p1, p2)
Math.sqrt((p1[0] - p2[0])**2 + (p1[1] - p2[1])**2)
end
def mst_clustering(points, num_clusters)
# Build complete graph with distances
edges = []
points.size.times do |i|
((i+1)...points.size).each do |j|
dist = euclidean_distance(points[i], points[j])
edges << [i, j, dist]
end
end
# Build MST
mst, _ = kruskal_mst(points.size, edges)
# Sort MST edges by weight and remove longest ones
mst_sorted = mst.sort_by { |u, v, w| -w }
edges_to_remove = num_clusters - 1
# Use union-find to identify clusters
uf = UnionFind.new(points.size)
mst_sorted[edges_to_remove..-1].each do |u, v, _|
uf.union(u, v)
end
# Group points by cluster
clusters = Hash.new { |h, k| h[k] = [] }
points.size.times do |i|
clusters[uf.find(i)] << i
end
clusters.values
end
# Data points (x, y coordinates)
points = [
[1, 2], [2, 1], [1.5, 1.5], # Cluster 1
[8, 8], [9, 9], [8.5, 8.5], # Cluster 2
[15, 2], [16, 3], [15.5, 2.5] # Cluster 3
]
clusters = mst_clustering(points, 3)
clusters.each_with_index do |cluster, i|
puts "Cluster #{i + 1}: #{cluster.map { |idx| points[idx] }}"
end
This clustering approach identifies three distinct groups in the data. By removing the two longest MST edges, the algorithm partitions the data into three connected components. Each component represents a cluster of nearby points.
Image segmentation applies MST algorithms to divide images into regions. Pixels become graph vertices, and edge weights represent the dissimilarity between adjacent pixels based on color or intensity. The MST groups similar pixels together, and removing high-weight edges creates segments.
class ImageSegmentation
def initialize(image_data, width, height)
@data = image_data # Flat array of pixel values
@width = width
@height = height
end
def segment(num_segments)
# Build graph of adjacent pixels
edges = []
@height.times do |y|
@width.times do |x|
idx = y * @width + x
# Right neighbor
if x < @width - 1
right_idx = idx + 1
weight = (@data[idx] - @data[right_idx]).abs
edges << [idx, right_idx, weight]
end
# Bottom neighbor
if y < @height - 1
bottom_idx = idx + @width
weight = (@data[idx] - @data[bottom_idx]).abs
edges << [idx, bottom_idx, weight]
end
end
end
# Build MST and remove largest edges
vertices = @width * @height
mst, _ = kruskal_mst(vertices, edges)
mst_sorted = mst.sort_by { |u, v, w| -w }
# Create segments
uf = UnionFind.new(vertices)
mst_sorted[(num_segments - 1)..-1].each do |u, v, _|
uf.union(u, v)
end
# Map each pixel to its segment
segments = Hash.new { |h, k| h[k] = [] }
vertices.times { |i| segments[uf.find(i)] << i }
segments.values
end
end
# Example with simple grayscale image
image = [
10, 12, 11, 50, 52, 51,
11, 10, 12, 51, 50, 52,
12, 11, 10, 52, 51, 50
]
segmenter = ImageSegmentation.new(image, 6, 3)
segments = segmenter.segment(2)
puts "Found #{segments.size} segments"
segments.each_with_index do |segment, i|
puts "Segment #{i + 1}: #{segment.size} pixels"
end
The segmentation groups pixels with similar intensities. The example image has two distinct regions with different gray levels, and the MST-based approach successfully separates them by cutting the high-weight edges between dissimilar pixels.
Common Pitfalls
Forgetting to handle disconnected graphs causes runtime errors or incorrect results. MST algorithms assume graph connectivity, and disconnected graphs lack spanning trees. The algorithms produce a minimum spanning forest instead, creating separate trees for each component. Detecting disconnection requires checking whether the MST contains V-1 edges.
def kruskal_with_validation(vertices, edges)
mst, total_weight = kruskal_mst(vertices, edges)
if mst.size < vertices - 1
components = vertices - mst.size
raise "Graph is disconnected: #{components} components"
end
[mst, total_weight]
end
The validation checks edge count after MST construction. A connected graph's MST contains exactly V-1 edges. Fewer edges indicate disconnected components. Some applications require spanning forests rather than failing on disconnected graphs.
Incorrect union-find implementation breaks Kruskal's algorithm. Path compression and union by rank optimizations are optional but provide the nearly constant amortized time complexity that makes the algorithm efficient. Without these optimizations, union-find operations degrade to O(V) time, making the overall algorithm O(EV).
# Inefficient union-find without optimizations
class NaiveUnionFind
def initialize(size)
@parent = (0...size).to_a
end
def find(x)
# No path compression - follows chain to root
x = @parent[x] while @parent[x] != x
x
end
def union(x, y)
# No union by rank - arbitrary attachment
root_x = find(x)
root_y = find(y)
return false if root_x == root_y
@parent[root_x] = root_y
true
end
end
This naive implementation works correctly but performs poorly on large graphs. Repeated unions can create long chains where find operations require traversing many links. The optimized version flattens these chains and balances tree heights.
Mishandling edge weights leads to incorrect MST construction. Negative weights work correctly in MST algorithms because the greedy approach still selects minimum-weight edges. However, zero weights require attention because multiple MSTs may exist, and the algorithm's tie-breaking determines which one gets selected.
# Graph with zero-weight edges
edges = [
[0, 1, 0],
[1, 2, 0],
[2, 3, 1],
[0, 3, 2]
]
mst1, _ = kruskal_mst(4, edges)
# Result depends on edge ordering in input
# Multiple valid MSTs exist
The MST is not unique when edges have equal weights. Different edge orderings in the input can produce different but equally valid MSTs with the same total weight. Applications requiring deterministic output should sort edges by weight and then by vertex indices.
Confusing directed and undirected graphs causes errors. MST algorithms operate on undirected graphs where edges work in both directions. Applying MST algorithms to directed graphs produces incorrect results because the undirected assumption breaks. Directed graphs require different algorithms like Edmonds' algorithm for minimum spanning arborescence.
# Incorrect: treating directed graph as undirected
directed_edges = [
[0, 1, 5], # 0 -> 1
[1, 2, 3], # 1 -> 2
[2, 0, 4] # 2 -> 0 (back to start)
]
# This produces wrong results - edges treated as bidirectional
mst, _ = kruskal_mst(3, directed_edges)
MST algorithms assume edges connect vertices in both directions. Directed graphs have edges that only work in one direction, requiring specialized algorithms that respect edge directionality. The minimum spanning arborescence problem finds a directed tree rooted at a source vertex with minimum total edge weight.
Priority queue implementation errors affect Prim's algorithm correctness. Forgetting to update vertex priorities when finding shorter connections prevents the algorithm from finding the optimal MST. The decrease-key operation maintains the invariant that each vertex's priority reflects its minimum connection cost to the current tree.
# Incorrect Prim - missing priority updates
def broken_prim(vertices, adjacency_list)
visited = Array.new(vertices, false)
pq = PriorityQueue.new
pq.insert(0, 0)
while !pq.empty?
v = pq.extract_min
next if visited[v]
visited[v] = true
adjacency_list[v].each do |neighbor, weight|
# Bug: always inserting without checking if neighbor
# already has a better path
pq.insert(weight, neighbor) unless visited[neighbor]
end
end
end
This broken implementation inserts duplicate entries for vertices into the priority queue without updating existing entries. The queue fills with redundant entries, wasting memory and time. The correct approach updates a vertex's priority when a better connection is found or tracks the minimum cost in a separate array.
Reference
Algorithm Comparison
| Aspect | Kruskal's Algorithm | Prim's Algorithm |
|---|---|---|
| Approach | Edge-based, sorts all edges | Vertex-based, grows from start |
| Data Structure | Union-Find | Priority Queue or Array |
| Time Complexity | O(E log E) or O(E log V) | O(E log V) with heap, O(V²) with array |
| Space Complexity | O(V) for union-find | O(V) for priority queue |
| Best For | Sparse graphs (E ≈ V) | Dense graphs (E ≈ V²) |
| Graph Representation | Edge list | Adjacency list or matrix |
| Parallelization | Difficult | More feasible |
Time Complexity Components
| Component | Kruskal | Prim (Heap) | Prim (Array) |
|---|---|---|---|
| Initialization | O(V) | O(V) | O(V) |
| Sorting/Heap Build | O(E log E) | O(V) | O(1) |
| Main Loop | O(E α(V)) | O(E log V) | O(V²) |
| Total | O(E log E) | O(E log V) | O(V²) |
Union-Find Operations
| Operation | Without Optimization | With Path Compression | With Both Optimizations |
|---|---|---|---|
| Find | O(V) worst case | O(log V) amortized | O(α(V)) amortized |
| Union | O(V) worst case | O(log V) amortized | O(α(V)) amortized |
| Space | O(V) | O(V) | O(V) |
Implementation Decision Matrix
| Graph Property | Recommended Algorithm | Reason |
|---|---|---|
| Sparse (E ≈ V) | Kruskal | Lower sorting cost |
| Dense (E ≈ V²) | Prim with array | Avoids heap overhead |
| Edge list available | Kruskal | Direct edge processing |
| Adjacency list available | Prim | Better memory access pattern |
| Need online algorithm | Prim | Can start before all edges known |
| Parallel processing | Prim | More parallelizable |
Common Edge Cases
| Case | Handling | Impact |
|---|---|---|
| Disconnected graph | Returns spanning forest | Fewer than V-1 edges |
| Equal edge weights | Multiple valid MSTs | Result depends on tie-breaking |
| Negative weights | Works correctly | Greedy approach still optimal |
| Self-loops | Ignore during processing | Never part of MST |
| Duplicate edges | Keep only minimum weight | Others never selected |
Ruby Implementation Patterns
| Pattern | Code Example | Use Case |
|---|---|---|
| Edge list input | [[u, v, w], ...] | Kruskal's algorithm |
| Adjacency list | {v => [[n, w], ...]} | Prim's algorithm |
| Union-Find | UnionFind class | Cycle detection |
| Priority Queue | PriorityQueue class | Minimum edge selection |
| Array-based | min_cost array | Dense graphs |