Overview
Disjoint Set Union (DSU), also known as Union-Find, maintains a collection of non-overlapping sets and provides near-constant time operations to merge sets and determine whether two elements belong to the same set. The data structure represents each set as a tree, where elements point to their parent, and the root serves as the set representative.
The primary operations are find, which locates the representative element of a set, and union, which combines two sets into one. DSU excels in dynamic connectivity scenarios, where relationships between elements change over time through union operations and the structure must efficiently answer whether two elements are connected.
Applications span graph algorithms, network connectivity analysis, image segmentation, and equivalence class management. Kruskal's minimum spanning tree algorithm depends on DSU to detect cycles when adding edges. Social network analysis uses DSU to identify connected communities. The data structure handles problems where elements start as individual sets and progressively merge based on relationships.
# Basic connectivity example
dsu = DisjointSetUnion.new(6)
# Initially, each element is in its own set
# {0}, {1}, {2}, {3}, {4}, {5}
dsu.union(0, 1) # Merge sets containing 0 and 1
dsu.union(2, 3) # Merge sets containing 2 and 3
dsu.union(1, 2) # Merge sets containing 1 and 2
# Now we have: {0, 1, 2, 3}, {4}, {5}
dsu.connected?(0, 3) # => true (same set)
dsu.connected?(0, 4) # => false (different sets)
The naive implementation provides O(n) time complexity per operation in the worst case. Path compression and union by rank optimizations reduce the amortized time complexity to O(α(n)), where α represents the inverse Ackermann function, which grows extraordinarily slowly and remains less than 5 for all practical input sizes.
Key Principles
The data structure represents each set as a tree structure where nodes point toward the root. Each element maintains a parent reference, forming a path from any element to its set representative. The root element points to itself, serving as the canonical representative for all elements in that set.
Parent Array Representation
DSU stores the parent relationship in an array where parent[i] indicates the parent of element i. When parent[i] == i, element i serves as the root of its set. This compact representation requires only O(n) space for n elements.
# Initial state: each element is its own parent (root)
parent = [0, 1, 2, 3, 4] # parent[i] = i for all i
# After union(1, 2): element 2 points to element 1
parent = [0, 1, 1, 3, 4] # parent[2] = 1
# After union(3, 4): element 4 points to element 3
parent = [0, 1, 1, 3, 3] # parent[4] = 3
Find Operation
The find operation traverses parent pointers from an element to the root, returning the root as the set representative. This operation determines which set contains a given element. Two elements belong to the same set if and only if their find operations return the same root.
def find(x)
while parent[x] != x
x = parent[x]
end
x
end
Without optimization, find operations follow the entire path to the root, resulting in O(h) complexity where h represents tree height. In degenerate cases where unions create linear chains, height reaches O(n), making each find operation linear.
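This worst case can be reproduced on a bare parent array; the helper below is a standalone sketch of the find loop above, assuming nothing beyond the array itself.

```ruby
# Standalone naive find: walk parent pointers to the root, no compression
def naive_find(parent, x)
  x = parent[x] while parent[x] != x
  x
end

n = 5
parent = Array.new(n) { |i| i }
# Always attaching the current root under the next element builds a
# linear chain 0 -> 1 -> 2 -> 3 -> 4, the degenerate O(n) shape
(1...n).each { |i| parent[i - 1] = i }
naive_find(parent, 0) # => 4, after walking all n - 1 links
```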
Union Operation
Union merges two sets by making one root point to another. The operation first finds the roots of both elements, then if they differ, assigns one root as the parent of the other. This single pointer update combines entire sets.
def union(x, y)
root_x = find(x)
root_y = find(y)
return if root_x == root_y # Already in same set
parent[root_x] = root_y # Merge by making one root point to the other
end
The naive union operation attaches trees arbitrarily, potentially creating tall, unbalanced structures. When a tall tree becomes a child of another tree, subsequent find operations on elements in the taller tree must traverse additional levels.
Path Compression
Path compression optimizes find operations by flattening tree structures. During a find operation, after locating the root, the algorithm updates all nodes along the path to point directly to the root. This optimization ensures future find operations on any element along that path require only one pointer dereference.
def find(x)
return x if parent[x] == x
# Recursive implementation with path compression
parent[x] = find(parent[x])
parent[x]
end
Path compression reduces amortized time complexity dramatically. Each find operation flattens its path, benefiting all subsequent operations on those elements. The optimization works particularly well with multiple queries, as repeated finds on related elements progressively flatten the entire structure.
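The flattening is easy to observe on a bare parent array; this standalone helper mirrors the recursive find above.

```ruby
# Recursive find with path compression on a plain parent array
def find_compress(parent, x)
  return x if parent[x] == x
  parent[x] = find_compress(parent, parent[x])
end

parent = [1, 2, 3, 3] # chain 0 -> 1 -> 2 -> 3, with 3 as the root
find_compress(parent, 0) # => 3
# Every node on the traversed path now points directly at the root
parent # => [3, 3, 3, 3]
```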
Union by Rank
Union by rank maintains a rank value for each root, representing an upper bound on tree height. When merging sets, the algorithm attaches the tree with the lower rank to the tree with the higher rank, preventing unnecessary height increases. If the ranks are equal, either root works as the new parent, and the surviving root's rank increments.
The rank does not represent exact height after path compression, but serves as a heuristic for tree size. Without path compression, rank equals actual height. With path compression, rank provides an upper bound that remains useful for union decisions.
def union(x, y)
root_x = find(x)
root_y = find(y)
return if root_x == root_y
# Attach smaller rank tree to larger rank tree
if rank[root_x] < rank[root_y]
parent[root_x] = root_y
elsif rank[root_x] > rank[root_y]
parent[root_y] = root_x
else
parent[root_y] = root_x
rank[root_x] += 1
end
end
Set Size Tracking
Many applications require knowing the size of each set. The data structure can maintain a size array where size[root] stores the number of elements in that set. Union operations update sizes by adding the smaller set's size to the larger set's root.
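A minimal sketch of that bookkeeping on bare arrays, merging the smaller set into the larger:

```ruby
parent = [0, 1, 2, 3]
size   = [1, 1, 1, 1]

# Union by size: attach the smaller root under the larger one,
# then fold the absorbed set's size into the surviving root
union_by_size = lambda do |a, b|
  a = parent[a] while parent[a] != a
  b = parent[b] while parent[b] != b
  next if a == b
  a, b = b, a if size[a] < size[b]
  parent[b] = a
  size[a] += size[b]
end

union_by_size.call(0, 1)
union_by_size.call(2, 0)
size[0] # => 3, root 0 now owns {0, 1, 2}
```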
Connected Component Counting
DSU tracks the number of disjoint sets through a counter that decrements during each successful union. This count represents the number of connected components in graph applications or the number of distinct equivalence classes in grouping problems.
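Counting the components of a small graph with this decrementing counter, sketched on a bare array:

```ruby
# Six nodes, four edges; each merging union removes one component
parent = Array.new(6) { |i| i }
find = ->(x) { x = parent[x] while parent[x] != x; x }
components = 6
[[0, 1], [1, 2], [3, 4], [0, 2]].each do |u, v|
  ru, rv = find.call(u), find.call(v)
  next if ru == rv          # redundant edge: the count is unchanged
  parent[ru] = rv
  components -= 1
end
components # => 3: {0, 1, 2}, {3, 4}, {5}
```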
Ruby Implementation
Ruby's dynamic typing and object-oriented features enable clean DSU implementations. The following implementation includes path compression, union by rank, and size tracking in a single class.
class DisjointSetUnion
attr_reader :size_count
def initialize(n)
@parent = Array.new(n) { |i| i }
@rank = Array.new(n, 0)
@size = Array.new(n, 1)
@size_count = n # Number of disjoint sets
end
def find(x)
return x if @parent[x] == x
# Path compression: point directly to root
@parent[x] = find(@parent[x])
end
def union(x, y)
root_x = find(x)
root_y = find(y)
return false if root_x == root_y # Already connected
# Union by rank
if @rank[root_x] < @rank[root_y]
@parent[root_x] = root_y
@size[root_y] += @size[root_x]
elsif @rank[root_x] > @rank[root_y]
@parent[root_y] = root_x
@size[root_x] += @size[root_y]
else
@parent[root_y] = root_x
@size[root_x] += @size[root_y]
@rank[root_x] += 1
end
@size_count -= 1
true
end
def connected?(x, y)
find(x) == find(y)
end
def set_size(x)
@size[find(x)]
end
end
This implementation demonstrates idiomatic Ruby patterns including attribute readers for accessing internal state, array initialization with blocks, and meaningful method names following Ruby conventions. The connected? predicate method uses the Ruby question mark convention for boolean returns.
Iterative Path Compression
Recursive find operations can cause stack overflow for very deep trees before path compression takes effect. An iterative implementation with two passes avoids recursion depth limits.
def find(x)
# First pass: find root
root = x
root = @parent[root] while @parent[root] != root
# Second pass: path compression
while x != root
next_node = @parent[x]
@parent[x] = root
x = next_node
end
root
end
The iterative approach traverses the path twice: once to locate the root, then again to update parent pointers. This guarantees O(1) stack space regardless of tree structure.
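Both passes are visible on a plain array: the first locates the root, the second repoints everything along the way.

```ruby
# Iterative two-pass find, operating on a bare parent array
def find_iterative(parent, x)
  root = x
  root = parent[root] while parent[root] != root
  while x != root
    next_node = parent[x]
    parent[x] = root
    x = next_node
  end
  root
end

parent = [1, 2, 3, 4, 4]  # chain 0 -> 1 -> 2 -> 3 -> 4
find_iterative(parent, 0) # => 4
parent                    # => [4, 4, 4, 4, 4], fully flattened
```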
Union by Size Alternative
Instead of tracking rank, implementations can use actual set sizes for union decisions, always attaching the smaller set to the larger set. This approach provides similar performance characteristics while maintaining exact size information.
def union(x, y)
root_x = find(x)
root_y = find(y)
return false if root_x == root_y
# Always attach smaller set to larger set
if @size[root_x] < @size[root_y]
@parent[root_x] = root_y
@size[root_y] += @size[root_x]
else
@parent[root_y] = root_x
@size[root_x] += @size[root_y]
end
@size_count -= 1
true
end
Union by size guarantees logarithmic tree height without path compression. With path compression, the choice between rank and size becomes primarily a matter of implementation preference.
Enumeration Methods
Ruby's enumerable patterns enable convenient set queries and manipulations.
def each_set
return enum_for(__method__) unless block_given?
sets = Hash.new { |h, k| h[k] = [] }
@parent.each_index do |i|
sets[find(i)] << i
end
sets.each_value { |members| yield members }
end
def all_sets
each_set.to_a
end
def representatives
@parent.each_index.select { |i| @parent[i] == i }
end
These methods provide Ruby-style iteration over sets, returning enumerators when called without blocks. The all_sets method collects all disjoint sets, while representatives returns the root elements.
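The grouping idea behind each_set can be sketched standalone on a flattened parent array:

```ruby
parent = [2, 2, 2, 4, 4] # two sets, already flattened: {0, 1, 2} and {3, 4}
find = ->(x) { x = parent[x] while parent[x] != x; x }
# Group every index under its root, as each_set does internally
sets = Hash.new { |h, k| h[k] = [] }
parent.each_index { |i| sets[find.call(i)] << i }
sets.values # => [[0, 1, 2], [3, 4]]
```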
Performance Considerations
The combination of path compression and union by rank achieves O(α(n)) amortized time complexity per operation, where α denotes the inverse Ackermann function. This function grows so slowly that α(n) ≤ 4 for any practical value of n (up to the number of atoms in the universe).
Amortized Analysis
Individual operations may require more than constant time, particularly initial find operations on deep trees. However, path compression ensures each element's path gets compressed, amortizing the cost across subsequent operations. The amortized analysis proves that a sequence of m operations on n elements requires O(m · α(n)) total time.
Space Complexity
DSU requires O(n) space for the parent array, plus O(n) for rank or size arrays if tracking those values. The space overhead remains linear and constant per element regardless of the number of union operations performed.
Operation Timing
require 'benchmark'
def benchmark_dsu(n, operations)
dsu = DisjointSetUnion.new(n)
Benchmark.bm(20) do |x|
x.report("random unions:") do
operations.times do
a, b = rand(n), rand(n)
dsu.union(a, b)
end
end
x.report("connectivity queries:") do
operations.times do
a, b = rand(n), rand(n)
dsu.connected?(a, b)
end
end
end
end
# Demonstrates near-constant time even with large inputs
benchmark_dsu(100_000, 100_000)
Performance remains consistent across different input patterns. Random unions, worst-case sequential unions, and mixed operation sequences all exhibit similar amortized performance characteristics due to the optimization techniques.
Comparison with Alternative Approaches
Naive implementations without optimizations exhibit O(n) time complexity per operation in worst cases. Storing sets as explicit collections (arrays or hash sets) requires O(n) time for union operations due to copying elements. Balanced tree approaches for set management require O(log n) per operation, slower than DSU's near-constant time.
For connectivity problems where queries vastly outnumber unions, DSU dramatically outperforms these alternatives. Path compression invests a little extra work in early find operations so that subsequent finds become extremely fast, matching the typical query-heavy pattern.
Cache Performance
The array-based representation exhibits excellent cache locality. Parent pointers reside in contiguous memory, and path compression reduces pointer chasing. Modern CPU caches perform well with DSU operations compared to pointer-heavy data structures like balanced trees.
Scalability Limits
For extremely large datasets exceeding memory capacity, DSU requires modifications. The parent array must fit in RAM for efficient access. Distributed implementations become necessary when element counts exceed single-machine memory, though distributed DSU algorithms lose the near-constant time guarantee due to network latency.
Practical Examples
Network Connectivity
A network monitoring system tracks connections between servers, determining whether two servers can communicate through intermediate connections.
class NetworkConnectivity
def initialize(server_count)
@dsu = DisjointSetUnion.new(server_count)
@servers = Array.new(server_count) { |i| "server-#{i}" }
end
def add_connection(server1, server2)
@dsu.union(server1, server2)
end
def can_communicate?(server1, server2)
@dsu.connected?(server1, server2)
end
def network_clusters
@dsu.each_set.map do |cluster|
cluster.map { |id| @servers[id] }
end
end
def largest_cluster_size
@dsu.representatives.map { |root| @dsu.set_size(root) }.max
end
end
# Usage
network = NetworkConnectivity.new(10)
# Add network connections
network.add_connection(0, 1)
network.add_connection(1, 2)
network.add_connection(3, 4)
network.add_connection(5, 6)
network.add_connection(6, 7)
network.add_connection(2, 5)
puts network.can_communicate?(0, 7) # => true (connected through 0-1-2-5-6-7)
puts network.can_communicate?(0, 3) # => false (in different clusters)
puts network.network_clusters.size # => 4 clusters (servers 8 and 9 remain isolated singletons)
This example demonstrates how DSU handles dynamic network topology changes. As connections form, the data structure merges clusters and efficiently answers reachability queries.
Kruskal's Minimum Spanning Tree
Kruskal's algorithm finds the minimum spanning tree of a weighted graph by processing edges in increasing weight order and adding edges that connect different components.
class KruskalMST
Edge = Struct.new(:u, :v, :weight)
def initialize(vertices)
@vertices = vertices
@edges = []
end
def add_edge(u, v, weight)
@edges << Edge.new(u, v, weight)
end
def find_mst
# Sort edges by weight
sorted_edges = @edges.sort_by(&:weight)
dsu = DisjointSetUnion.new(@vertices)
mst_edges = []
total_weight = 0
sorted_edges.each do |edge|
# Add edge if it connects different components
if dsu.union(edge.u, edge.v)
mst_edges << edge
total_weight += edge.weight
# MST complete when we have V-1 edges
break if mst_edges.size == @vertices - 1
end
end
{ edges: mst_edges, weight: total_weight }
end
end
# Build graph
graph = KruskalMST.new(6)
graph.add_edge(0, 1, 4)
graph.add_edge(0, 2, 4)
graph.add_edge(1, 2, 2)
graph.add_edge(1, 3, 5)
graph.add_edge(2, 3, 8)
graph.add_edge(2, 4, 10)
graph.add_edge(3, 4, 2)
graph.add_edge(3, 5, 6)
graph.add_edge(4, 5, 3)
result = graph.find_mst
# => MST edges: (1,2,2), (3,4,2), (4,5,3), (0,1,4), (1,3,5)
# => Total weight: 16
DSU detects cycles efficiently in this algorithm. Each union operation succeeds only when adding an edge between different components, guaranteeing the result forms a tree without cycles.
Image Segmentation
Image processing algorithms use DSU to identify connected regions of similar pixels, grouping them into segments.
class ImageSegmentation
def initialize(height, width)
@height = height
@width = width
@dsu = DisjointSetUnion.new(height * width)
end
def process_image(pixels, threshold)
# Connect adjacent pixels with similar values
@height.times do |row|
@width.times do |col|
current_idx = row * @width + col
current_val = pixels[row][col]
# Check right neighbor
if col + 1 < @width
neighbor_idx = row * @width + (col + 1)
neighbor_val = pixels[row][col + 1]
if (current_val - neighbor_val).abs <= threshold
@dsu.union(current_idx, neighbor_idx)
end
end
# Check bottom neighbor
if row + 1 < @height
neighbor_idx = (row + 1) * @width + col
neighbor_val = pixels[row + 1][col]
if (current_val - neighbor_val).abs <= threshold
@dsu.union(current_idx, neighbor_idx)
end
end
end
end
end
def segment_count
@dsu.size_count
end
def get_segments
segments = Hash.new { |h, k| h[k] = [] }
@height.times do |row|
@width.times do |col|
idx = row * @width + col
root = @dsu.find(idx)
segments[root] << [row, col]
end
end
segments.values
end
end
# Process 8x8 grayscale image
pixels = [
[10, 12, 11, 10, 50, 52, 51, 50],
[11, 10, 12, 11, 51, 50, 52, 51],
[12, 11, 10, 12, 52, 51, 50, 52],
[10, 12, 11, 10, 50, 52, 51, 50],
[90, 91, 92, 90, 10, 11, 12, 10],
[91, 90, 91, 92, 11, 10, 11, 12],
[92, 91, 90, 91, 12, 11, 10, 11],
[90, 92, 91, 90, 10, 12, 11, 10]
]
segmentation = ImageSegmentation.new(8, 8)
segmentation.process_image(pixels, 5)
puts "Found #{segmentation.segment_count} segments"
# => Found 4 segments (top-left, top-right, bottom-left, bottom-right regions)
This example shows how DSU merges adjacent similar pixels into regions. The algorithm makes a single pass through the image, connecting pixels that meet the similarity threshold.
Equivalence Class Detection
Systems that process equivalence relationships benefit from DSU when identifying groups of equivalent items based on transitive relationships.
class EquivalenceClassifier
def initialize(items)
@items = items
@item_to_id = items.each_with_index.to_h
@dsu = DisjointSetUnion.new(items.size)
end
def mark_equivalent(item1, item2)
id1 = @item_to_id[item1]
id2 = @item_to_id[item2]
@dsu.union(id1, id2) if id1 && id2
end
def equivalent?(item1, item2)
id1 = @item_to_id[item1]
id2 = @item_to_id[item2]
id1 && id2 && @dsu.connected?(id1, id2)
end
def equivalence_classes
classes = Hash.new { |h, k| h[k] = [] }
@item_to_id.each do |item, id|
root = @dsu.find(id)
classes[root] << item
end
classes.values
end
end
# Classify email addresses as belonging to same user
emails = [
"alice@work.com", "alice@personal.com", "alice123@work.com",
"bob@work.com", "bob_jones@personal.com",
"carol@work.com"
]
classifier = EquivalenceClassifier.new(emails)
# Establish equivalences based on known relationships
classifier.mark_equivalent("alice@work.com", "alice@personal.com")
classifier.mark_equivalent("alice@personal.com", "alice123@work.com")
classifier.mark_equivalent("bob@work.com", "bob_jones@personal.com")
# Check equivalence
puts classifier.equivalent?("alice@work.com", "alice123@work.com") # => true
puts classifier.equivalent?("alice@work.com", "bob@work.com") # => false
# Get equivalence classes
classes = classifier.equivalence_classes
# => [
# ["alice@work.com", "alice@personal.com", "alice123@work.com"],
# ["bob@work.com", "bob_jones@personal.com"],
# ["carol@work.com"]
# ]
The transitivity property holds automatically through DSU: if A equals B and B equals C, then finding the representative for A, B, and C returns the same root, establishing A equals C without explicit declaration.
Common Patterns
Lazy Propagation with DSU
Some applications require associating additional data with each set, such as aggregate values or properties. Storing this data at root nodes enables efficient queries, but union operations must merge these values.
class WeightedDSU
def initialize(n)
@parent = Array.new(n) { |i| i }
@rank = Array.new(n, 0)
@value = Array.new(n, 0) # Additional data per set
end
def find(x)
return x if @parent[x] == x
@parent[x] = find(@parent[x])
end
def union(x, y)
root_x = find(x)
root_y = find(y)
return false if root_x == root_y
# Merge values during union
total_value = @value[root_x] + @value[root_y]
if @rank[root_x] < @rank[root_y]
@parent[root_x] = root_y
@value[root_y] = total_value
elsif @rank[root_x] > @rank[root_y]
@parent[root_y] = root_x
@value[root_x] = total_value
else
@parent[root_y] = root_x
@value[root_x] = total_value
@rank[root_x] += 1
end
true
end
def add_value(x, delta)
root = find(x)
@value[root] += delta
end
def get_value(x)
@value[find(x)]
end
end
This pattern maintains aggregate values at root nodes, updating them during union operations. Path compression preserves correctness because values remain at the root.
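The essence of the pattern, sketched on bare arrays: per-set totals live at roots and merge on union.

```ruby
n = 4
parent = Array.new(n) { |i| i }
value  = [3, 1, 4, 1]                  # per-element starting values
find   = ->(x) { x = parent[x] while parent[x] != x; x }
union  = lambda do |a, b|
  ra, rb = find.call(a), find.call(b)
  next if ra == rb
  parent[rb] = ra
  value[ra] += value[rb]               # fold the absorbed root's total in
end

union.call(0, 1)
union.call(2, 3)
value[find.call(1)] # => 4, the sum over {0, 1}
value[find.call(3)] # => 5, the sum over {2, 3}
```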
Rollback Support
Certain algorithms require trying unions speculatively and rolling them back if they lead to invalid states. Supporting rollback requires storing union history and forgoing path compression, because compression rewrites parent pointers in ways the history does not record.
class RollbackDSU
def initialize(n)
@parent = Array.new(n) { |i| i }
@rank = Array.new(n, 0)
@history = []
end
def find(x)
# No path compression to support rollback
x = @parent[x] while @parent[x] != x
x
end
def union(x, y)
root_x = find(x)
root_y = find(y)
return false if root_x == root_y
# Store state before union
if @rank[root_x] < @rank[root_y]
@history << [:parent, root_x, @parent[root_x]]
@parent[root_x] = root_y
elsif @rank[root_x] > @rank[root_y]
@history << [:parent, root_y, @parent[root_y]]
@parent[root_y] = root_x
else
@history << [:parent, root_y, @parent[root_y]]
@history << [:rank, root_x, @rank[root_x]]
@parent[root_y] = root_x
@rank[root_x] += 1
end
true
end
def rollback
return false if @history.empty?
type, index, old_value = @history.pop
case type
when :parent
@parent[index] = old_value
when :rank
@rank[index] = old_value
end
true
end
def checkpoint
@history.size
end
def rollback_to(checkpoint)
while @history.size > checkpoint
rollback
end
end
end
This implementation sacrifices the amortized O(α(n)) complexity of path compression in exchange for rollback capability. Union by rank alone still bounds find operations at O(log n), a trade-off that applications requiring backtracking accept.
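The undo mechanism in miniature: record the overwritten entry before each change, restore it on rollback.

```ruby
parent  = [0, 1, 2]
history = []
find = ->(x) { x = parent[x] while parent[x] != x; x }

# Speculative union(0, 1): remember what gets overwritten
history << [1, parent[1]]
parent[1] = 0
find.call(1) # => 0

# The union turned out to be invalid, so revert it
idx, old = history.pop
parent[idx] = old
find.call(1) # => 1, back to the original state
```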
Persistent DSU
Functional programming contexts or algorithms requiring access to historical DSU states benefit from persistent implementations that preserve previous versions.
class PersistentDSU
Version = Struct.new(:parent, :rank, :previous)
def initialize(n)
initial_parent = Array.new(n) { |i| i }
initial_rank = Array.new(n, 0)
@current = Version.new(initial_parent, initial_rank, nil)
end
def find(x, version = @current)
parent = version.parent
x = parent[x] while parent[x] != x
x
end
def union(x, y)
root_x = find(x)
root_y = find(y)
return @current if root_x == root_y
# Create new version
new_parent = @current.parent.dup
new_rank = @current.rank.dup
if new_rank[root_x] < new_rank[root_y]
new_parent[root_x] = root_y
elsif new_rank[root_x] > new_rank[root_y]
new_parent[root_y] = root_x
else
new_parent[root_y] = root_x
new_rank[root_x] += 1
end
@current = Version.new(new_parent, new_rank, @current)
end
def connected?(x, y, version = @current)
find(x, version) == find(y, version)
end
def previous_version
@current = @current.previous if @current.previous
@current
end
end
Persistent implementations create new versions on each union, allowing queries on historical states. The memory cost grows with each operation, making this approach suitable only when version history requirements justify the overhead.
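The versioning idea in miniature: each union copies the parent array, so earlier versions stay queryable.

```ruby
find = ->(p, x) { x = p[x] while p[x] != x; x }

v0 = [0, 1, 2]   # version 0: three singleton sets
v1 = v0.dup      # version 1: version 0 plus union(0, 1)
v1[1] = 0

find.call(v0, 1) # => 1, history preserved
find.call(v1, 1) # => 0, current state
```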
Offline Query Processing
When all operations arrive before query processing begins, sorting unions and queries together enables optimizations. This pattern appears in competitive programming and batch processing systems.
def process_connectivity_queries(n, unions, queries)
# Sort operations by priority (unions before queries at same timestamp)
operations = []
unions.each do |timestamp, u, v|
operations << [timestamp, 0, u, v] # 0 = union
end
queries.each_with_index do |(timestamp, u, v), idx|
operations << [timestamp, 1, u, v, idx] # 1 = query
end
operations.sort!
dsu = DisjointSetUnion.new(n)
results = []
operations.each do |op|
if op[1] == 0 # Union
dsu.union(op[2], op[3])
else # Query
results[op[4]] = dsu.connected?(op[2], op[3])
end
end
results
end
# Process operations in optimal order
unions = [
[1, 0, 1],
[3, 2, 3],
[5, 1, 2]
]
queries = [
[2, 0, 2], # false at time 2
[4, 0, 3], # false at time 4 (sets {0,1} and {2,3} are not yet joined)
[6, 0, 3] # true at time 6
]
results = process_connectivity_queries(4, unions, queries)
# => [false, false, true]
This pattern processes operations in temporal order, building connectivity state progressively and answering queries based on the state at their timestamp.
Reference
Core Operations
| Operation | Description | Time Complexity |
|---|---|---|
| initialize(n) | Create DSU with n elements, each in own set | O(n) |
| find(x) | Return representative of set containing x | O(α(n)) amortized |
| union(x, y) | Merge sets containing x and y | O(α(n)) amortized |
| connected?(x, y) | Check if x and y in same set | O(α(n)) amortized |
| set_size(x) | Return size of set containing x | O(α(n)) amortized |
| size_count | Return number of disjoint sets | O(1) |
Optimization Techniques
| Technique | Impact | Trade-off |
|---|---|---|
| Path Compression | Reduces amortized time to O(α(n)) | Prevents rollback capability |
| Union by Rank | Bounds tree height logarithmically | Requires additional O(n) space |
| Union by Size | Alternative to union by rank | Maintains exact size information |
| Iterative Find | Avoids stack overflow | Requires two passes through path |
| Two-Pass Compression | Flattens entire path | Slightly higher constant factor |
Implementation Variants
| Variant | Use Case | Characteristics |
|---|---|---|
| Standard DSU | General connectivity queries | Path compression + union by rank |
| Weighted DSU | Aggregate values per set | Additional data stored at roots |
| Rollback DSU | Speculative unions | No path compression, stores history |
| Persistent DSU | Historical queries | Immutable versions, higher memory |
| Range DSU | Sequential element merging | Optimized for interval unions |
Space Complexity
| Component | Space | Notes |
|---|---|---|
| Parent array | O(n) | Required for all implementations |
| Rank array | O(n) | Used with union by rank |
| Size array | O(n) | Used with union by size |
| History stack | O(m) | For rollback support, m unions |
| Version list | O(m × n) | For persistent implementation |
Common Applications
| Application | DSU Role | Additional Considerations |
|---|---|---|
| Kruskal's MST | Cycle detection | Process edges by weight |
| Connected Components | Track connectivity | Works on dynamic graphs |
| Image Segmentation | Group similar pixels | Threshold determines merging |
| Network Analysis | Cluster identification | Handle dynamic connections |
| Equivalence Classes | Transitive relations | Automatic transitivity |
| Social Networks | Community detection | Merge based on relationships |
Performance Characteristics
| Scenario | Operations | Total Time |
|---|---|---|
| Initialization | n elements | O(n) |
| Sequential unions | n-1 unions | O(n × α(n)) |
| Random unions | m unions | O(m × α(n)) |
| Query-heavy | m finds, k unions | O((m+k) × α(n)) |
| Complete graph | O(n²) edges | O(n² × α(n)) |
Edge Case Handling
| Case | Behavior | Implementation Note |
|---|---|---|
| Union same element | No change, return false | Check root_x == root_y |
| Find on root | Return immediately | Base case parent[x] == x |
| Empty set | Not applicable | All elements start in sets |
| Single element | Already a root | Parent points to self |
| All elements united | One set remaining | size_count becomes 1 |
| Invalid index | Undefined behavior | Validate indices externally |
Ruby-Specific Idioms
| Pattern | Example | Purpose |
|---|---|---|
| Array initialization | Array.new(n) { |i| i } | Create parent array |
| Predicate method | connected?(x, y) | Boolean query naming |
| Attribute reader | attr_reader :size_count | Expose internal state |
| Struct definition | Edge = Struct.new(:u, :v, :w) | Lightweight data containers |
| Enumerator pattern | each_set without block | Lazy evaluation support |
| Hash default block | Hash.new { |h, k| h[k] = [] } | Group elements by root |
Common Pitfalls
| Mistake | Impact | Solution |
|---|---|---|
| No optimization | O(n) per operation | Implement path compression |
| Arbitrary union | Tall trees | Use union by rank or size |
| Recursive without limit | Stack overflow | Use iterative implementation |
| Forgetting size decrement | Wrong component count | Update in union method |
| Mutating during iteration | Undefined behavior | Copy before modifying |
| Assuming sorted output | Incorrect results | Sets have no defined order |