Overview
Disjoint Set Union (DSU), also known as Union-Find, maintains a collection of non-overlapping sets and provides near-constant time operations to merge sets and determine whether two elements belong to the same set. The data structure represents each set as a tree, where elements point to their parent, and the root serves as the set representative.
The primary operations are find, which locates the representative element of a set, and union, which combines two sets into one. DSU excels in dynamic connectivity scenarios, where relationships between elements change over time through union operations and the structure must efficiently answer whether two elements are connected.
Applications span graph algorithms, network connectivity analysis, image segmentation, and equivalence class management. Kruskal's minimum spanning tree algorithm depends on DSU to detect cycles when adding edges. Social network analysis uses DSU to identify connected communities. The data structure handles problems where elements start as individual sets and progressively merge based on relationships.
# Basic connectivity example
dsu = DisjointSetUnion.new(6)
# Initially, each element is in its own set
# {0}, {1}, {2}, {3}, {4}, {5}
dsu.union(0, 1) # Merge sets containing 0 and 1
dsu.union(2, 3) # Merge sets containing 2 and 3
dsu.union(1, 2) # Merge sets containing 1 and 2
# Now we have: {0, 1, 2, 3}, {4}, {5}
dsu.connected?(0, 3) # => true (same set)
dsu.connected?(0, 4) # => false (different sets)
The naive implementation provides O(n) time complexity per operation in the worst case. Path compression and union by rank optimizations reduce the amortized time complexity to O(α(n)), where α represents the inverse Ackermann function, which grows extraordinarily slowly and remains less than 5 for all practical input sizes.
Key Principles
The data structure represents each set as a tree structure where nodes point toward the root. Each element maintains a parent reference, forming a path from any element to its set representative. The root element points to itself, serving as the canonical representative for all elements in that set.
Parent Array Representation
DSU stores the parent relationship in an array where parent[i] indicates the parent of element i. When parent[i] == i, element i serves as the root of its set. This compact representation requires only O(n) space for n elements.
# Initial state: each element is its own parent (root)
parent = [0, 1, 2, 3, 4] # parent[i] = i for all i
# After union(1, 2): element 2 points to element 1
parent = [0, 1, 1, 3, 4] # parent[2] = 1
# After union(3, 4): element 4 points to element 3
parent = [0, 1, 1, 3, 3] # parent[4] = 3
Find Operation
The find operation traverses parent pointers from an element to the root, returning the root as the set representative. This operation determines which set contains a given element. Two elements belong to the same set if and only if their find operations return the same root.
def find(x)
while parent[x] != x
x = parent[x]
end
x
end
Without optimization, find operations follow the entire path to the root, resulting in O(h) complexity where h represents tree height. In degenerate cases where unions create linear chains, height reaches O(n), making each find operation linear.
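This worst case can be reproduced on a bare parent array; the helper below is a standalone sketch of the find loop above, assuming nothing beyond the array itself.

```ruby
# Standalone naive find: walk parent pointers to the root, no compression
def naive_find(parent, x)
  x = parent[x] while parent[x] != x
  x
end

n = 5
parent = Array.new(n) { |i| i }
# Always attaching the current root under the next element builds a
# linear chain 0 -> 1 -> 2 -> 3 -> 4, the degenerate O(n) shape
(1...n).each { |i| parent[i - 1] = i }
naive_find(parent, 0) # => 4, after walking all n - 1 links
```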
Union Operation
Union merges two sets by making one root point to another. The operation first finds the roots of both elements, then if they differ, assigns one root as the parent of the other. This single pointer update combines entire sets.
def union(x, y)
root_x = find(x)
root_y = find(y)
return if root_x == root_y # Already in same set
parent[root_x] = root_y # Merge by making one root point to the other
end
The naive union operation attaches trees arbitrarily, potentially creating tall, unbalanced structures. When a tall tree becomes a child of another tree, subsequent find operations on elements in the taller tree must traverse additional levels.
Path Compression
Path compression optimizes find operations by flattening tree structures. During a find operation, after locating the root, the algorithm updates all nodes along the path to point directly to the root. This optimization ensures future find operations on any element along that path require only one pointer dereference.
def find(x)
return x if parent[x] == x
# Recursive implementation with path compression
parent[x] = find(parent[x])
parent[x]
end
Path compression reduces amortized time complexity dramatically. Each find operation flattens its path, benefiting all subsequent operations on those elements. The optimization works particularly well with multiple queries, as repeated finds on related elements progressively flatten the entire structure.
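The flattening is easy to observe on a bare parent array; this standalone helper mirrors the recursive find above.

```ruby
# Recursive find with path compression on a plain parent array
def find_compress(parent, x)
  return x if parent[x] == x
  parent[x] = find_compress(parent, parent[x])
end

parent = [1, 2, 3, 3] # chain 0 -> 1 -> 2 -> 3, with 3 as the root
find_compress(parent, 0) # => 3
# Every node on the traversed path now points directly at the root
parent # => [3, 3, 3, 3]
```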
Union by Rank
Union by rank maintains a rank value for each root, representing an upper bound on tree height. When merging sets, the algorithm attaches the tree with the lower rank to the tree with the higher rank, preventing unnecessary height increases. If the ranks are equal, either root works as the new parent, and the surviving root's rank increments.
The rank does not represent exact height after path compression, but serves as a heuristic for tree size. Without path compression, rank equals actual height. With path compression, rank provides an upper bound that remains useful for union decisions.
def union(x, y)
root_x = find(x)
root_y = find(y)
return if root_x == root_y
# Attach smaller rank tree to larger rank tree
if rank[root_x] < rank[root_y]
parent[root_x] = root_y
elsif rank[root_x] > rank[root_y]
parent[root_y] = root_x
else
parent[root_y] = root_x
rank[root_x] += 1
end
end
Set Size Tracking
Many applications require knowing the size of each set. The data structure can maintain a size array where size[root] stores the number of elements in that set. Union operations update sizes by adding the smaller set's size to the larger set's root.
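A minimal sketch of that bookkeeping on bare arrays, merging the smaller set into the larger:

```ruby
parent = [0, 1, 2, 3]
size   = [1, 1, 1, 1]

# Union by size: attach the smaller root under the larger one,
# then fold the absorbed set's size into the surviving root
union_by_size = lambda do |a, b|
  a = parent[a] while parent[a] != a
  b = parent[b] while parent[b] != b
  next if a == b
  a, b = b, a if size[a] < size[b]
  parent[b] = a
  size[a] += size[b]
end

union_by_size.call(0, 1)
union_by_size.call(2, 0)
size[0] # => 3, root 0 now owns {0, 1, 2}
```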
Connected Component Counting
DSU tracks the number of disjoint sets through a counter that decrements during each successful union. This count represents the number of connected components in graph applications or the number of distinct equivalence classes in grouping problems.
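Counting the components of a small graph with this decrementing counter, sketched on a bare array:

```ruby
# Six nodes, four edges; each merging union removes one component
parent = Array.new(6) { |i| i }
find = ->(x) { x = parent[x] while parent[x] != x; x }
components = 6
[[0, 1], [1, 2], [3, 4], [0, 2]].each do |u, v|
  ru, rv = find.call(u), find.call(v)
  next if ru == rv          # redundant edge: the count is unchanged
  parent[ru] = rv
  components -= 1
end
components # => 3: {0, 1, 2}, {3, 4}, {5}
```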
Ruby Implementation
Ruby's dynamic typing and object-oriented features enable clean DSU implementations. The following implementation includes path compression, union by rank, and size tracking in a single class.
class DisjointSetUnion
attr_reader :size_count
def initialize(n)
@parent = Array.new(n) { |i| i }
@rank = Array.new(n, 0)
@size = Array.new(n, 1)
@size_count = n # Number of disjoint sets
end
def find(x)
return x if @parent[x] == x
# Path compression: point directly to root
@parent[x] = find(@parent[x])
end
def union(x, y)
root_x = find(x)
root_y = find(y)
return false if root_x == root_y # Already connected
# Union by rank
if @rank[root_x] < @rank[root_y]
@parent[root_x] = root_y
@size[root_y] += @size[root_x]
elsif @rank[root_x] > @rank[root_y]
@parent[root_y] = root_x
@size[root_x] += @size[root_y]
else
@parent[root_y] = root_x
@size[root_x] += @size[root_y]
@rank[root_x] += 1
end
@size_count -= 1
true
end
def connected?(x, y)
find(x) == find(y)
end
def set_size(x)
@size[find(x)]
end
end
This implementation demonstrates idiomatic Ruby patterns including attribute readers for accessing internal state, array initialization with blocks, and meaningful method names following Ruby conventions. The connected? predicate method uses the Ruby question mark convention for boolean returns.
Iterative Path Compression
Recursive find operations can cause stack overflow for very deep trees before path compression takes effect. An iterative implementation with two passes avoids recursion depth limits.
def find(x)
# First pass: find root
root = x
root = @parent[root] while @parent[root] != root
# Second pass: path compression
while x != root
next_node = @parent[x]
@parent[x] = root
x = next_node
end
root
end
The iterative approach traverses the path twice: once to locate the root, then again to update parent pointers. This guarantees O(1) stack space regardless of tree structure.
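Both passes are visible on a plain array: the first locates the root, the second repoints everything along the way.

```ruby
# Iterative two-pass find, operating on a bare parent array
def find_iterative(parent, x)
  root = x
  root = parent[root] while parent[root] != root
  while x != root
    next_node = parent[x]
    parent[x] = root
    x = next_node
  end
  root
end

parent = [1, 2, 3, 4, 4]  # chain 0 -> 1 -> 2 -> 3 -> 4
find_iterative(parent, 0) # => 4
parent                    # => [4, 4, 4, 4, 4], fully flattened
```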
Union by Size Alternative
Instead of tracking rank, implementations can use actual set sizes for union decisions, always attaching the smaller set to the larger set. This approach provides similar performance characteristics while maintaining exact size information.
def union(x, y)
root_x = find(x)
root_y = find(y)
return false if root_x == root_y
# Always attach smaller set to larger set
if @size[root_x] < @size[root_y]
@parent[root_x] = root_y
@size[root_y] += @size[root_x]
else
@parent[root_y] = root_x
@size[root_x] += @size[root_y]
end
@size_count -= 1
true
end
Union by size guarantees logarithmic tree height without path compression. With path compression, the choice between rank and size becomes primarily a matter of implementation preference.
Enumeration Methods
Ruby's enumerable patterns enable convenient set queries and manipulations.
def each_set
return enum_for(__method__) unless block_given?
sets = Hash.new { |h, k| h[k] = [] }
@parent.each_index do |i|
sets[find(i)] << i
end
sets.each_value { |members| yield members }
end
def all_sets
each_set.to_a
end
def representatives
@parent.each_index.select { |i| @parent[i] == i }
end
These methods provide Ruby-style iteration over sets, returning enumerators when called without blocks. The all_sets method collects all disjoint sets, while representatives returns the root elements.
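The grouping idea behind each_set can be sketched standalone on a flattened parent array:

```ruby
parent = [2, 2, 2, 4, 4] # two sets, already flattened: {0, 1, 2} and {3, 4}
find = ->(x) { x = parent[x] while parent[x] != x; x }
# Group every index under its root, as each_set does internally
sets = Hash.new { |h, k| h[k] = [] }
parent.each_index { |i| sets[find.call(i)] << i }
sets.values # => [[0, 1, 2], [3, 4]]
```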
Performance Considerations
The combination of path compression and union by rank achieves O(α(n)) amortized time complexity per operation, where α denotes the inverse Ackermann function. This function grows so slowly that α(n) ≤ 4 for any practical value of n (up to the number of atoms in the universe).
Amortized Analysis
Individual operations may require more than constant time, particularly initial find operations on deep trees. However, path compression ensures each element's path gets compressed, amortizing the cost across subsequent operations. The amortized analysis proves that a sequence of m operations on n elements requires O(m · α(n)) total time.
Space Complexity
DSU requires O(n) space for the parent array, plus O(n) for rank or size arrays if tracking those values. The space overhead remains linear and constant per element regardless of the number of union operations performed.
Operation Timing
require 'benchmark'
def benchmark_dsu(n, operations)
dsu = DisjointSetUnion.new(n)
Benchmark.bm(20) do |x|
x.report("random unions:") do
operations.times do
a, b = rand(n), rand(n)
dsu.union(a, b)
end
end
x.report("connectivity queries:") do
operations.times do
a, b = rand(n), rand(n)
dsu.connected?(a, b)
end
end
end
end
# Demonstrates near-constant time even with large inputs
benchmark_dsu(100_000, 100_000)
Performance remains consistent across different input patterns. Random unions, worst-case sequential unions, and mixed operation sequences all exhibit similar amortized performance characteristics due to the optimization techniques.
Comparison with Alternative Approaches
Naive implementations without optimizations exhibit O(n) time complexity per operation in worst cases. Storing sets as explicit collections (arrays or hash sets) requires O(n) time for union operations due to copying elements. Balanced tree approaches for set management require O(log n) per operation, slower than DSU's near-constant time.
For connectivity problems where queries vastly outnumber unions, DSU dramatically outperforms these alternatives. Path compression invests a little extra work in early find operations so that subsequent finds become extremely fast, matching the typical query-heavy pattern.
Cache Performance
The array-based representation exhibits excellent cache locality. Parent pointers reside in contiguous memory, and path compression reduces pointer chasing. Modern CPU caches perform well with DSU operations compared to pointer-heavy data structures like balanced trees.
Scalability Limits
For extremely large datasets exceeding memory capacity, DSU requires modifications. The parent array must fit in RAM for efficient access. Distributed implementations become necessary when element counts exceed single-machine memory, though distributed DSU algorithms lose the near-constant time guarantee due to network latency.
Practical Examples
Network Connectivity
A network monitoring system tracks connections between servers, determining whether two servers can communicate through intermediate connections.
class NetworkConnectivity
def initialize(server_count)
@dsu = DisjointSetUnion.new(server_count)
@servers = Array.new(server_count) { |i| "server-#{i}" }
end
def add_connection(server1, server2)
@dsu.union(server1, server2)
end
def can_communicate?(server1, server2)
@dsu.connected?(server1, server2)
end
def network_clusters
@dsu.each_set.map do |cluster|
cluster.map { |id| @servers[id] }
end
end
def largest_cluster_size
@dsu.representatives.map { |root| @dsu.set_size(root) }.max
end
end
# Usage
network = NetworkConnectivity.new(10)
# Add network connections
network.add_connection(0, 1)
network.add_connection(1, 2)
network.add_connection(3, 4)
network.add_connection(5, 6)
network.add_connection(6, 7)
network.add_connection(2, 5)
puts network.can_communicate?(0, 7) # => true (connected through 0-1-2-5-6-7)
puts network.can_communicate?(0, 3) # => false (in different clusters)
puts network.network_clusters.size # => 4 clusters (servers 8 and 9 remain isolated singletons)
This example demonstrates how DSU handles dynamic network topology changes. As connections form, the data structure merges clusters and efficiently answers reachability queries.
Kruskal's Minimum Spanning Tree
Kruskal's algorithm finds the minimum spanning tree of a weighted graph by processing edges in increasing weight order and adding edges that connect different components.
class KruskalMST
Edge = Struct.new(:u, :v, :weight)
def initialize(vertices)
@vertices = vertices
@edges = []
end
def add_edge(u, v, weight)
@edges << Edge.new(u, v, weight)
end
def find_mst
# Sort edges by weight
sorted_edges = @edges.sort_by(&:weight)
dsu = DisjointSetUnion.new(@vertices)
mst_edges = []
total_weight = 0
sorted_edges.each do |edge|
# Add edge if it connects different components
if dsu.union(edge.u, edge.v)
mst_edges << edge
total_weight += edge.weight
# MST complete when we have V-1 edges
break if mst_edges.size == @vertices - 1
end
end
{ edges: mst_edges, weight: total_weight }
end
end
# Build graph
graph = KruskalMST.new(6)
graph.add_edge(0, 1, 4)
graph.add_edge(0, 2, 4)
graph.add_edge(1, 2, 2)
graph.add_edge(1, 3, 5)
graph.add_edge(2, 3, 8)
graph.add_edge(2, 4, 10)
graph.add_edge(3, 4, 2)
graph.add_edge(3, 5, 6)
graph.add_edge(4, 5, 3)
result = graph.find_mst
# => MST edges: (1,2,2), (3,4,2), (4,5,3), (0,1,4), (1,3,5)
# => Total weight: 16
DSU detects cycles efficiently in this algorithm. Each union operation succeeds only when adding an edge between different components, guaranteeing the result forms a tree without cycles.
Image Segmentation
Image processing algorithms use DSU to identify connected regions of similar pixels, grouping them into segments.
class ImageSegmentation
def initialize(height, width)
@height = height
@width = width
@dsu = DisjointSetUnion.new(height * width)
end
def process_image(pixels, threshold)
# Connect adjacent pixels with similar values
@height.times do |row|
@width.times do |col|
current_idx = row * @width + col
current_val = pixels[row][col]
# Check right neighbor
if col + 1 < @width
neighbor_idx = row * @width + (col + 1)
neighbor_val = pixels[row][col + 1]
if (current_val - neighbor_val).abs <= threshold
@dsu.union(current_idx, neighbor_idx)
end
end
# Check bottom neighbor
if row + 1 < @height
neighbor_idx = (row + 1) * @width + col
neighbor_val = pixels[row + 1][col]
if (current_val - neighbor_val).abs <= threshold
@dsu.union(current_idx, neighbor_idx)
end
end
end
end
end
def segment_count
@dsu.size_count
end
def get_segments
segments = Hash.new { |h, k| h[k] = [] }
@height.times do |row|
@width.times do |col|
idx = row * @width + col
root = @dsu.find(idx)
segments[root] << [row, col]
end
end
segments.values
end
end
# Process 8x8 grayscale image
pixels = [
[10, 12, 11, 10, 50, 52, 51, 50],
[11, 10, 12, 11, 51, 50, 52, 51],
[12, 11, 10, 12, 52, 51, 50, 52],
[10, 12, 11, 10, 50, 52, 51, 50],
[90, 91, 92, 90, 10, 11, 12, 10],
[91, 90, 91, 92, 11, 10, 11, 12],
[92, 91, 90, 91, 12, 11, 10, 11],
[90, 92, 91, 90, 10, 12, 11, 10]
]
segmentation = ImageSegmentation.new(8, 8)
segmentation.process_image(pixels, 5)
puts "Found #{segmentation.segment_count} segments"
# => Found 4 segments (top-left, top-right, bottom-left, bottom-right regions)
This example shows how DSU merges adjacent similar pixels into regions. The algorithm makes a single pass through the image, connecting pixels that meet the similarity threshold.
Equivalence Class Detection
Systems that process equivalence relationships benefit from DSU when identifying groups of equivalent items based on transitive relationships.
class EquivalenceClassifier
def initialize(items)
@items = items
@item_to_id = items.each_with_index.to_h
@dsu = DisjointSetUnion.new(items.size)
end
def mark_equivalent(item1, item2)
id1 = @item_to_id[item1]
id2 = @item_to_id[item2]
@dsu.union(id1, id2) if id1 && id2
end
def equivalent?(item1, item2)
id1 = @item_to_id[item1]
id2 = @item_to_id[item2]
id1 && id2 && @dsu.connected?(id1, id2)
end
def equivalence_classes
classes = Hash.new { |h, k| h[k] = [] }
@item_to_id.each do |item, id|
root = @dsu.find(id)
classes[root] << item
end
classes.values
end
end
# Classify email addresses as belonging to same user
emails = [
"alice@work.com", "alice@personal.com", "alice123@work.com",
"bob@work.com", "bob_jones@personal.com",
"carol@work.com"
]
classifier = EquivalenceClassifier.new(emails)
# Establish equivalences based on known relationships
classifier.mark_equivalent("alice@work.com", "alice@personal.com")
classifier.mark_equivalent("alice@personal.com", "alice123@work.com")
classifier.mark_equivalent("bob@work.com", "bob_jones@personal.com")
# Check equivalence
puts classifier.equivalent?("alice@work.com", "alice123@work.com") # => true
puts classifier.equivalent?("alice@work.com", "bob@work.com") # => false
# Get equivalence classes
classes = classifier.equivalence_classes
# => [
# ["alice@work.com", "alice@personal.com", "alice123@work.com"],
# ["bob@work.com", "bob_jones@personal.com"],
# ["carol@work.com"]
# ]
The transitivity property holds automatically through DSU: if A equals B and B equals C, then finding the representative for A, B, and C returns the same root, establishing A equals C without explicit declaration.
Common Patterns
Lazy Propagation with DSU
Some applications require associating additional data with each set, such as aggregate values or properties. Storing this data at root nodes enables efficient queries, but union operations must merge these values.
class WeightedDSU
def initialize(n)
@parent = Array.new(n) { |i| i }
@rank = Array.new(n, 0)
@value = Array.new(n, 0) # Additional data per set
end
def find(x)
return x if @parent[x] == x
@parent[x] = find(@parent[x])
end
def union(x, y)
root_x = find(x)
root_y = find(y)
return false if root_x == root_y
# Merge values during union
total_value = @value[root_x] + @value[root_y]
if @rank[root_x] < @rank[root_y]
@parent[root_x] = root_y
@value[root_y] = total_value
elsif @rank[root_x] > @rank[root_y]
@parent[root_y] = root_x
@value[root_x] = total_value
else
@parent[root_y] = root_x
@value[root_x] = total_value
@rank[root_x] += 1
end
true
end
def add_value(x, delta)
root = find(x)
@value[root] += delta
end
def get_value(x)
@value[find(x)]
end
end
This pattern maintains aggregate values at root nodes, updating them during union operations. Path compression preserves correctness because values remain at the root.
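The essence of the pattern, sketched on bare arrays: per-set totals live at roots and merge on union.

```ruby
n = 4
parent = Array.new(n) { |i| i }
value  = [3, 1, 4, 1]                  # per-element starting values
find   = ->(x) { x = parent[x] while parent[x] != x; x }
union  = lambda do |a, b|
  ra, rb = find.call(a), find.call(b)
  next if ra == rb
  parent[rb] = ra
  value[ra] += value[rb]               # fold the absorbed root's total in
end

union.call(0, 1)
union.call(2, 3)
value[find.call(1)] # => 4, the sum over {0, 1}
value[find.call(3)] # => 5, the sum over {2, 3}
```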
Rollback Support
Certain algorithms require trying unions speculatively and rolling them back if they lead to invalid states. Supporting rollback requires storing union history and forgoing path compression, because compression rewrites parent pointers in ways the history does not record.
class RollbackDSU
def initialize(n)
@parent = Array.new(n) { |i| i }
@rank = Array.new(n, 0)
@history = []
end
def find(x)
# No path compression to support rollback
x = @parent[x] while @parent[x] != x
x
end
def union(x, y)
root_x = find(x)
root_y = find(y)
return false if root_x == root_y
# Store state before union
if @rank[root_x] < @rank[root_y]
@history << [:parent, root_x, @parent[root_x]]
@parent[root_x] = root_y
elsif @rank[root_x] > @rank[root_y]
@history << [:parent, root_y, @parent[root_y]]
@parent[root_y] = root_x
else
@history << [:parent, root_y, @parent[root_y]]
@history << [:rank, root_x, @rank[root_x]]
@parent[root_y] = root_x
@rank[root_x] += 1
end
true
end
def rollback
return false if @history.empty?
type, index, old_value = @history.pop
case type
when :parent
@parent[index] = old_value
when :rank
@rank[index] = old_value
end
true
end
def checkpoint
@history.size
end
def rollback_to(checkpoint)
while @history.size > checkpoint
rollback
end
end
end
This implementation sacrifices the amortized O(α(n)) complexity of path compression in exchange for rollback capability. Union by rank alone still bounds find operations at O(log n), a trade-off that applications requiring backtracking accept.
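The undo mechanism in miniature: record the overwritten entry before each change, restore it on rollback.

```ruby
parent  = [0, 1, 2]
history = []
find = ->(x) { x = parent[x] while parent[x] != x; x }

# Speculative union(0, 1): remember what gets overwritten
history << [1, parent[1]]
parent[1] = 0
find.call(1) # => 0

# The union turned out to be invalid, so revert it
idx, old = history.pop
parent[idx] = old
find.call(1) # => 1, back to the original state
```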
Persistent DSU
Functional programming contexts or algorithms requiring access to historical DSU states benefit from persistent implementations that preserve previous versions.
class PersistentDSU
Version = Struct.new(:parent, :rank, :previous)
def initialize(n)
initial_parent = Array.new(n) { |i| i }
initial_rank = Array.new(n, 0)
@current = Version.new(initial_parent, initial_rank, nil)
end
def find(x, version = @current)
parent = version.parent
x = parent[x] while parent[x] != x
x
end
def union(x, y)
root_x = find(x)
root_y = find(y)
return @current if root_x == root_y
# Create new version
new_parent = @current.parent.dup
new_rank = @current.rank.dup
if new_rank[root_x] < new_rank[root_y]
new_parent[root_x] = root_y
elsif new_rank[root_x] > new_rank[root_y]
new_parent[root_y] = root_x
else
new_parent[root_y] = root_x
new_rank[root_x] += 1
end
@current = Version.new(new_parent, new_rank, @current)
end
def connected?(x, y, version = @current)
find(x, version) == find(y, version)
end
def previous_version
@current = @current.previous if @current.previous
@current
end
end
Persistent implementations create new versions on each union, allowing queries on historical states. The memory cost grows with each operation, making this approach suitable only when version history requirements justify the overhead.
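The versioning idea in miniature: each union copies the parent array, so earlier versions stay queryable.

```ruby
find = ->(p, x) { x = p[x] while p[x] != x; x }

v0 = [0, 1, 2]   # version 0: three singleton sets
v1 = v0.dup      # version 1: version 0 plus union(0, 1)
v1[1] = 0

find.call(v0, 1) # => 1, history preserved
find.call(v1, 1) # => 0, current state
```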
Offline Query Processing
When all operations arrive before query processing begins, sorting unions and queries together enables optimizations. This pattern appears in competitive programming and batch processing systems.
def process_connectivity_queries(n, unions, queries)
# Sort operations by priority (unions before queries at same timestamp)
operations = []
unions.each do |timestamp, u, v|
operations << [timestamp, 0, u, v] # 0 = union
end
queries.each_with_index do |(timestamp, u, v), idx|
operations << [timestamp, 1, u, v, idx] # 1 = query
end
operations.sort!
dsu = DisjointSetUnion.new(n)
results = []
operations.each do |op|
if op[1] == 0 # Union
dsu.union(op[2], op[3])
else # Query
results[op[4]] = dsu.connected?(op[2], op[3])
end
end
results
end
# Process operations in optimal order
unions = [
[1, 0, 1],
[3, 2, 3],
[5, 1, 2]
]
queries = [
[2, 0, 2], # false at time 2
[4, 0, 3], # false at time 4 (sets {0,1} and {2,3} are not yet joined)
[6, 0, 3] # true at time 6
]
results = process_connectivity_queries(4, unions, queries)
# => [false, false, true]
This pattern processes operations in temporal order, building connectivity state progressively and answering queries based on the state at their timestamp.
Reference
Core Operations
| Operation | Description | Time Complexity |
|---|---|---|
| initialize(n) | Create DSU with n elements, each in own set | O(n) |
| find(x) | Return representative of set containing x | O(α(n)) amortized |
| union(x, y) | Merge sets containing x and y | O(α(n)) amortized |
| connected?(x, y) | Check if x and y in same set | O(α(n)) amortized |
| set_size(x) | Return size of set containing x | O(α(n)) amortized |
| size_count | Return number of disjoint sets | O(1) |
Optimization Techniques
| Technique | Impact | Trade-off |
|---|---|---|
| Path Compression | Reduces amortized time to O(α(n)) | Prevents rollback capability |
| Union by Rank | Bounds tree height logarithmically | Requires additional O(n) space |
| Union by Size | Alternative to union by rank | Maintains exact size information |
| Iterative Find | Avoids stack overflow | Requires two passes through path |
| Two-Pass Compression | Flattens entire path | Slightly higher constant factor |
Implementation Variants
| Variant | Use Case | Characteristics |
|---|---|---|
| Standard DSU | General connectivity queries | Path compression + union by rank |
| Weighted DSU | Aggregate values per set | Additional data stored at roots |
| Rollback DSU | Speculative unions | No path compression, stores history |
| Persistent DSU | Historical queries | Immutable versions, higher memory |
| Range DSU | Sequential element merging | Optimized for interval unions |
Space Complexity
| Component | Space | Notes |
|---|---|---|
| Parent array | O(n) | Required for all implementations |
| Rank array | O(n) | Used with union by rank |
| Size array | O(n) | Used with union by size |
| History stack | O(m) | For rollback support, m unions |
| Version list | O(m × n) | For persistent implementation |
Common Applications
| Application | DSU Role | Additional Considerations |
|---|---|---|
| Kruskal's MST | Cycle detection | Process edges by weight |
| Connected Components | Track connectivity | Works on dynamic graphs |
| Image Segmentation | Group similar pixels | Threshold determines merging |
| Network Analysis | Cluster identification | Handle dynamic connections |
| Equivalence Classes | Transitive relations | Automatic transitivity |
| Social Networks | Community detection | Merge based on relationships |
Performance Characteristics
| Scenario | Operations | Total Time |
|---|---|---|
| Initialization | n elements | O(n) |
| Sequential unions | n-1 unions | O(n × α(n)) |
| Random unions | m unions | O(m × α(n)) |
| Query-heavy | m finds, k unions | O((m+k) × α(n)) |
| Complete graph | O(n²) edges | O(n² × α(n)) |
Edge Case Handling
| Case | Behavior | Implementation Note |
|---|---|---|
| Union same element | No change, return false | Check root_x == root_y |
| Find on root | Return immediately | Base case parent[x] == x |
| Empty set | Not applicable | All elements start in sets |
| Single element | Already a root | Parent points to self |
| All elements united | One set remaining | size_count becomes 1 |
| Invalid index | Undefined behavior | Validate indices externally |
Ruby-Specific Idioms
| Pattern | Example | Purpose |
|---|---|---|
| Array initialization | Array.new(n) { |i| i } | Create parent array |
| Predicate method | connected?(x, y) | Boolean query naming |
| Attribute reader | attr_reader :size_count | Expose internal state |
| Struct definition | Edge = Struct.new(:u, :v, :w) | Lightweight data containers |
| Enumerator pattern | each_set without block | Lazy evaluation support |
| Hash default block | Hash.new { |h, k| h[k] = [] } | Group elements by root |
Common Pitfalls
| Mistake | Impact | Solution |
|---|---|---|
| No optimization | O(n) per operation | Implement path compression |
| Arbitrary union | Tall trees | Use union by rank or size |
| Recursive without limit | Stack overflow | Use iterative implementation |
| Forgetting size decrement | Wrong component count | Update in union method |
| Mutating during iteration | Undefined behavior | Copy before modifying |
| Assuming sorted output | Incorrect results | Sets have no defined order |