CrackedRuby - Topological Sorting

Overview

Topological sorting produces a linear ordering of vertices in a directed acyclic graph (DAG) where dependencies determine the sequence. The algorithm ensures that if vertex A has an edge pointing to vertex B, then A appears before B in the final ordering. This property makes topological sorting essential for dependency resolution, task scheduling, build systems, and any scenario where operations must execute in a specific order based on prerequisites.

The algorithm applies exclusively to directed acyclic graphs. The presence of cycles makes topological ordering impossible because circular dependencies prevent any valid linear sequence. A graph with vertices {A, B, C} and edges A→B, B→C, C→A contains a cycle and has no topological ordering.

Consider a software build system where modules have compilation dependencies. If module Database depends on Logger, and Application depends on both Database and Logger, a valid topological order might be [Logger, Database, Application]. This ordering ensures each module compiles only after its dependencies are available.

Two primary algorithms implement topological sorting: depth-first search with post-order traversal, and Kahn's algorithm using in-degree counts. Both produce valid orderings, though the specific sequence may differ when multiple valid orderings exist. The choice between algorithms depends on whether cycle detection should occur before or during the sort, and whether the implementation needs to track in-degrees.

Key Principles

Topological sorting operates on directed acyclic graphs where vertices represent tasks or entities and edges represent dependencies or precedence constraints. The fundamental principle requires that for every directed edge (u, v) in the graph, vertex u appears before vertex v in the resulting order.

A directed acyclic graph contains no cycles. This property remains critical because cycles create circular dependencies that prevent any valid topological ordering. Before performing topological sorting, algorithms either verify the graph is acyclic or detect cycles during the sorting process.

The depth-first search approach traverses the graph recursively, marking vertices as visited and adding them to the result in post-order sequence. Post-order traversal means a vertex gets added to the result only after all vertices it points to have been processed. This creates a reverse topological order, which requires reversal to produce the final ordering.

Kahn's algorithm operates differently by tracking the in-degree of each vertex (the number of incoming edges). The algorithm repeatedly selects vertices with in-degree zero, removes them from the graph, and decrements the in-degree of their neighbors. This process continues until no vertices remain or no vertex with zero in-degree exists (indicating a cycle).

Multiple valid topological orderings can exist for the same graph. Consider a graph with vertices {A, B, C} and edges A→C, B→C. Both [A, B, C] and [B, A, C] represent valid topological orderings because neither A nor B depends on the other. Algorithms do not guarantee any specific ordering among independent vertices.

The algorithm's correctness relies on the DAG property. Each time a vertex gets selected, all its dependencies have already been processed. In DFS-based sorting, post-order traversal ensures dependencies appear before dependents in the reversed result. In Kahn's algorithm, removing vertices with zero in-degree ensures dependencies are satisfied before processing dependents.

Ruby Implementation

Ruby implementations of topological sorting use hash-based adjacency lists to represent the graph structure. The graph maps each vertex to an array of vertices it points to.

# DFS-based topological sort
def topological_sort_dfs(graph)
  visited = {}
  result = []
  
  graph.keys.each do |vertex|
    visit(vertex, graph, visited, result) unless visited[vertex]
  end
  
  result.reverse
end

def visit(vertex, graph, visited, result)
  return if visited[vertex]
  
  visited[vertex] = true
  
  graph[vertex].each do |neighbor|
    visit(neighbor, graph, visited, result)
  end
  
  result << vertex
end

# Example usage
graph = {
  'A' => ['C'],
  'B' => ['C', 'D'],
  'C' => ['E'],
  'D' => ['E'],
  'E' => []
}

topological_sort_dfs(graph)
# => ["A", "B", "D", "C", "E"] or ["B", "A", "D", "C", "E"]

The DFS implementation uses a visited hash to track processed vertices and prevent infinite recursion. The recursive visit method processes a vertex only if not already visited, explores all neighbors recursively, then adds the vertex to the result. Reversing the result produces the topological ordering because post-order traversal naturally creates a reverse topological sequence.

Kahn's algorithm requires calculating in-degrees and maintaining a queue of vertices with zero in-degree:

def topological_sort_kahn(graph)
  in_degree = Hash.new(0)
  
  # Calculate in-degrees
  graph.each do |vertex, neighbors|
    in_degree[vertex] ||= 0
    neighbors.each { |neighbor| in_degree[neighbor] += 1 }
  end
  
  # Find vertices with zero in-degree
  queue = in_degree.select { |v, degree| degree == 0 }.keys
  result = []
  
  until queue.empty?
    vertex = queue.shift
    result << vertex
    
    graph[vertex].each do |neighbor|
      in_degree[neighbor] -= 1
      queue << neighbor if in_degree[neighbor] == 0
    end
  end
  
  result.size == graph.size ? result : nil
end

Kahn's algorithm returns nil when the graph contains a cycle, detected when the result size differs from the number of vertices. This built-in cycle detection makes Kahn's algorithm preferable when validation is required before processing.

Ruby's standard library does not include topological sorting, but the TSort module in the standard library provides a mixin for topological sorting:

require 'tsort'

class TaskGraph
  include TSort
  
  def initialize
    @dependencies = Hash.new { |h, k| h[k] = [] }
  end
  
  def add_dependency(task, depends_on)
    @dependencies[task] << depends_on
    @dependencies[depends_on] ||= []
  end
  
  def tsort_each_node(&block)
    @dependencies.each_key(&block)
  end
  
  def tsort_each_child(node, &block)
    @dependencies[node].each(&block)
  end
end

tasks = TaskGraph.new
tasks.add_dependency('compile', 'generate_code')
tasks.add_dependency('link', 'compile')
tasks.add_dependency('package', 'link')

tasks.tsort
# => ["generate_code", "compile", "link", "package"]

The TSort module requires implementing two methods: tsort_each_node to enumerate all vertices, and tsort_each_child to enumerate a vertex's dependencies. The module provides tsort for ordering and tsort_each for iterating in topological order.

Cycle detection with detailed information requires extending the basic DFS implementation:

def topological_sort_with_cycle_detection(graph)
  WHITE, GRAY, BLACK = 0, 1, 2
  color = Hash.new(WHITE)
  result = []
  
  detect_cycle = lambda do |vertex, path|
    return path if color[vertex] == GRAY
    return nil if color[vertex] == BLACK
    
    color[vertex] = GRAY
    path << vertex
    
    graph[vertex].each do |neighbor|
      cycle = detect_cycle.call(neighbor, path.dup)
      return cycle if cycle
    end
    
    color[vertex] = BLACK
    result << vertex
    nil
  end
  
  graph.keys.each do |vertex|
    cycle = detect_cycle.call(vertex, [])
    return { error: 'Cycle detected', cycle: cycle } if cycle
  end
  
  { order: result.reverse }
end

This implementation uses three colors to track vertex states: white (unvisited), gray (currently being explored), and black (fully processed). Encountering a gray vertex during traversal indicates a back edge and thus a cycle.

Practical Examples

Build systems require topological sorting to compile source files in dependency order. Consider a Ruby application with multiple files where some files require others:

class BuildSystem
  def initialize
    @files = {}
  end
  
  def add_file(name, dependencies = [])
    @files[name] = dependencies
  end
  
  def build_order
    graph = {}
    
    @files.each do |file, deps|
      graph[file] = deps
      deps.each { |dep| graph[dep] ||= [] }
    end
    
    topological_sort_kahn(graph)
  end
  
  def compile_all
    order = build_order
    return { error: 'Circular dependency detected' } unless order
    
    order.each do |file|
      compile_file(file)
    end
  end
  
  private
  
  def compile_file(file)
    puts "Compiling #{file}"
    # Compilation logic here
  end
end

build = BuildSystem.new
build.add_file('logger.rb', [])
build.add_file('database.rb', ['logger.rb'])
build.add_file('user_model.rb', ['database.rb'])
build.add_file('application.rb', ['user_model.rb', 'logger.rb'])

build.build_order
# => ["logger.rb", "database.rb", "user_model.rb", "application.rb"]

Task scheduling systems use topological sorting to execute jobs respecting dependencies. A deployment pipeline might have stages where each stage depends on previous stages:

class DeploymentPipeline
  def initialize
    @stages = {}
  end
  
  def add_stage(name, dependencies = [])
    @stages[name] = { deps: dependencies, status: :pending }
  end
  
  def execute
    order = compute_execution_order
    return { error: 'Invalid pipeline configuration' } unless order
    
    order.each do |stage|
      execute_stage(stage)
    end
  end
  
  private
  
  def compute_execution_order
    graph = @stages.transform_values { |v| v[:deps] }
    topological_sort_dfs(graph)
  end
  
  def execute_stage(stage)
    puts "Executing: #{stage}"
    @stages[stage][:status] = :running
    
    # Stage execution logic
    sleep(1)
    
    @stages[stage][:status] = :completed
    puts "Completed: #{stage}"
  end
end

pipeline = DeploymentPipeline.new
pipeline.add_stage('build')
pipeline.add_stage('test', ['build'])
pipeline.add_stage('security_scan', ['build'])
pipeline.add_stage('deploy_staging', ['test', 'security_scan'])
pipeline.add_stage('integration_test', ['deploy_staging'])
pipeline.add_stage('deploy_production', ['integration_test'])

pipeline.execute

Course prerequisite systems determine valid course sequences for students. Universities use topological sorting to find valid paths through degree requirements:

class CourseScheduler
  def initialize
    @courses = {}
  end
  
  def add_course(code, prerequisites = [])
    @courses[code] = prerequisites
  end
  
  def valid_sequences
    graph = build_graph
    base_order = topological_sort_kahn(graph)
    
    return [] unless base_order
    
    # Find all valid orderings respecting prerequisites
    generate_valid_sequences(base_order, graph)
  end
  
  def earliest_semester(course)
    graph = build_graph
    
    # Calculate longest path to course
    distances = {}
    order = topological_sort_kahn(graph)
    
    return nil unless order
    
    order.each do |c|
      max_prereq_distance = graph[c].map { |prereq| distances[prereq] || 0 }.max || 0
      distances[c] = max_prereq_distance + 1
    end
    
    distances[course]
  end
  
  private
  
  def build_graph
    graph = {}
    @courses.each do |course, prereqs|
      graph[course] = prereqs
      prereqs.each { |p| graph[p] ||= [] }
    end
    graph
  end
  
  def generate_valid_sequences(base_order, graph)
    # Find independent courses that can be taken in any order
    independent_groups = []
    
    base_order.each do |course|
      prereqs_satisfied = graph[course].empty?
      if prereqs_satisfied
        # Group courses with no dependencies
        independent_groups << course
      end
    end
    
    [base_order]  # Simplified - full implementation would generate permutations
  end
end

scheduler = CourseScheduler.new
scheduler.add_course('CS101')
scheduler.add_course('CS102', ['CS101'])
scheduler.add_course('CS201', ['CS102'])
scheduler.add_course('CS202', ['CS102'])
scheduler.add_course('CS301', ['CS201', 'CS202'])

scheduler.earliest_semester('CS301')
# => 4 (CS101 in semester 1, CS102 in semester 2, CS201/CS202 in semester 3, CS301 in semester 4)

Package managers resolve installation order for software packages with dependencies:

class PackageManager
  def initialize
    @packages = {}
  end
  
  def register_package(name, version, dependencies = {})
    @packages[name] = {
      version: version,
      dependencies: dependencies
    }
  end
  
  def install_order(package_names)
    graph = build_dependency_graph(package_names)
    return { error: 'Dependency cycle detected' } unless graph[:success]
    
    order = topological_sort_kahn(graph[:graph])
    return { error: 'Cannot resolve dependencies' } unless order
    
    order.map { |pkg| "#{pkg} (#{@packages[pkg][:version]})" }
  end
  
  private
  
  def build_dependency_graph(package_names)
    graph = {}
    visited = {}
    queue = package_names.dup
    
    until queue.empty?
      pkg = queue.shift
      next if visited[pkg]
      
      return { success: false } unless @packages[pkg]
      
      visited[pkg] = true
      deps = @packages[pkg][:dependencies].keys
      graph[pkg] = deps
      
      queue.concat(deps)
    end
    
    { success: true, graph: graph }
  end
end

pm = PackageManager.new
pm.register_package('ruby', '3.2.0')
pm.register_package('rack', '3.0.0', { 'ruby' => '>= 2.7' })
pm.register_package('rails', '7.1.0', { 'rack' => '>= 2.0', 'ruby' => '>= 3.0' })
pm.register_package('devise', '4.9.0', { 'rails' => '>= 6.0' })

pm.install_order(['devise'])
# => ["ruby (3.2.0)", "rack (3.0.0)", "rails (7.1.0)", "devise (4.9.0)"]

Performance Considerations

Topological sorting operates in O(V + E) time complexity where V represents the number of vertices and E represents the number of edges. Both DFS-based and Kahn's algorithm achieve this complexity by visiting each vertex once and examining each edge once.

The DFS implementation traverses each vertex exactly once, marking it as visited to prevent reprocessing. For each vertex, the algorithm examines all outgoing edges once. This results in O(V) time for vertex processing and O(E) time for edge examination, totaling O(V + E).

Space complexity depends on the graph representation and algorithm state. Adjacency list representation requires O(V + E) space to store vertices and edges. The DFS approach needs O(V) space for the visited set and O(V) space for the recursion stack in the worst case. Kahn's algorithm requires O(V) space for the in-degree hash and O(V) space for the queue.

For sparse graphs where E is much smaller than V², the linear time complexity provides excellent performance. Dense graphs where E approaches V² still complete in linear time relative to the input size, making the algorithm efficient across different graph densities.

Memory access patterns differ between implementations. DFS-based sorting follows pointer chains through the graph structure, which may cause cache misses when the graph structure spreads across memory. Kahn's algorithm processes vertices in a queue, providing more sequential memory access when vertices are stored contiguously.

require 'benchmark'

def benchmark_sorting(vertex_count, edge_ratio)
  # Generate random DAG
  graph = generate_dag(vertex_count, edge_ratio)
  
  Benchmark.bmbm do |x|
    x.report("DFS:") do
      1000.times { topological_sort_dfs(graph) }
    end
    
    x.report("Kahn:") do
      1000.times { topological_sort_kahn(graph) }
    end
  end
end

def generate_dag(vertices, edge_ratio)
  graph = Hash.new { |h, k| h[k] = [] }
  
  vertices.times do |i|
    graph[i] = []
    
    # Add edges only to higher-numbered vertices to ensure acyclic
    ((i + 1)...vertices).each do |j|
      graph[i] << j if rand < edge_ratio
    end
  end
  
  graph
end

The choice between DFS and Kahn's algorithm affects performance in specific scenarios. DFS requires less initialization overhead since it does not need to compute in-degrees. Kahn's algorithm performs better when early cycle detection is critical because it identifies cycles without exploring the entire graph.

For very large graphs that exceed available memory, external memory algorithms become necessary. These algorithms partition the graph across disk and memory, processing sections independently. Standard topological sorting assumes the entire graph fits in memory.

Parallel topological sorting can improve performance on multi-core systems when the graph contains many independent subgraphs. Level-based approaches identify vertices at the same topological level and process them concurrently. However, synchronization overhead may negate benefits for smaller graphs.

Common Patterns

The incremental topological sort pattern handles dynamic graphs where edges and vertices are added over time. This pattern maintains the topological order as changes occur rather than recomputing from scratch:

class IncrementalTopologicalSort
  def initialize
    @order = []
    @position = {}
    @graph = Hash.new { |h, k| h[k] = [] }
  end
  
  def add_vertex(vertex)
    return if @position.key?(vertex)
    
    @order << vertex
    @position[vertex] = @order.size - 1
    @graph[vertex] = []
  end
  
  def add_edge(from, to)
    @graph[from] << to
    
    # Check if edge violates current ordering
    if @position[from] > @position[to]
      reorder_after_edge(from, to)
    end
  end
  
  def current_order
    @order.dup
  end
  
  private
  
  def reorder_after_edge(from, to)
    # Move 'from' vertex before 'to' if needed
    affected = vertices_between(to, from)
    
    # Recompute positions for affected vertices
    # Simplified implementation
    @order = topological_sort_kahn(@graph)
    @order.each_with_index { |v, i| @position[v] = i }
  end
  
  def vertices_between(start_vertex, end_vertex)
    start_pos = @position[start_vertex]
    end_pos = @position[end_vertex]
    
    @order[start_pos..end_pos]
  end
end

The parallel execution pattern identifies independent tasks that can run concurrently. Tasks at the same topological level have no dependencies between them:

def parallel_execution_levels(graph)
  in_degree = Hash.new(0)
  
  graph.each do |vertex, neighbors|
    in_degree[vertex] ||= 0
    neighbors.each { |neighbor| in_degree[neighbor] += 1 }
  end
  
  levels = []
  current_level = in_degree.select { |v, degree| degree == 0 }.keys
  
  until current_level.empty?
    levels << current_level
    
    next_level = []
    current_level.each do |vertex|
      graph[vertex].each do |neighbor|
        in_degree[neighbor] -= 1
        next_level << neighbor if in_degree[neighbor] == 0
      end
    end
    
    current_level = next_level
  end
  
  levels
end

# Example: Execute tasks in parallel by level
levels = parallel_execution_levels(graph)
levels.each do |level|
  threads = level.map do |task|
    Thread.new { execute_task(task) }
  end
  threads.each(&:join)
end

The priority-based sorting pattern extends basic topological sorting to handle priorities when multiple valid orderings exist:

def topological_sort_with_priority(graph, priorities)
  in_degree = Hash.new(0)
  
  graph.each do |vertex, neighbors|
    in_degree[vertex] ||= 0
    neighbors.each { |neighbor| in_degree[neighbor] += 1 }
  end
  
  # Use priority queue instead of regular queue
  require 'set'
  available = SortedSet.new
  
  in_degree.select { |v, degree| degree == 0 }.keys.each do |v|
    available.add([priorities[v] || 0, v])
  end
  
  result = []
  
  until available.empty?
    priority, vertex = available.first
    available.delete([priority, vertex])
    result << vertex
    
    graph[vertex].each do |neighbor|
      in_degree[neighbor] -= 1
      if in_degree[neighbor] == 0
        available.add([priorities[neighbor] || 0, neighbor])
      end
    end
  end
  
  result
end

Common Pitfalls

Failing to detect cycles before sorting leads to incorrect results or infinite loops. The basic DFS implementation without cycle detection can recurse indefinitely on cyclic graphs:

# Problematic: No cycle detection
def unsafe_topological_sort(graph)
  visited = {}
  result = []
  
  graph.keys.each do |vertex|
    visit_unsafe(vertex, graph, visited, result)
  end
  
  result.reverse
end

def visit_unsafe(vertex, graph, visited, result)
  return if visited[vertex]
  visited[vertex] = true
  
  graph[vertex].each do |neighbor|
    visit_unsafe(neighbor, graph, visited, result)  # May recurse forever
  end
  
  result << vertex
end

# Safe version with cycle detection
def safe_topological_sort(graph)
  WHITE, GRAY, BLACK = 0, 1, 2
  color = Hash.new(WHITE)
  result = []
  
  visit_safe = lambda do |vertex|
    return true if color[vertex] == BLACK
    return false if color[vertex] == GRAY  # Cycle detected
    
    color[vertex] = GRAY
    
    graph[vertex].each do |neighbor|
      return false unless visit_safe.call(neighbor)
    end
    
    color[vertex] = BLACK
    result << vertex
    true
  end
  
  graph.keys.each do |vertex|
    return nil unless visit_safe.call(vertex)
  end
  
  result.reverse
end

Assuming a unique topological ordering causes errors when multiple valid orderings exist. Different algorithm implementations may produce different valid results:

graph = {
  'A' => ['C'],
  'B' => ['C'],
  'C' => []
}

# Both orderings are valid
topological_sort_dfs(graph)   # => ["A", "B", "C"] or ["B", "A", "C"]
topological_sort_kahn(graph)  # => ["A", "B", "C"] or ["B", "A", "C"]

# Don't write code assuming specific ordering

Modifying the graph during iteration breaks topological sorting. Algorithms assume a static graph structure throughout execution:

# Problematic: Graph modification during sort
def broken_sort(graph)
  result = []
  
  while graph.any?
    # Find vertex with no dependencies
    vertex = graph.find { |v, deps| deps.empty? }&.first
    return nil unless vertex
    
    result << vertex
    graph.delete(vertex)
    
    # This modification during iteration causes issues
    graph.each do |v, deps|
      deps.delete(vertex)
    end
  end
  
  result
end

Forgetting to handle disconnected components produces incomplete results. A graph may contain multiple independent subgraphs:

def topological_sort_all_components(graph)
  visited = {}
  result = []
  
  # Must iterate through ALL vertices, not just those reachable from one vertex
  graph.keys.each do |vertex|
    visit(vertex, graph, visited, result) unless visited[vertex]
  end
  
  result.reverse
end

Mixing up edge direction causes reverse orderings. The edge A→B means A must come before B, not after:

# Correct: A depends on B means B must come first
dependencies = {
  'compile' => ['generate_code'],  # compile depends on generate_code
  'link' => ['compile'],
  'package' => ['link']
}

# Result should be: generate_code, compile, link, package

Reference

Algorithm Comparison

Algorithm	Time Complexity	Space Complexity	Cycle Detection	Best Use Case
DFS-based	O(V + E)	O(V)	During traversal	Simple implementation needed
Kahn's algorithm	O(V + E)	O(V)	Built-in	Early cycle detection required
Parallel level-based	O(V + E)	O(V)	Built-in	Concurrent execution needed

Key Methods

Operation	DFS Approach	Kahn's Approach
Initialize	Create visited set	Calculate in-degrees
Process vertex	Recursive DFS visit	Remove from queue
Handle neighbor	Recurse if unvisited	Decrement in-degree
Detect cycle	Track recursion stack	Check result size
Return result	Reverse post-order	Direct order

Graph Properties

Property	Requirement	Consequence if Violated
Directed edges	Must be directed	Undirected edges lack ordering semantics
Acyclic	No cycles allowed	No valid topological ordering exists
Connected	Not required	Multiple valid orderings per component
Weighted	Not used	Weights ignored in standard topological sort

Common Applications

Domain	Use Case	Key Consideration
Build systems	Compilation order	Minimize rebuild on changes
Package management	Installation sequence	Handle version conflicts
Task scheduling	Execution order	Maximize parallelism
Course planning	Prerequisite chains	Optimize semester load
Database migrations	Migration order	Ensure referential integrity
Event processing	Event dependency order	Maintain causality

Implementation Checklist

Step	DFS Implementation	Kahn's Implementation
1	Initialize visited tracking	Calculate all in-degrees
2	Create result container	Initialize zero in-degree queue
3	Iterate all vertices	Process queue until empty
4	Perform DFS for unvisited	Decrement neighbor in-degrees
5	Append in post-order	Append when processing
6	Reverse final result	Return result directly
7	Check for cycles via colors	Check result size equals vertex count

Ruby TSort Module

Method	Purpose	Returns
tsort	Compute topological order	Array of nodes in order
tsort_each	Iterate in topological order	Enumerator
strongly_connected_components	Find SCCs	Array of component arrays
each_strongly_connected_component	Iterate SCCs	Enumerator
each_strongly_connected_component_from	SCC from starting node	Enumerator

Edge Cases

Scenario	Expected Behavior	Handling Strategy
Empty graph	Return empty array	Check graph size before processing
Single vertex	Return single-element array	Base case in recursion
Disconnected components	Process all components	Iterate all vertices as starting points
Self-loop	Detect as cycle	Check vertex pointing to itself
Multiple edges between vertices	Use any edge once	Deduplicate adjacency list
Cycle present	Return nil or error	Track gray vertices or compare result size

Topological Sorting