CrackedRuby CrackedRuby

Graph Traversal (BFS, DFS)

Overview

Graph traversal algorithms explore nodes and edges in a graph data structure. Breadth-First Search (BFS) and Depth-First Search (DFS) represent the two primary traversal strategies, each visiting all reachable nodes from a starting point but differing fundamentally in their exploration order.

BFS explores nodes level by level, visiting all neighbors of the current node before moving to neighbors of those neighbors. This approach uses a queue data structure to maintain the order of exploration, ensuring that nodes closer to the starting point are visited before more distant nodes.

DFS explores as far as possible along each branch before backtracking. This approach uses either recursion or an explicit stack, diving deep into the graph structure before exploring breadth. The algorithm visits a node, then immediately proceeds to an unvisited neighbor, continuing this pattern until reaching a dead end.

Both algorithms guarantee visiting every reachable node exactly once when properly implemented with visited tracking. The choice between them depends on the specific problem requirements, graph characteristics, and resource constraints.

Graph traversal forms the foundation for numerous algorithms including pathfinding, cycle detection, topological sorting, connected component identification, and network analysis. Web crawlers use BFS to discover pages, social networks employ graph traversal to compute relationships, and compilers use DFS for syntax tree analysis.

# Simple graph representation
graph = {
  'A' => ['B', 'C'],
  'B' => ['D', 'E'],
  'C' => ['F'],
  'D' => [],
  'E' => ['F'],
  'F' => []
}

# BFS visits: A, B, C, D, E, F
# DFS visits: A, B, D, E, F, C (one possible order)

Key Principles

Graph traversal operates on graph structures consisting of vertices (nodes) and edges (connections between nodes). The traversal process marks nodes as visited to prevent infinite loops in graphs with cycles and ensures each node is processed exactly once.

BFS employs a queue following First-In-First-Out (FIFO) ordering. The algorithm initializes by enqueuing the starting node and marking it as visited. During each iteration, it dequeues a node, processes it, then enqueues all unvisited neighbors. This creates a wave-like expansion pattern radiating outward from the starting point. The queue ensures that nodes at distance d from the start are all visited before any nodes at distance d+1.

# BFS exploration pattern for tree
#       1
#      / \
#     2   3
#    / \   \
#   4   5   6
#
# Visit order: 1, 2, 3, 4, 5, 6
# Level 0: 1
# Level 1: 2, 3
# Level 2: 4, 5, 6

DFS employs a stack following Last-In-First-Out (LIFO) ordering, either explicitly or through recursive call stack. The algorithm visits a node, marks it as visited, then immediately proceeds to an unvisited neighbor. When no unvisited neighbors remain, it backtracks to the previous node and continues exploration. This creates a path-like exploration pattern diving deep before exploring breadth.

# DFS exploration pattern for same tree
# Visit order (one possibility): 1, 2, 4, 5, 3, 6
# Path: 1 -> 2 -> 4 (backtrack) -> 5 (backtrack) -> 3 -> 6

The visited set tracks which nodes have been explored, preventing duplicate processing and infinite loops in cyclic graphs. Without visited tracking, cycles cause infinite traversal as the algorithm revisits the same nodes repeatedly.

# Graph with cycle
cycle_graph = {
  'A' => ['B'],
  'B' => ['C'],
  'C' => ['A']  # cycle back to A
}

# Without visited tracking: A -> B -> C -> A -> B -> C -> ...
# With visited tracking: A -> B -> C (stops)

The traversal frontier represents nodes discovered but not yet processed. BFS maintains this frontier as a queue, while DFS maintains it as a stack. The frontier structure determines exploration order and distinguishes the two algorithms.

Parent tracking during traversal enables path reconstruction from the starting node to any visited node. Each node records which node discovered it, creating a tree structure overlaid on the original graph. This proves essential for shortest path finding and navigation applications.

Graph representation affects traversal implementation. Adjacency lists store each node's neighbors in a collection, while adjacency matrices use a 2D array indicating edge presence. Adjacency lists generally provide better performance for sparse graphs, while matrices suit dense graphs or frequent edge existence queries.

Design Considerations

Selecting between BFS and DFS depends on problem requirements, graph characteristics, and resource constraints. Neither algorithm is universally superior; each excels in specific scenarios.

BFS guarantees finding the shortest path in unweighted graphs because it explores nodes in order of increasing distance from the start. When the first path to the target is discovered, no shorter path exists. This property makes BFS the correct choice for minimum hop count problems, such as finding degrees of separation in social networks or minimum moves in game state spaces.

# Finding shortest path between two users
def shortest_connection(graph, start_user, target_user)
  queue = [[start_user]]
  visited = Set.new([start_user])
  
  until queue.empty?
    path = queue.shift
    node = path.last
    
    return path if node == target_user
    
    graph[node].each do |neighbor|
      unless visited.include?(neighbor)
        visited.add(neighbor)
        queue.push(path + [neighbor])
      end
    end
  end
  
  nil
end

DFS requires less memory for deep graphs, storing only the current path from root to the current node plus unexplored siblings. Memory usage grows with graph depth rather than breadth. For graphs with high branching factors but limited depth, DFS uses significantly less memory than BFS.

DFS naturally suits problems requiring path enumeration or backtracking. Topological sorting, finding strongly connected components, and solving puzzles with constraint satisfaction all benefit from DFS's path-oriented exploration. The recursive structure of DFS aligns with problems that build solutions incrementally and backtrack when constraints are violated.

# Finding all paths between two nodes
def find_all_paths(graph, start, target, path = [], visited = Set.new)
  path = path + [start]
  visited = visited.dup.add(start)
  
  return [path] if start == target
  
  paths = []
  graph[start].each do |neighbor|
    unless visited.include?(neighbor)
      paths.concat(find_all_paths(graph, neighbor, target, path, visited))
    end
  end
  
  paths
end

BFS performs better for shallow targets in graphs with high branching factors. If the target resides near the starting point, BFS discovers it quickly without exploring unnecessary deep branches. Search engines crawling web pages use BFS to find content close to seed URLs before exploring distant pages.

DFS performs better for deep targets in graphs with low branching factors. In decision trees or game state exploration where solutions lie deep in the search space, DFS reaches them without expanding the massive frontiers that BFS would create.

Cycle detection requires DFS to track nodes in the current recursion path, distinguishing between back edges (creating cycles) and cross edges (connecting different branches). BFS cannot efficiently detect cycles because it lacks the path context that recursive DFS naturally maintains.

Memory-constrained environments favor iterative DFS with explicit stack size limits, while thread-safe implementations favor BFS because queue operations are easier to synchronize than recursive stack manipulation.

Ruby Implementation

Ruby provides built-in data structures that facilitate both BFS and DFS implementation. Arrays serve as both queues and stacks through shift/push operations, while Sets efficiently track visited nodes.

require 'set'

class Graph
  def initialize
    @adjacency_list = Hash.new { |h, k| h[k] = [] }
  end
  
  def add_edge(from, to)
    @adjacency_list[from] << to
  end
  
  def neighbors(node)
    @adjacency_list[node]
  end
  
  def bfs(start, &block)
    visited = Set.new
    queue = [start]
    visited.add(start)
    
    until queue.empty?
      node = queue.shift
      block.call(node) if block
      
      neighbors(node).each do |neighbor|
        unless visited.include?(neighbor)
          visited.add(neighbor)
          queue.push(neighbor)
        end
      end
    end
    
    visited
  end
  
  def dfs_iterative(start, &block)
    visited = Set.new
    stack = [start]
    
    until stack.empty?
      node = stack.pop
      next if visited.include?(node)
      
      visited.add(node)
      block.call(node) if block
      
      neighbors(node).reverse_each do |neighbor|
        stack.push(neighbor) unless visited.include?(neighbor)
      end
    end
    
    visited
  end
  
  def dfs_recursive(start, visited = Set.new, &block)
    return if visited.include?(start)
    
    visited.add(start)
    block.call(start) if block
    
    neighbors(start).each do |neighbor|
      dfs_recursive(neighbor, visited, &block)
    end
    
    visited
  end
end

The BFS implementation uses shift to dequeue from the front and push to enqueue at the back, maintaining FIFO ordering. Nodes are marked as visited when enqueued rather than when dequeued, preventing duplicate entries in the queue.

The iterative DFS uses pop to retrieve the most recently added node, creating LIFO ordering. The reverse iteration over neighbors maintains the same left-to-right exploration order as recursive DFS, though exploration order varies based on neighbor processing sequence.

# Building and traversing a graph
graph = Graph.new
graph.add_edge('A', 'B')
graph.add_edge('A', 'C')
graph.add_edge('B', 'D')
graph.add_edge('B', 'E')
graph.add_edge('C', 'F')
graph.add_edge('E', 'F')

puts "BFS traversal:"
graph.bfs('A') { |node| puts node }
# Output: A B C D E F

puts "\nDFS traversal (iterative):"
graph.dfs_iterative('A') { |node| puts node }
# Output: A C F B E D (one possible order)

puts "\nDFS traversal (recursive):"
graph.dfs_recursive('A') { |node| puts node }
# Output: A B D E F C (one possible order)

Implementing shortest path finding requires tracking parent relationships during BFS:

class Graph
  def shortest_path(start, target)
    return [start] if start == target
    
    visited = Set.new([start])
    queue = [start]
    parent = { start => nil }
    
    until queue.empty?
      node = queue.shift
      
      return reconstruct_path(parent, start, target) if node == target
      
      neighbors(node).each do |neighbor|
        unless visited.include?(neighbor)
          visited.add(neighbor)
          parent[neighbor] = node
          queue.push(neighbor)
        end
      end
    end
    
    nil
  end
  
  private
  
  def reconstruct_path(parent, start, target)
    path = []
    current = target
    
    while current
      path.unshift(current)
      current = parent[current]
    end
    
    path
  end
end

DFS lends itself to detecting cycles through color marking: white (unvisited), gray (currently being processed), and black (fully processed). A gray node encountering another gray node indicates a back edge and thus a cycle:

class Graph
  def has_cycle?
    color = Hash.new(:white)
    
    @adjacency_list.keys.each do |node|
      return true if dfs_cycle_check(node, color)
    end
    
    false
  end
  
  private
  
  def dfs_cycle_check(node, color)
    return false if color[node] == :black
    return true if color[node] == :gray
    
    color[node] = :gray
    
    neighbors(node).each do |neighbor|
      return true if dfs_cycle_check(neighbor, color)
    end
    
    color[node] = :black
    false
  end
end

Practical Examples

Graph traversal solves numerous real-world problems across different domains. Web crawlers use BFS to systematically discover and index web pages starting from seed URLs:

require 'set'
require 'uri'

class WebCrawler
  def initialize(max_depth: 3)
    @max_depth = max_depth
    @visited = Set.new
  end
  
  def crawl(start_url)
    queue = [[start_url, 0]]
    @visited.add(start_url)
    results = []
    
    until queue.empty?
      url, depth = queue.shift
      next if depth > @max_depth
      
      page_data = fetch_page(url)
      results << page_data
      
      extract_links(page_data[:content]).each do |link|
        normalized = normalize_url(link, url)
        unless @visited.include?(normalized)
          @visited.add(normalized)
          queue.push([normalized, depth + 1])
        end
      end
    end
    
    results
  end
  
  private
  
  def fetch_page(url)
    { url: url, content: "page content", links: [] }
  end
  
  def extract_links(content)
    []
  end
  
  def normalize_url(link, base)
    URI.join(base, link).to_s
  rescue
    nil
  end
end

Social networks determine relationship distances and suggest connections using BFS to compute degrees of separation:

class SocialNetwork
  def initialize
    @friendships = Hash.new { |h, k| h[k] = Set.new }
  end
  
  def add_friendship(user1, user2)
    @friendships[user1].add(user2)
    @friendships[user2].add(user1)
  end
  
  def degrees_of_separation(user1, user2)
    return 0 if user1 == user2
    
    visited = Set.new([user1])
    queue = [[user1, 0]]
    
    until queue.empty?
      user, distance = queue.shift
      
      @friendships[user].each do |friend|
        return distance + 1 if friend == user2
        
        unless visited.include?(friend)
          visited.add(friend)
          queue.push([friend, distance + 1])
        end
      end
    end
    
    Float::INFINITY
  end
  
  def suggest_friends(user, max_distance: 2)
    suggestions = Set.new
    visited = Set.new([user])
    queue = [[user, 0]]
    
    until queue.empty?
      current, distance = queue.shift
      next if distance >= max_distance
      
      @friendships[current].each do |friend|
        unless visited.include?(friend)
          visited.add(friend)
          suggestions.add(friend) if distance > 0
          queue.push([friend, distance + 1])
        end
      end
    end
    
    suggestions
  end
end

File system traversal uses DFS to recursively explore directory structures:

class FileSystemTraversal
  def initialize
    @results = []
  end
  
  def find_files(directory, pattern: nil, max_depth: nil)
    @results = []
    dfs_traverse(directory, pattern, 0, max_depth)
    @results
  end
  
  private
  
  def dfs_traverse(path, pattern, depth, max_depth)
    return if max_depth && depth > max_depth
    return unless File.directory?(path)
    
    Dir.entries(path).each do |entry|
      next if entry == '.' || entry == '..'
      
      full_path = File.join(path, entry)
      
      if File.directory?(full_path)
        dfs_traverse(full_path, pattern, depth + 1, max_depth)
      elsif pattern.nil? || entry.match?(pattern)
        @results << full_path
      end
    end
  rescue Errno::EACCES
    # Skip directories without permission
  end
end

# Usage
traversal = FileSystemTraversal.new
ruby_files = traversal.find_files('/project', pattern: /\.rb$/, max_depth: 5)

Dependency resolution for build systems and package managers employs DFS to determine installation order through topological sorting:

class DependencyResolver
  def initialize
    @dependencies = Hash.new { |h, k| h[k] = [] }
  end
  
  def add_dependency(package, depends_on)
    @dependencies[package] << depends_on
  end
  
  def installation_order(package)
    visited = Set.new
    stack = []
    
    return nil unless topological_sort(package, visited, Set.new, stack)
    
    stack.reverse
  end
  
  private
  
  def topological_sort(package, visited, current_path, stack)
    return true if visited.include?(package)
    return false if current_path.include?(package)  # Cycle detected
    
    current_path.add(package)
    
    @dependencies[package].each do |dependency|
      unless topological_sort(dependency, visited, current_path, stack)
        return false
      end
    end
    
    current_path.delete(package)
    visited.add(package)
    stack.push(package)
    true
  end
end

Performance Considerations

BFS and DFS exhibit identical time complexity but different space complexity characteristics. Both algorithms visit each vertex once and examine each edge once, resulting in O(V + E) time complexity where V represents vertices and E represents edges. This linear time relationship holds for both adjacency list and adjacency matrix representations, though matrix representations incur higher constant factors.

BFS space complexity reaches O(V) in the worst case when the queue stores an entire level of the graph. The maximum queue size occurs at the widest level, which in a complete binary tree contains roughly V/2 nodes. For graphs with high branching factors like social networks where nodes have hundreds of connections, BFS memory consumption becomes substantial.

# BFS memory usage grows with breadth
# Graph with branching factor b and depth d
# Maximum queue size: b^d nodes at deepest level
#
# Example: Social network with average 100 friends
# Depth 3: 100^3 = 1,000,000 nodes in queue

DFS space complexity reaches O(V) in the worst case but typically uses O(h) where h represents maximum path length from root to any node. For balanced trees and sparse graphs, DFS uses significantly less memory than BFS. However, in pathological cases like linked lists or highly unbalanced trees, both algorithms converge to O(V) space usage.

# DFS memory usage grows with depth
# Recursive DFS: stack frames = path length
# Iterative DFS: explicit stack = path length + siblings
#
# Example: Binary tree with depth d
# DFS stack size: d frames (recursive) or d + log(V) nodes (iterative)

The visited set dominates memory usage for both algorithms in large graphs, requiring O(V) space regardless of traversal strategy. Hash-based sets provide O(1) lookup but consume more memory than bitmap representations for dense integer-keyed graphs.

Cache performance differs between algorithms. BFS exhibits poor cache locality because it jumps between distant graph regions as it expands each level. DFS maintains better cache locality by following edges deeper into related subgraphs before backtracking. This difference affects performance significantly for large graphs exceeding CPU cache capacity.

require 'benchmark'

# Performance comparison on large graph
graph = Graph.new

# Create graph with 10000 nodes
10000.times do |i|
  rand(1..10).times do
    graph.add_edge(i, rand(10000))
  end
end

Benchmark.bm(15) do |x|
  x.report("BFS:") { graph.bfs(0) }
  x.report("DFS iterative:") { graph.dfs_iterative(0) }
  x.report("DFS recursive:") { graph.dfs_recursive(0) }
end

Recursive DFS risks stack overflow for deep graphs exceeding the language runtime's stack size limit. Ruby's default stack size typically handles thousands of recursive calls, but graphs with depth exceeding this limit crash. Iterative implementations avoid this limitation at the cost of explicit stack management.

For weighted graphs requiring shortest paths, neither BFS nor DFS suffices. Dijkstra's algorithm or A* search provide correct results at higher computational cost. BFS finds shortest paths only when all edges have equal weight.

Common Patterns

Visited tracking prevents infinite loops and duplicate processing. Three approaches exist: boolean sets, color marking, and timestamp tracking. Boolean sets suffice for simple traversal, while color marking (white/gray/black) enables cycle detection and classification of edge types.

# Boolean visited set (simple traversal)
visited = Set.new

# Color marking (cycle detection)
color = Hash.new(:white)  # :white, :gray, :black

# Timestamp tracking (edge classification)
discovery = {}
finish = {}
time = 0

Iterative deepening DFS combines DFS's memory efficiency with BFS's optimality for shortest paths. The algorithm performs repeated DFS searches with increasing depth limits, guaranteeing to find the shortest path while using only O(d) memory where d represents solution depth.

def iterative_deepening_dfs(graph, start, target)
  depth = 0
  
  loop do
    result = depth_limited_dfs(graph, start, target, depth)
    return result if result
    
    depth += 1
    break if depth > graph.size
  end
  
  nil
end

def depth_limited_dfs(graph, node, target, limit, visited = Set.new)
  return [node] if node == target
  return nil if limit <= 0
  
  visited.add(node)
  
  graph.neighbors(node).each do |neighbor|
    next if visited.include?(neighbor)
    
    path = depth_limited_dfs(graph, neighbor, target, limit - 1, visited)
    return [node] + path if path
  end
  
  visited.delete(node)
  nil
end

Bidirectional search runs simultaneous BFS from both start and target nodes, terminating when the frontiers intersect. This technique reduces search space from O(b^d) to O(b^(d/2)) for branching factor b and depth d.

def bidirectional_search(graph, start, target)
  return [start] if start == target
  
  forward_queue = [start]
  backward_queue = [target]
  forward_visited = { start => nil }
  backward_visited = { target => nil }
  
  until forward_queue.empty? || backward_queue.empty?
    # Expand smaller frontier
    if forward_queue.size <= backward_queue.size
      node = forward_queue.shift
      
      graph.neighbors(node).each do |neighbor|
        if backward_visited.key?(neighbor)
          return build_path(forward_visited, backward_visited, neighbor)
        end
        
        unless forward_visited.key?(neighbor)
          forward_visited[neighbor] = node
          forward_queue.push(neighbor)
        end
      end
    else
      node = backward_queue.shift
      
      graph.neighbors(node).each do |neighbor|
        if forward_visited.key?(neighbor)
          return build_path(forward_visited, backward_visited, neighbor)
        end
        
        unless backward_visited.key?(neighbor)
          backward_visited[neighbor] = node
          backward_queue.push(neighbor)
        end
      end
    end
  end
  
  nil
end

Parent tracking during traversal enables path reconstruction and tree construction. Each discovered node records its predecessor, allowing path extraction through backward traversal.

# Track parents during traversal
parent = { start => nil }

# Later reconstruct path
def build_path(parent, start, target)
  path = []
  current = target
  
  while current
    path.unshift(current)
    current = parent[current]
  end
  
  path
end

Level tracking during BFS groups nodes by distance from the start, enabling level-order processing or computing exact distances.

def bfs_with_levels(graph, start)
  visited = Set.new([start])
  queue = [[start, 0]]
  levels = Hash.new { |h, k| h[k] = [] }
  
  until queue.empty?
    node, level = queue.shift
    levels[level] << node
    
    graph.neighbors(node).each do |neighbor|
      unless visited.include?(neighbor)
        visited.add(neighbor)
        queue.push([neighbor, level + 1])
      end
    end
  end
  
  levels
end

Multiple component traversal processes disconnected graphs by initiating traversal from each unvisited node, identifying all connected components.

def connected_components(graph)
  visited = Set.new
  components = []
  
  graph.nodes.each do |node|
    next if visited.include?(node)
    
    component = Set.new
    dfs(graph, node, visited, component)
    components << component
  end
  
  components
end

def dfs(graph, node, visited, component)
  return if visited.include?(node)
  
  visited.add(node)
  component.add(node)
  
  graph.neighbors(node).each do |neighbor|
    dfs(graph, neighbor, visited, component)
  end
end

Reference

Algorithm Comparison

Characteristic BFS DFS
Data Structure Queue (FIFO) Stack or recursion (LIFO)
Time Complexity O(V + E) O(V + E)
Space Complexity O(V) worst case, O(w) typical O(V) worst case, O(h) typical
Shortest Path Finds shortest path in unweighted graphs Does not guarantee shortest path
Memory Usage High for wide graphs High for deep graphs
Implementation Iterative with queue Recursive or iterative with stack
Use Case Shortest paths, level-order Topological sort, cycle detection

Time Complexity by Graph Representation

Representation BFS DFS Space
Adjacency List O(V + E) O(V + E) O(V + E)
Adjacency Matrix O(V²) O(V²) O(V²)
Edge List O(V * E) O(V * E) O(E)

Space Complexity Components

Component BFS DFS Recursive DFS Iterative
Visited Set O(V) O(V) O(V)
Frontier O(V) O(h) stack frames O(h) explicit stack
Total O(V) O(V + h) O(V + h)

Common Applications

Problem Algorithm Reason
Shortest path in unweighted graph BFS Level-by-level guarantees minimum distance
Path existence Either Both work equally well
All paths enumeration DFS Backtracking naturally enumerates paths
Topological sorting DFS Finish times determine ordering
Cycle detection DFS Gray nodes identify back edges
Connected components Either Both identify components
Strongly connected components DFS Required for Tarjan's and Kosaraju's algorithms
Bipartite checking BFS Level tracking enables color assignment
Maze solving DFS Memory efficient for deep mazes
Web crawling BFS Prioritizes pages closer to seed URLs

Graph Traversal Methods

Method Return Type Description
bfs(start) Set Returns all nodes reachable from start using BFS
dfs_iterative(start) Set Returns all nodes reachable from start using iterative DFS
dfs_recursive(start) Set Returns all nodes reachable from start using recursive DFS
shortest_path(start, target) Array or nil Returns shortest path between nodes or nil if unreachable
has_cycle? Boolean Detects cycle presence in directed graph

Edge Types in DFS

Edge Type Definition Detection
Tree Edge Part of DFS spanning tree Leads to unvisited (white) node
Back Edge Creates cycle Leads to ancestor (gray) node
Forward Edge Shortcut in tree Leads to descendant (black) node
Cross Edge Between subtrees Leads to unrelated (black) node

Performance Characteristics

Scenario Preferred Algorithm Space Usage Notes
Wide shallow graph DFS O(h) where h is small Queue would be very large
Deep narrow graph DFS O(h) where h is large Still better than BFS queue
Need shortest path BFS O(w) where w is width Required for correctness
Path enumeration DFS O(h) Natural backtracking
Memory constrained DFS iterative O(h) Bounded stack depth
Stack overflow risk BFS O(w) No recursion depth limit