CrackedRuby - All Pairs Shortest Path (Floyd-Warshall)

Overview

The Floyd-Warshall algorithm computes the shortest paths between all pairs of vertices in a weighted, directed graph. Unlike single-source shortest path algorithms that find distances from one vertex to all others, Floyd-Warshall produces a complete distance matrix showing the shortest path between every possible pair of vertices in a single execution.

The algorithm operates through dynamic programming, building solutions incrementally by considering increasingly larger sets of intermediate vertices. For a graph with V vertices, the algorithm runs in O(V³) time and uses O(V²) space to store the distance matrix. This cubic time complexity makes Floyd-Warshall practical for dense graphs with hundreds of vertices but prohibitive for very large sparse graphs where repeated Dijkstra or Bellman-Ford executions might be more efficient.

Floyd-Warshall handles negative edge weights correctly, detecting negative cycles when they exist. The algorithm reports when a graph contains a negative cycle by checking for negative values on the diagonal of the distance matrix after completion. This capability distinguishes Floyd-Warshall from Dijkstra's algorithm, which cannot handle negative weights.

The algorithm finds application in network routing protocols, analyzing social networks, computing transitive closures of relations, and solving optimization problems requiring all-pairs distances. Transportation networks, communication systems, and graph analysis tools frequently employ Floyd-Warshall when complete distance information is required.

# Basic distance matrix after Floyd-Warshall
# Graph: 0 --5--> 1 --3--> 2
#        |         ^
#        +---2-----+
distances = [
  [0, 5, 8],    # From vertex 0: to 0=0, to 1=5, to 2=8
  [Float::INFINITY, 0, 3],  # From vertex 1
  [Float::INFINITY, Float::INFINITY, 0]  # From vertex 2
]

# After algorithm completes, distances[i][j] contains
# the shortest path distance from vertex i to vertex j

Key Principles

Floyd-Warshall builds on a simple recursive structure: the shortest path from vertex i to vertex j either goes through an intermediate vertex k, or it does not. If the path goes through k, the distance equals the shortest path from i to k plus the shortest path from k to j. The algorithm systematically considers each vertex as a potential intermediate point, updating distances when a shorter path is found.

The algorithm maintains a distance matrix D where D[i][j] represents the shortest known distance from vertex i to vertex j using only vertices {0, 1, ..., k} as intermediate points. Initially, D[i][j] equals the direct edge weight if an edge exists from i to j, infinity if no edge exists, and zero for D[i][i]. The algorithm then iterates through each vertex k, updating D[i][j] to min(D[i][j], D[i][k] + D[k][j]) for all pairs i, j.

The dynamic programming recurrence relation is:

D[i][j]^(k) = min(D[i][j]^(k-1), D[i][k]^(k-1) + D[k][j]^(k-1))

Where D[i][j]^(k) represents the shortest distance from i to j using vertices {0, 1, ..., k} as intermediates. The base case D[i][j]^(0) uses only direct edges, and the final solution D[i][j]^(V-1) allows all vertices as intermediates.

Path reconstruction requires maintaining a separate predecessor matrix P where P[i][j] stores the highest-indexed intermediate vertex on the shortest path from i to j. When D[i][j] is updated via vertex k, set P[i][j] = k. After completion, the actual path from i to j can be recovered by recursively following predecessors.

Negative cycle detection relies on the principle that after V iterations, if any diagonal element D[i][i] becomes negative, the graph contains a negative cycle reachable from vertex i. A vertex on its own shortest path with negative distance indicates a cycle that reduces total path weight.

# The recurrence in action
# Current best distance from 0 to 2: 10
# Considering vertex 1 as intermediate
# Distance 0→1: 3, Distance 1→2: 4
# New distance via 1: 3 + 4 = 7

# Update occurs because 7 < 10
dist[0][2] = [dist[0][2], dist[0][1] + dist[1][2]].min
# => 7

The algorithm's correctness stems from optimal substructure: if the shortest path from i to j goes through k, then the subpaths from i to k and k to j must also be shortest paths. This property allows the algorithm to build globally optimal solutions from locally optimal subproblems.

Ruby Implementation

Floyd-Warshall in Ruby operates on a two-dimensional array representing the adjacency matrix. The implementation initializes the distance matrix with edge weights, then performs three nested iterations to consider all vertex triples.

class FloydWarshall
  def initialize(graph)
    @graph = graph
    @vertex_count = graph.length
    @dist = deep_copy(graph)
    @next = Array.new(@vertex_count) { Array.new(@vertex_count) }
    initialize_next_matrix
  end

  def shortest_paths
    @vertex_count.times do |k|
      @vertex_count.times do |i|
        @vertex_count.times do |j|
          if @dist[i][k] + @dist[k][j] < @dist[i][j]
            @dist[i][j] = @dist[i][k] + @dist[k][j]
            @next[i][j] = @next[i][k]
          end
        end
      end
    end

    check_negative_cycles
    @dist
  end

  def reconstruct_path(start, finish)
    return nil if @next[start][finish].nil?
    return [start] if start == finish

    path = [start]
    current = start

    while current != finish
      current = @next[current][finish]
      return nil if current.nil?
      path << current
    end

    path
  end

  private

  def initialize_next_matrix
    @vertex_count.times do |i|
      @vertex_count.times do |j|
        @next[i][j] = j if @graph[i][j] != Float::INFINITY
      end
    end
  end

  def check_negative_cycles
    @vertex_count.times do |i|
      if @dist[i][i] < 0
        raise "Negative cycle detected at vertex #{i}"
      end
    end
  end

  def deep_copy(matrix)
    matrix.map(&:dup)
  end
end

Graph representation uses Float::INFINITY for non-existent edges, enabling proper min comparisons. The adjacency matrix must be square with dimensions V×V where V is the vertex count. Self-loops set D[i][i] = 0 initially unless the graph explicitly contains negative self-loops.

# Graph construction
# 4 vertices with weighted edges
INF = Float::INFINITY

graph = [
  [0,   3,   INF, 7  ],
  [8,   0,   2,   INF],
  [5,   INF, 0,   1  ],
  [2,   INF, INF, 0  ]
]

fw = FloydWarshall.new(graph)
distances = fw.shortest_paths

# distances[0][3] contains shortest distance from vertex 0 to 3
# => 6 (path: 0→1→2→3 with weights 3+2+1)

path = fw.reconstruct_path(0, 3)
# => [0, 1, 2, 3]

Space optimization is possible for applications requiring only final distances without path reconstruction. The algorithm can operate in-place on the distance matrix, eliminating the need for separate current and previous iteration matrices since updates depend only on values from the current iteration.

def shortest_paths_space_optimized(graph)
  dist = graph.map(&:dup)
  v = graph.length

  v.times do |k|
    v.times do |i|
      v.times do |j|
        new_dist = dist[i][k] + dist[k][j]
        dist[i][j] = new_dist if new_dist < dist[i][j]
      end
    end
  end

  dist
end

Parallel processing can optimize Floyd-Warshall by distributing the inner loops across threads or processes. However, the k-loop must execute sequentially since iteration k depends on all updates from iteration k-1. Ruby's Global Interpreter Lock limits thread parallelism, but process-based parallelism through tools like Parallel gem can achieve speedups on multi-core systems.

require 'parallel'

def parallel_floyd_warshall(graph, num_processes: 4)
  dist = graph.map(&:dup)
  v = graph.length

  v.times do |k|
    # Parallelize over i values
    results = Parallel.map(0...v, in_processes: num_processes) do |i|
      row_updates = []
      v.times do |j|
        new_dist = dist[i][k] + dist[k][j]
        row_updates << [i, j, new_dist] if new_dist < dist[i][j]
      end
      row_updates
    end

    # Apply updates sequentially
    results.flatten(1).each do |i, j, new_dist|
      dist[i][j] = new_dist
    end
  end

  dist
end

Practical Examples

Network routing protocols use Floyd-Warshall to compute routing tables when all-pairs connectivity information is needed. A router running the algorithm on its network topology can determine optimal next-hop destinations for packets heading to any destination.

class NetworkRouter
  def initialize(network_graph)
    @network = network_graph
    @vertex_count = network_graph.length
    compute_routing_table
  end

  def compute_routing_table
    @distances = @network.map(&:dup)
    @next_hop = Array.new(@vertex_count) { Array.new(@vertex_count) }

    # Initialize next hops
    @vertex_count.times do |i|
      @vertex_count.times do |j|
        @next_hop[i][j] = j if @network[i][j] != Float::INFINITY && i != j
      end
    end

    # Floyd-Warshall
    @vertex_count.times do |k|
      @vertex_count.times do |i|
        @vertex_count.times do |j|
          if @distances[i][k] + @distances[k][j] < @distances[i][j]
            @distances[i][j] = @distances[i][k] + @distances[k][j]
            @next_hop[i][j] = @next_hop[i][k]
          end
        end
      end
    end
  end

  def route_packet(source, destination)
    return nil if @next_hop[source][destination].nil?

    {
      total_cost: @distances[source][destination],
      next_hop: @next_hop[source][destination],
      full_path: reconstruct_path(source, destination)
    }
  end

  private

  def reconstruct_path(start, finish)
    return nil if @next_hop[start][finish].nil?

    path = [start]
    current = start

    while current != finish
      current = @next_hop[current][finish]
      path << current
    end

    path
  end
end

# 5-router network
INF = Float::INFINITY
network = [
  [0,   4,   INF, INF, 8  ],
  [4,   0,   8,   INF, INF],
  [INF, 8,   0,   2,   INF],
  [INF, INF, 2,   0,   9  ],
  [8,   INF, INF, 9,   0  ]
]

router = NetworkRouter.new(network)
route = router.route_packet(0, 3)
# => {
#   total_cost: 14,
#   next_hop: 1,
#   full_path: [0, 1, 2, 3]
# }

Analyzing social network closeness centrality requires computing distances between all user pairs. Floyd-Warshall provides the complete distance matrix needed to calculate how close each user is to all others in the network.

class SocialNetworkAnalyzer
  def initialize(adjacency_matrix)
    @graph = adjacency_matrix
    @vertex_count = adjacency_matrix.length
    @distances = compute_all_pairs_distances
  end

  def closeness_centrality(vertex)
    total_distance = 0
    reachable_count = 0

    @vertex_count.times do |i|
      next if i == vertex

      if @distances[vertex][i] != Float::INFINITY
        total_distance += @distances[vertex][i]
        reachable_count += 1
      end
    end

    return 0 if reachable_count == 0
    reachable_count.to_f / total_distance
  end

  def average_path_length
    total = 0
    count = 0

    @vertex_count.times do |i|
      @vertex_count.times do |j|
        next if i == j
        if @distances[i][j] != Float::INFINITY
          total += @distances[i][j]
          count += 1
        end
      end
    end

    count > 0 ? total.to_f / count : Float::INFINITY
  end

  def network_diameter
    max_distance = 0

    @vertex_count.times do |i|
      @vertex_count.times do |j|
        next if i == j
        if @distances[i][j] != Float::INFINITY && @distances[i][j] > max_distance
          max_distance = @distances[i][j]
        end
      end
    end

    max_distance
  end

  private

  def compute_all_pairs_distances
    dist = @graph.map(&:dup)
    v = @vertex_count

    v.times do |k|
      v.times do |i|
        v.times do |j|
          dist[i][j] = [dist[i][j], dist[i][k] + dist[k][j]].min
        end
      end
    end

    dist
  end
end

# Friendship network (unweighted, so all edges = 1)
INF = Float::INFINITY
friends = [
  [0, 1,   1,   INF, INF],
  [1, 0,   1,   1,   INF],
  [1, 1,   0,   INF, 1  ],
  [INF, 1, INF, 0,   1  ],
  [INF, INF, 1, 1,   0  ]
]

analyzer = SocialNetworkAnalyzer.new(friends)
analyzer.closeness_centrality(1)  # => 1.0 (most central user)
analyzer.network_diameter           # => 3
analyzer.average_path_length        # => 1.6

Transitive closure computation determines reachability between all vertex pairs. Floyd-Warshall adaptation replaces distance calculations with boolean reachability, producing a matrix indicating which vertices can reach which others.

def transitive_closure(graph)
  v = graph.length
  reach = Array.new(v) { Array.new(v, false) }

  # Initialize: direct edges and self-loops
  v.times do |i|
    v.times do |j|
      reach[i][j] = true if i == j || graph[i][j] == 1
    end
  end

  # Floyd-Warshall for reachability
  v.times do |k|
    v.times do |i|
      v.times do |j|
        reach[i][j] = reach[i][j] || (reach[i][k] && reach[k][j])
      end
    end
  end

  reach
end

# Directed graph (1 = edge exists, 0 = no edge)
graph = [
  [0, 1, 0, 0],
  [0, 0, 1, 0],
  [0, 0, 0, 1],
  [1, 0, 0, 0]
]

closure = transitive_closure(graph)
# closure[0][3] => true (can reach 3 from 0 via 0→1→2→3)
# All entries true (every vertex reaches every other - it's a cycle)

Design Considerations

Floyd-Warshall suits scenarios requiring complete all-pairs distance information, particularly when the graph is dense or when multiple shortest path queries will be made. The algorithm performs V³ operations regardless of edge count, making it efficient for dense graphs where E approaches V² but wasteful for sparse graphs.

For sparse graphs with E << V², running Dijkstra's algorithm V times produces better performance. Dijkstra with a binary heap runs in O(V² log V) per execution, giving O(V³ log V) total for all pairs. With a Fibonacci heap, single-source Dijkstra achieves O(E + V log V), yielding O(VE + V² log V) for all sources. When E = O(V), this becomes O(V²), significantly better than Floyd-Warshall's O(V³).

# Decision matrix
def choose_algorithm(vertices, edges)
  density = edges.to_f / (vertices * (vertices - 1))

  if density > 0.5
    # Dense graph
    :floyd_warshall
  elsif negative_weights_present?
    # Sparse with negative weights
    :bellman_ford_all_pairs  # V * O(VE) = O(V²E)
  else
    # Sparse without negative weights
    :dijkstra_all_pairs  # V * O(E log V) with binary heap
  end
end

Negative edge weights eliminate Dijkstra as an option, leaving Floyd-Warshall or V executions of Bellman-Ford. Bellman-Ford runs in O(VE) per source, giving O(V²E) total. For dense graphs where E ≈ V², this equals O(V⁴), worse than Floyd-Warshall's O(V³). Floyd-Warshall becomes the clear choice when negative weights exist in dense graphs.

Memory constraints favor Floyd-Warshall when path reconstruction is not required. The algorithm needs only O(V²) space for the distance matrix, whereas running V instances of Dijkstra or Bellman-Ford might require tracking separate data structures for each source. Path reconstruction adds O(V²) space for the next-hop matrix, but this remains constant regardless of edge count.

Dynamic graphs with frequent edge updates create challenges for Floyd-Warshall. Adding or removing a single edge requires recomputing the entire distance matrix in O(V³) time. Incremental shortest path algorithms handle updates more efficiently, though they increase complexity. For mostly static graphs with occasional batch updates, precomputing with Floyd-Warshall and caching results often proves more practical than maintaining incremental data structures.

class CachedFloydWarshall
  def initialize(graph)
    @graph = graph
    @cache_valid = false
    @distances = nil
  end

  def shortest_distance(i, j)
    recompute unless @cache_valid
    @distances[i][j]
  end

  def update_edge(from, to, new_weight)
    @graph[from][to] = new_weight
    @cache_valid = false
  end

  def batch_update
    yield self
    recompute
  end

  private

  def recompute
    @distances = floyd_warshall(@graph)
    @cache_valid = true
  end
end

Performance Considerations

The O(V³) time complexity stems from three nested loops, each iterating V times. This cubic growth limits Floyd-Warshall to graphs with hundreds or low thousands of vertices on modern hardware. A graph with 1,000 vertices requires one billion operations, taking seconds on typical processors. A graph with 10,000 vertices requires one trillion operations, pushing into minutes or hours.

Space complexity of O(V²) for the distance matrix becomes prohibitive as vertex count grows. A 10,000-vertex graph needs 100 million matrix entries. Using 8-byte floating-point numbers, this consumes 800MB of memory. Add path reconstruction with another 100 million integers at 8 bytes each, and memory usage reaches 1.6GB for a single graph.

Cache efficiency impacts actual performance significantly. The algorithm's memory access pattern benefits from good spatial locality when accessing the distance matrix. Processing by rows (the i-loop) maintains sequential memory access, improving cache hit rates. Transposing the loops to process by columns can degrade performance due to cache misses on large matrices.

require 'benchmark'

def benchmark_floyd_warshall(vertex_count)
  graph = Array.new(vertex_count) { Array.new(vertex_count, Float::INFINITY) }

  # Initialize with random edges
  vertex_count.times do |i|
    graph[i][i] = 0
    (vertex_count / 2).times do
      j = rand(vertex_count)
      graph[i][j] = rand(1..10) if i != j
    end
  end

  Benchmark.measure do
    shortest_paths_space_optimized(graph)
  end
end

# Timing results (example output)
# 100 vertices:  ~0.05 seconds
# 200 vertices:  ~0.4 seconds
# 500 vertices:  ~6 seconds
# 1000 vertices: ~50 seconds

Loop unrolling provides minor optimizations by reducing loop overhead and enabling better instruction pipelining. However, the massive number of operations overwhelms these micro-optimizations. The fundamental O(V³) complexity dominates performance characteristics.

Early termination optimization applies when detecting negative cycles is the primary goal. After finding a negative diagonal entry, the algorithm can immediately abort since the graph contains a negative cycle. This saves computation when negative cycle detection fails early but provides no benefit when checking completes successfully.

def detect_negative_cycle_fast(graph)
  dist = graph.map(&:dup)
  v = graph.length

  v.times do |k|
    v.times do |i|
      # Check diagonal after each k iteration
      return true if dist[i][i] < 0

      v.times do |j|
        dist[i][j] = [dist[i][j], dist[i][k] + dist[k][j]].min
      end
    end
  end

  v.times { |i| return true if dist[i][i] < 0 }
  false
end

Blocked matrix algorithms improve cache utilization by dividing the distance matrix into square blocks that fit in cache. Processing blocks sequentially reduces cache misses, improving performance on large matrices. Implementation complexity increases substantially, and benefits materialize only for matrices large enough that cache misses dominate computation time.

GPU acceleration achieves significant speedups for Floyd-Warshall since the algorithm has high arithmetic intensity and parallel structure. The inner two loops can execute simultaneously across thousands of GPU cores. However, the k-loop must execute sequentially, limiting parallelism to V² parallel operations per k iteration. GPU implementations using CUDA or OpenCL achieve 10-100x speedups on large graphs compared to CPU implementations.

Common Pitfalls

Incorrect infinity initialization causes the algorithm to fail. Setting non-existent edges to large finite values like 999999 instead of Float::INFINITY leads to incorrect shortest paths when the algorithm finds "paths" through non-existent edges. The min operation compares these large values, treating them as real paths.

# WRONG - using large number instead of infinity
graph = Array.new(5) { Array.new(5, 999999) }
5.times { |i| graph[i][i] = 0 }

# After Floyd-Warshall, spurious paths appear
# because 999999 + 999999 < 999999 evaluates to false
# but the algorithm treats 999999 as a real distance

# CORRECT - using Float::INFINITY
graph = Array.new(5) { Array.new(5, Float::INFINITY) }
5.times { |i| graph[i][i] = 0 }

Forgetting to initialize diagonal elements to zero creates nonsensical results where the shortest path from a vertex to itself appears to be non-zero. The algorithm then propagates these incorrect self-loop distances throughout the matrix, corrupting all path calculations.

Integer overflow occurs when edge weights or accumulated distances exceed integer maximum values. Using 32-bit integers with large weights causes overflow during addition, wrapping to negative values and producing incorrect results. Float types avoid overflow but introduce precision errors for very large or very small values.

# Integer overflow example
MAX_INT = 2**31 - 1
large_weight = MAX_INT / 2

# This addition overflows in 32-bit integers
# MAX_INT / 2 + MAX_INT / 2 wraps to negative

# Solution: use Float or check for overflow
def safe_add(a, b)
  return Float::INFINITY if a == Float::INFINITY || b == Float::INFINITY
  sum = a + b
  sum
end

Modifying the graph during execution corrupts the algorithm. The distance matrix represents work in progress during iterations. Changing edge weights or the graph structure while the algorithm runs leads to undefined behavior since the algorithm assumes static input throughout execution.

Negative cycle misinterpretation happens when developers check for negative cycles after initialization but before running the algorithm. Negative weights on individual edges do not constitute negative cycles. Only after Floyd-Warshall completes can negative diagonal entries indicate negative cycles reachable from those vertices.

Path reconstruction errors occur when updating the distance matrix without correspondingly updating the predecessor matrix. Each distance update via intermediate vertex k must also update the next-hop to point to the same intermediate vertex's next-hop toward the destination.

# WRONG - updating distance without next-hop
if dist[i][k] + dist[k][j] < dist[i][j]
  dist[i][j] = dist[i][k] + dist[k][j]
  # Forgot to update next_hop[i][j]
end

# CORRECT - maintaining consistency
if dist[i][k] + dist[k][j] < dist[i][j]
  dist[i][j] = dist[i][k] + dist[k][j]
  next_hop[i][j] = next_hop[i][k]
end

Directed versus undirected graph confusion leads to errors when representing undirected graphs. An undirected edge between i and j requires two matrix entries: graph[i][j] and graph[j][i], both with the same weight. Forgetting to set both entries creates an asymmetric matrix representing a directed graph instead.

Memory aliasing bugs appear when shallow-copying the input graph. Modifying the distance matrix during the algorithm should not affect the original graph, but shallow copying causes both to reference the same inner arrays. Deep copying with map(&:dup) creates independent arrays.

# WRONG - shallow copy
dist = graph.map { |row| row }
# Modifying dist[i][j] also modifies graph[i][j]

# CORRECT - deep copy
dist = graph.map(&:dup)
# dist and graph are now independent

Reference

Algorithm Complexity

Metric	Complexity	Notes
Time	O(V³)	Three nested loops over all vertices
Space	O(V²)	Distance matrix storage
Space with paths	O(V²)	Distance matrix plus next-hop matrix
Best case	O(V³)	No early termination possible
Worst case	O(V³)	Same as best case

Key Operations

Operation	Time	Description
Initialize matrix	O(V²)	Copy graph to distance matrix
Main algorithm	O(V³)	Triple nested loop
Negative cycle check	O(V)	Scan diagonal after completion
Path reconstruction	O(V)	Follow next-hop pointers
Single distance query	O(1)	Direct matrix lookup after computation

Distance Matrix Initialization

Matrix Entry	Initial Value	Meaning
D[i][i]	0	Distance from vertex to itself
D[i][j] edge exists	weight(i,j)	Direct edge weight
D[i][j] no edge	Float::INFINITY	No direct connection

Algorithm Comparison

Algorithm	Time per source	All pairs time	Handles negative weights	Space
Floyd-Warshall	N/A	O(V³)	Yes	O(V²)
Dijkstra all pairs	O(E + V log V)	O(VE + V² log V)	No	O(V) per source
Bellman-Ford all pairs	O(VE)	O(V²E)	Yes	O(V) per source
Johnson's algorithm	O(E log V)	O(VE + V² log V)	Yes	O(V²)

Use Case Selection

Scenario	Recommended Algorithm	Reason
Dense graph, all pairs needed	Floyd-Warshall	O(V³) optimal for dense graphs
Sparse graph, no negative weights	Dijkstra from each vertex	O(VE log V) better for sparse
Sparse graph, negative weights	Bellman-Ford from each vertex	Handles negatives, O(V²E) acceptable
Negative weights, very sparse	Johnson's algorithm	Best for sparse with negatives
Single source needed	Dijkstra or Bellman-Ford	No need for all pairs
Only reachability needed	Transitive closure variant	Boolean operations faster

Common Graph Representations

Representation	Initialization	Space	Best for
Adjacency matrix	Direct use	O(V²)	Dense graphs
Adjacency list	Convert to matrix	O(E) input, O(V²) working	Sparse graphs
Edge list	Convert to matrix	O(E) input, O(V²) working	Very sparse graphs

Negative Cycle Detection

Check	When	Result Interpretation
After initialization	Before algorithm	Meaningless - negative edges not cycles
After each k iteration	During execution	Partial - may not detect all cycles
After completion	After all iterations	D[i][i] < 0 indicates negative cycle through i

Path Reconstruction Steps

Step	Operation	Data Structure
1. Initialize	Set next[i][j] = j for all edges	Next-hop matrix
2. Update	next[i][j] = next[i][k] when updating distance	Maintain consistency
3. Reconstruct	Follow next[start][dest] until reaching dest	Build path list
4. Check validity	Verify next[i][j] not null	Handle unreachable pairs

All Pairs Shortest Path (Floyd-Warshall)