Overview
The Floyd-Warshall algorithm computes the shortest paths between all pairs of vertices in a weighted, directed graph. Unlike single-source shortest path algorithms that find distances from one vertex to all others, Floyd-Warshall produces a complete distance matrix showing the shortest path between every possible pair of vertices in a single execution.
The algorithm operates through dynamic programming, building solutions incrementally by considering increasingly larger sets of intermediate vertices. For a graph with V vertices, the algorithm runs in O(V³) time and uses O(V²) space to store the distance matrix. This cubic time complexity makes Floyd-Warshall practical for dense graphs with hundreds of vertices but prohibitive for very large sparse graphs where repeated Dijkstra or Bellman-Ford executions might be more efficient.
Floyd-Warshall handles negative edge weights correctly, detecting negative cycles when they exist. The algorithm reports when a graph contains a negative cycle by checking for negative values on the diagonal of the distance matrix after completion. This capability distinguishes Floyd-Warshall from Dijkstra's algorithm, which cannot handle negative weights.
The algorithm finds application in network routing protocols, analyzing social networks, computing transitive closures of relations, and solving optimization problems requiring all-pairs distances. Transportation networks, communication systems, and graph analysis tools frequently employ Floyd-Warshall when complete distance information is required.
# Basic distance matrix after Floyd-Warshall
# Graph: 0 --5--> 1 --3--> 2
# | ^
# +---2-----+
distances = [
[0, 5, 8], # From vertex 0: to 0=0, to 1=5, to 2=8
[Float::INFINITY, 0, 3], # From vertex 1
[Float::INFINITY, Float::INFINITY, 0] # From vertex 2
]
# After algorithm completes, distances[i][j] contains
# the shortest path distance from vertex i to vertex j
Key Principles
Floyd-Warshall builds on a simple recursive structure: the shortest path from vertex i to vertex j either goes through an intermediate vertex k, or it does not. If the path goes through k, the distance equals the shortest path from i to k plus the shortest path from k to j. The algorithm systematically considers each vertex as a potential intermediate point, updating distances when a shorter path is found.
The algorithm maintains a distance matrix D where D[i][j] represents the shortest known distance from vertex i to vertex j using only vertices {0, 1, ..., k} as intermediate points. Initially, D[i][j] equals the direct edge weight if an edge exists from i to j, infinity if no edge exists, and zero for D[i][i]. The algorithm then iterates through each vertex k, updating D[i][j] to min(D[i][j], D[i][k] + D[k][j]) for all pairs i, j.
The dynamic programming recurrence relation is:
D[i][j]^(k) = min(D[i][j]^(k-1), D[i][k]^(k-1) + D[k][j]^(k-1))
Where D[i][j]^(k) represents the shortest distance from i to j using vertices {0, 1, ..., k} as intermediates. The base case D[i][j]^(0) uses only direct edges, and the final solution D[i][j]^(V-1) allows all vertices as intermediates.
Path reconstruction requires maintaining a separate predecessor matrix P where P[i][j] stores the highest-indexed intermediate vertex on the shortest path from i to j. When D[i][j] is updated via vertex k, set P[i][j] = k. After completion, the actual path from i to j can be recovered by recursively following predecessors.
Negative cycle detection relies on the principle that after V iterations, if any diagonal element D[i][i] becomes negative, the graph contains a negative cycle reachable from vertex i. A vertex on its own shortest path with negative distance indicates a cycle that reduces total path weight.
# The recurrence in action
# Current best distance from 0 to 2: 10
# Considering vertex 1 as intermediate
# Distance 0→1: 3, Distance 1→2: 4
# New distance via 1: 3 + 4 = 7
# Update occurs because 7 < 10
dist[0][2] = [dist[0][2], dist[0][1] + dist[1][2]].min
# => 7
The algorithm's correctness stems from optimal substructure: if the shortest path from i to j goes through k, then the subpaths from i to k and k to j must also be shortest paths. This property allows the algorithm to build globally optimal solutions from locally optimal subproblems.
Ruby Implementation
Floyd-Warshall in Ruby operates on a two-dimensional array representing the adjacency matrix. The implementation initializes the distance matrix with edge weights, then performs three nested iterations to consider all vertex triples.
class FloydWarshall
def initialize(graph)
@graph = graph
@vertex_count = graph.length
@dist = deep_copy(graph)
@next = Array.new(@vertex_count) { Array.new(@vertex_count) }
initialize_next_matrix
end
def shortest_paths
@vertex_count.times do |k|
@vertex_count.times do |i|
@vertex_count.times do |j|
if @dist[i][k] + @dist[k][j] < @dist[i][j]
@dist[i][j] = @dist[i][k] + @dist[k][j]
@next[i][j] = @next[i][k]
end
end
end
end
check_negative_cycles
@dist
end
def reconstruct_path(start, finish)
return nil if @next[start][finish].nil?
return [start] if start == finish
path = [start]
current = start
while current != finish
current = @next[current][finish]
return nil if current.nil?
path << current
end
path
end
private
def initialize_next_matrix
@vertex_count.times do |i|
@vertex_count.times do |j|
@next[i][j] = j if @graph[i][j] != Float::INFINITY
end
end
end
def check_negative_cycles
@vertex_count.times do |i|
if @dist[i][i] < 0
raise "Negative cycle detected at vertex #{i}"
end
end
end
def deep_copy(matrix)
matrix.map(&:dup)
end
end
Graph representation uses Float::INFINITY for non-existent edges, enabling proper min comparisons. The adjacency matrix must be square with dimensions V×V where V is the vertex count. Self-loops set D[i][i] = 0 initially unless the graph explicitly contains negative self-loops.
# Graph construction
# 4 vertices with weighted edges
INF = Float::INFINITY
graph = [
[0, 3, INF, 7 ],
[8, 0, 2, INF],
[5, INF, 0, 1 ],
[2, INF, INF, 0 ]
]
fw = FloydWarshall.new(graph)
distances = fw.shortest_paths
# distances[0][3] contains shortest distance from vertex 0 to 3
# => 6 (path: 0→1→2→3 with weights 3+2+1)
path = fw.reconstruct_path(0, 3)
# => [0, 1, 2, 3]
Space optimization is possible for applications requiring only final distances without path reconstruction. The algorithm can operate in-place on the distance matrix, eliminating the need for separate current and previous iteration matrices since updates depend only on values from the current iteration.
def shortest_paths_space_optimized(graph)
dist = graph.map(&:dup)
v = graph.length
v.times do |k|
v.times do |i|
v.times do |j|
new_dist = dist[i][k] + dist[k][j]
dist[i][j] = new_dist if new_dist < dist[i][j]
end
end
end
dist
end
Parallel processing can optimize Floyd-Warshall by distributing the inner loops across threads or processes. However, the k-loop must execute sequentially since iteration k depends on all updates from iteration k-1. Ruby's Global Interpreter Lock limits thread parallelism, but process-based parallelism through tools like Parallel gem can achieve speedups on multi-core systems.
require 'parallel'
def parallel_floyd_warshall(graph, num_processes: 4)
dist = graph.map(&:dup)
v = graph.length
v.times do |k|
# Parallelize over i values
results = Parallel.map(0...v, in_processes: num_processes) do |i|
row_updates = []
v.times do |j|
new_dist = dist[i][k] + dist[k][j]
row_updates << [i, j, new_dist] if new_dist < dist[i][j]
end
row_updates
end
# Apply updates sequentially
results.flatten(1).each do |i, j, new_dist|
dist[i][j] = new_dist
end
end
dist
end
Practical Examples
Network routing protocols use Floyd-Warshall to compute routing tables when all-pairs connectivity information is needed. A router running the algorithm on its network topology can determine optimal next-hop destinations for packets heading to any destination.
class NetworkRouter
def initialize(network_graph)
@network = network_graph
@vertex_count = network_graph.length
compute_routing_table
end
def compute_routing_table
@distances = @network.map(&:dup)
@next_hop = Array.new(@vertex_count) { Array.new(@vertex_count) }
# Initialize next hops
@vertex_count.times do |i|
@vertex_count.times do |j|
@next_hop[i][j] = j if @network[i][j] != Float::INFINITY && i != j
end
end
# Floyd-Warshall
@vertex_count.times do |k|
@vertex_count.times do |i|
@vertex_count.times do |j|
if @distances[i][k] + @distances[k][j] < @distances[i][j]
@distances[i][j] = @distances[i][k] + @distances[k][j]
@next_hop[i][j] = @next_hop[i][k]
end
end
end
end
end
def route_packet(source, destination)
return nil if @next_hop[source][destination].nil?
{
total_cost: @distances[source][destination],
next_hop: @next_hop[source][destination],
full_path: reconstruct_path(source, destination)
}
end
private
def reconstruct_path(start, finish)
return nil if @next_hop[start][finish].nil?
path = [start]
current = start
while current != finish
current = @next_hop[current][finish]
path << current
end
path
end
end
# 5-router network
INF = Float::INFINITY
network = [
[0, 4, INF, INF, 8 ],
[4, 0, 8, INF, INF],
[INF, 8, 0, 2, INF],
[INF, INF, 2, 0, 9 ],
[8, INF, INF, 9, 0 ]
]
router = NetworkRouter.new(network)
route = router.route_packet(0, 3)
# => {
# total_cost: 14,
# next_hop: 1,
# full_path: [0, 1, 2, 3]
# }
Analyzing social network closeness centrality requires computing distances between all user pairs. Floyd-Warshall provides the complete distance matrix needed to calculate how close each user is to all others in the network.
class SocialNetworkAnalyzer
def initialize(adjacency_matrix)
@graph = adjacency_matrix
@vertex_count = adjacency_matrix.length
@distances = compute_all_pairs_distances
end
def closeness_centrality(vertex)
total_distance = 0
reachable_count = 0
@vertex_count.times do |i|
next if i == vertex
if @distances[vertex][i] != Float::INFINITY
total_distance += @distances[vertex][i]
reachable_count += 1
end
end
return 0 if reachable_count == 0
reachable_count.to_f / total_distance
end
def average_path_length
total = 0
count = 0
@vertex_count.times do |i|
@vertex_count.times do |j|
next if i == j
if @distances[i][j] != Float::INFINITY
total += @distances[i][j]
count += 1
end
end
end
count > 0 ? total.to_f / count : Float::INFINITY
end
def network_diameter
max_distance = 0
@vertex_count.times do |i|
@vertex_count.times do |j|
next if i == j
if @distances[i][j] != Float::INFINITY && @distances[i][j] > max_distance
max_distance = @distances[i][j]
end
end
end
max_distance
end
private
def compute_all_pairs_distances
dist = @graph.map(&:dup)
v = @vertex_count
v.times do |k|
v.times do |i|
v.times do |j|
dist[i][j] = [dist[i][j], dist[i][k] + dist[k][j]].min
end
end
end
dist
end
end
# Friendship network (unweighted, so all edges = 1)
INF = Float::INFINITY
friends = [
[0, 1, 1, INF, INF],
[1, 0, 1, 1, INF],
[1, 1, 0, INF, 1 ],
[INF, 1, INF, 0, 1 ],
[INF, INF, 1, 1, 0 ]
]
analyzer = SocialNetworkAnalyzer.new(friends)
analyzer.closeness_centrality(1) # => 1.0 (most central user)
analyzer.network_diameter # => 3
analyzer.average_path_length # => 1.6
Transitive closure computation determines reachability between all vertex pairs. Floyd-Warshall adaptation replaces distance calculations with boolean reachability, producing a matrix indicating which vertices can reach which others.
def transitive_closure(graph)
v = graph.length
reach = Array.new(v) { Array.new(v, false) }
# Initialize: direct edges and self-loops
v.times do |i|
v.times do |j|
reach[i][j] = true if i == j || graph[i][j] == 1
end
end
# Floyd-Warshall for reachability
v.times do |k|
v.times do |i|
v.times do |j|
reach[i][j] = reach[i][j] || (reach[i][k] && reach[k][j])
end
end
end
reach
end
# Directed graph (1 = edge exists, 0 = no edge)
graph = [
[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1],
[1, 0, 0, 0]
]
closure = transitive_closure(graph)
# closure[0][3] => true (can reach 3 from 0 via 0→1→2→3)
# All entries true (every vertex reaches every other - it's a cycle)
Design Considerations
Floyd-Warshall suits scenarios requiring complete all-pairs distance information, particularly when the graph is dense or when multiple shortest path queries will be made. The algorithm performs V³ operations regardless of edge count, making it efficient for dense graphs where E approaches V² but wasteful for sparse graphs.
For sparse graphs with E << V², running Dijkstra's algorithm V times produces better performance. Dijkstra with a binary heap runs in O(V² log V) per execution, giving O(V³ log V) total for all pairs. With a Fibonacci heap, single-source Dijkstra achieves O(E + V log V), yielding O(VE + V² log V) for all sources. When E = O(V), this becomes O(V²), significantly better than Floyd-Warshall's O(V³).
# Decision matrix
def choose_algorithm(vertices, edges)
density = edges.to_f / (vertices * (vertices - 1))
if density > 0.5
# Dense graph
:floyd_warshall
elsif negative_weights_present?
# Sparse with negative weights
:bellman_ford_all_pairs # V * O(VE) = O(V²E)
else
# Sparse without negative weights
:dijkstra_all_pairs # V * O(E log V) with binary heap
end
end
Negative edge weights eliminate Dijkstra as an option, leaving Floyd-Warshall or V executions of Bellman-Ford. Bellman-Ford runs in O(VE) per source, giving O(V²E) total. For dense graphs where E ≈ V², this equals O(V⁴), worse than Floyd-Warshall's O(V³). Floyd-Warshall becomes the clear choice when negative weights exist in dense graphs.
Memory constraints favor Floyd-Warshall when path reconstruction is not required. The algorithm needs only O(V²) space for the distance matrix, whereas running V instances of Dijkstra or Bellman-Ford might require tracking separate data structures for each source. Path reconstruction adds O(V²) space for the next-hop matrix, but this remains constant regardless of edge count.
Dynamic graphs with frequent edge updates create challenges for Floyd-Warshall. Adding or removing a single edge requires recomputing the entire distance matrix in O(V³) time. Incremental shortest path algorithms handle updates more efficiently, though they increase complexity. For mostly static graphs with occasional batch updates, precomputing with Floyd-Warshall and caching results often proves more practical than maintaining incremental data structures.
class CachedFloydWarshall
def initialize(graph)
@graph = graph
@cache_valid = false
@distances = nil
end
def shortest_distance(i, j)
recompute unless @cache_valid
@distances[i][j]
end
def update_edge(from, to, new_weight)
@graph[from][to] = new_weight
@cache_valid = false
end
def batch_update
yield self
recompute
end
private
def recompute
@distances = floyd_warshall(@graph)
@cache_valid = true
end
end
Performance Considerations
The O(V³) time complexity stems from three nested loops, each iterating V times. This cubic growth limits Floyd-Warshall to graphs with hundreds or low thousands of vertices on modern hardware. A graph with 1,000 vertices requires one billion operations, taking seconds on typical processors. A graph with 10,000 vertices requires one trillion operations, pushing into minutes or hours.
Space complexity of O(V²) for the distance matrix becomes prohibitive as vertex count grows. A 10,000-vertex graph needs 100 million matrix entries. Using 8-byte floating-point numbers, this consumes 800MB of memory. Add path reconstruction with another 100 million integers at 8 bytes each, and memory usage reaches 1.6GB for a single graph.
Cache efficiency impacts actual performance significantly. The algorithm's memory access pattern benefits from good spatial locality when accessing the distance matrix. Processing by rows (the i-loop) maintains sequential memory access, improving cache hit rates. Transposing the loops to process by columns can degrade performance due to cache misses on large matrices.
require 'benchmark'
def benchmark_floyd_warshall(vertex_count)
graph = Array.new(vertex_count) { Array.new(vertex_count, Float::INFINITY) }
# Initialize with random edges
vertex_count.times do |i|
graph[i][i] = 0
(vertex_count / 2).times do
j = rand(vertex_count)
graph[i][j] = rand(1..10) if i != j
end
end
Benchmark.measure do
shortest_paths_space_optimized(graph)
end
end
# Timing results (example output)
# 100 vertices: ~0.05 seconds
# 200 vertices: ~0.4 seconds
# 500 vertices: ~6 seconds
# 1000 vertices: ~50 seconds
Loop unrolling provides minor optimizations by reducing loop overhead and enabling better instruction pipelining. However, the massive number of operations overwhelms these micro-optimizations. The fundamental O(V³) complexity dominates performance characteristics.
Early termination optimization applies when detecting negative cycles is the primary goal. After finding a negative diagonal entry, the algorithm can immediately abort since the graph contains a negative cycle. This saves computation when negative cycle detection fails early but provides no benefit when checking completes successfully.
def detect_negative_cycle_fast(graph)
dist = graph.map(&:dup)
v = graph.length
v.times do |k|
v.times do |i|
# Check diagonal after each k iteration
return true if dist[i][i] < 0
v.times do |j|
dist[i][j] = [dist[i][j], dist[i][k] + dist[k][j]].min
end
end
end
v.times { |i| return true if dist[i][i] < 0 }
false
end
Blocked matrix algorithms improve cache utilization by dividing the distance matrix into square blocks that fit in cache. Processing blocks sequentially reduces cache misses, improving performance on large matrices. Implementation complexity increases substantially, and benefits materialize only for matrices large enough that cache misses dominate computation time.
GPU acceleration achieves significant speedups for Floyd-Warshall since the algorithm has high arithmetic intensity and parallel structure. The inner two loops can execute simultaneously across thousands of GPU cores. However, the k-loop must execute sequentially, limiting parallelism to V² parallel operations per k iteration. GPU implementations using CUDA or OpenCL achieve 10-100x speedups on large graphs compared to CPU implementations.
Common Pitfalls
Incorrect infinity initialization causes the algorithm to fail. Setting non-existent edges to large finite values like 999999 instead of Float::INFINITY leads to incorrect shortest paths when the algorithm finds "paths" through non-existent edges. The min operation compares these large values, treating them as real paths.
# WRONG - using large number instead of infinity
graph = Array.new(5) { Array.new(5, 999999) }
5.times { |i| graph[i][i] = 0 }
# After Floyd-Warshall, spurious paths appear
# because 999999 + 999999 < 999999 evaluates to false
# but the algorithm treats 999999 as a real distance
# CORRECT - using Float::INFINITY
graph = Array.new(5) { Array.new(5, Float::INFINITY) }
5.times { |i| graph[i][i] = 0 }
Forgetting to initialize diagonal elements to zero creates nonsensical results where the shortest path from a vertex to itself appears to be non-zero. The algorithm then propagates these incorrect self-loop distances throughout the matrix, corrupting all path calculations.
Integer overflow occurs when edge weights or accumulated distances exceed integer maximum values. Using 32-bit integers with large weights causes overflow during addition, wrapping to negative values and producing incorrect results. Float types avoid overflow but introduce precision errors for very large or very small values.
# Integer overflow example
MAX_INT = 2**31 - 1
large_weight = MAX_INT / 2
# This addition overflows in 32-bit integers
# MAX_INT / 2 + MAX_INT / 2 wraps to negative
# Solution: use Float or check for overflow
def safe_add(a, b)
return Float::INFINITY if a == Float::INFINITY || b == Float::INFINITY
sum = a + b
sum
end
Modifying the graph during execution corrupts the algorithm. The distance matrix represents work in progress during iterations. Changing edge weights or the graph structure while the algorithm runs leads to undefined behavior since the algorithm assumes static input throughout execution.
Negative cycle misinterpretation happens when developers check for negative cycles after initialization but before running the algorithm. Negative weights on individual edges do not constitute negative cycles. Only after Floyd-Warshall completes can negative diagonal entries indicate negative cycles reachable from those vertices.
Path reconstruction errors occur when updating the distance matrix without correspondingly updating the predecessor matrix. Each distance update via intermediate vertex k must also update the next-hop to point to the same intermediate vertex's next-hop toward the destination.
# WRONG - updating distance without next-hop
if dist[i][k] + dist[k][j] < dist[i][j]
dist[i][j] = dist[i][k] + dist[k][j]
# Forgot to update next_hop[i][j]
end
# CORRECT - maintaining consistency
if dist[i][k] + dist[k][j] < dist[i][j]
dist[i][j] = dist[i][k] + dist[k][j]
next_hop[i][j] = next_hop[i][k]
end
Directed versus undirected graph confusion leads to errors when representing undirected graphs. An undirected edge between i and j requires two matrix entries: graph[i][j] and graph[j][i], both with the same weight. Forgetting to set both entries creates an asymmetric matrix representing a directed graph instead.
Memory aliasing bugs appear when shallow-copying the input graph. Modifying the distance matrix during the algorithm should not affect the original graph, but shallow copying causes both to reference the same inner arrays. Deep copying with map(&:dup) creates independent arrays.
# WRONG - shallow copy
dist = graph.map { |row| row }
# Modifying dist[i][j] also modifies graph[i][j]
# CORRECT - deep copy
dist = graph.map(&:dup)
# dist and graph are now independent
Reference
Algorithm Complexity
| Metric | Complexity | Notes |
|---|---|---|
| Time | O(V³) | Three nested loops over all vertices |
| Space | O(V²) | Distance matrix storage |
| Space with paths | O(V²) | Distance matrix plus next-hop matrix |
| Best case | O(V³) | No early termination possible |
| Worst case | O(V³) | Same as best case |
Key Operations
| Operation | Time | Description |
|---|---|---|
| Initialize matrix | O(V²) | Copy graph to distance matrix |
| Main algorithm | O(V³) | Triple nested loop |
| Negative cycle check | O(V) | Scan diagonal after completion |
| Path reconstruction | O(V) | Follow next-hop pointers |
| Single distance query | O(1) | Direct matrix lookup after computation |
Distance Matrix Initialization
| Matrix Entry | Initial Value | Meaning |
|---|---|---|
| D[i][i] | 0 | Distance from vertex to itself |
| D[i][j] edge exists | weight(i,j) | Direct edge weight |
| D[i][j] no edge | Float::INFINITY | No direct connection |
Algorithm Comparison
| Algorithm | Time per source | All pairs time | Handles negative weights | Space |
|---|---|---|---|---|
| Floyd-Warshall | N/A | O(V³) | Yes | O(V²) |
| Dijkstra all pairs | O(E + V log V) | O(VE + V² log V) | No | O(V) per source |
| Bellman-Ford all pairs | O(VE) | O(V²E) | Yes | O(V) per source |
| Johnson's algorithm | O(E log V) | O(VE + V² log V) | Yes | O(V²) |
Use Case Selection
| Scenario | Recommended Algorithm | Reason |
|---|---|---|
| Dense graph, all pairs needed | Floyd-Warshall | O(V³) optimal for dense graphs |
| Sparse graph, no negative weights | Dijkstra from each vertex | O(VE log V) better for sparse |
| Sparse graph, negative weights | Bellman-Ford from each vertex | Handles negatives, O(V²E) acceptable |
| Negative weights, very sparse | Johnson's algorithm | Best for sparse with negatives |
| Single source needed | Dijkstra or Bellman-Ford | No need for all pairs |
| Only reachability needed | Transitive closure variant | Boolean operations faster |
Common Graph Representations
| Representation | Initialization | Space | Best for |
|---|---|---|---|
| Adjacency matrix | Direct use | O(V²) | Dense graphs |
| Adjacency list | Convert to matrix | O(E) input, O(V²) working | Sparse graphs |
| Edge list | Convert to matrix | O(E) input, O(V²) working | Very sparse graphs |
Negative Cycle Detection
| Check | When | Result Interpretation |
|---|---|---|
| After initialization | Before algorithm | Meaningless - negative edges not cycles |
| After each k iteration | During execution | Partial - may not detect all cycles |
| After completion | After all iterations | D[i][i] < 0 indicates negative cycle through i |
Path Reconstruction Steps
| Step | Operation | Data Structure |
|---|---|---|
| 1. Initialize | Set next[i][j] = j for all edges | Next-hop matrix |
| 2. Update | next[i][j] = next[i][k] when updating distance | Maintain consistency |
| 3. Reconstruct | Follow next[start][dest] until reaching dest | Build path list |
| 4. Check validity | Verify next[i][j] not null | Handle unreachable pairs |