Overview
The Longest Common Subsequence (LCS) problem finds the longest sequence of elements that appear in the same relative order within two input sequences, though not necessarily consecutively. Unlike substring matching where elements must be adjacent, subsequence matching allows gaps between matched elements.
Given sequences "ABCDGH" and "AEDFHR", the LCS is "ADH" with length 3. The characters A, D, and H appear in both sequences in the same order, though other characters appear between them. This differs from the longest common substring problem, which would find "A" as the only common consecutive sequence.
The LCS algorithm applies to any comparable sequence elements: strings, arrays of numbers, DNA bases, file lines in diff utilities, or object sequences. The problem has optimal substructure and overlapping subproblems, making it a canonical dynamic programming application.
The algorithm solves practical problems in bioinformatics (DNA sequence alignment), version control systems (computing file differences), data synchronization, and plagiarism detection. Computing LCS between two sequences of length m and n typically requires O(mn) time and space, though optimizations can reduce space complexity to O(min(m,n)).
Key Principles
The LCS problem exhibits two fundamental properties that enable dynamic programming solutions: optimal substructure and overlapping subproblems. These properties allow breaking the problem into smaller subproblems whose solutions combine to solve the original problem.
Optimal substructure means the LCS of two sequences builds from LCS solutions of their prefixes. If sequences X and Y have lengths m and n, and X[m] equals Y[n], then LCS(X,Y) equals LCS(X[1..m-1], Y[1..n-1]) plus the matching character. If X[m] differs from Y[n], then LCS(X,Y) equals the maximum of LCS(X[1..m-1], Y) and LCS(X, Y[1..n-1]).
Consider sequences "AGGTAB" and "GXTXAYB". When comparing the final characters 'B' and 'B', they match, so the LCS includes this character plus the LCS of "AGGTA" and "GXTXAY". When characters don't match, the algorithm explores both possibilities: excluding the last character from the first sequence or excluding it from the second sequence, taking whichever yields a longer result.
Overlapping subproblems occur because the same prefix comparisons repeat multiple times during recursive computation. Computing LCS("ABC", "AC") requires solving LCS("AB", "AC"), LCS("ABC", "A"), LCS("AB", "A"), and LCS("A", "A"). The subproblem LCS("AB", "A") appears in multiple recursive paths. Without memoization, the naive recursive solution recomputes identical subproblems exponentially many times.
The recurrence relation formalizes these principles:
LCS(X[1..m], Y[1..n]) =
0 if m = 0 or n = 0
LCS(X[1..m-1], Y[1..n-1]) + 1 if X[m] = Y[n]
max(LCS(X[1..m-1], Y[1..n]), LCS(X[1..m], Y[1..n-1])) if X[m] ≠ Y[n]
Base cases handle empty sequences, which have LCS length zero. The recursive cases either include a matching character or skip a character from one sequence. The algorithm considers all valid character pairings through this recursive exploration.
Dynamic programming stores subproblem results in a table to avoid recomputation. A table L[i][j] stores the LCS length of X[1..i] and Y[1..j]. Building this table bottom-up from smaller to larger prefixes ensures each subproblem computes exactly once. The final answer appears in L[m][n].
The algorithm can compute just the LCS length or reconstruct the actual subsequence. Length computation requires only the table values. Reconstruction requires backtracking through the table, following the decisions that produced each value: when characters match, include that character; when they differ, follow the direction that gave the maximum value.
Implementation Approaches
Three primary approaches implement LCS computation: naive recursion, memoized recursion, and tabulation. Each offers different trade-offs between implementation complexity, performance, and space usage.
Naive recursion directly implements the recurrence relation without storing intermediate results. This approach has simple logic but exponential time complexity O(2^(m+n)) due to redundant subproblem computation. Each call may trigger two recursive calls, creating an exponential tree of computation. This approach remains impractical for sequences longer than 20-30 elements.
function lcs_recursive(X, Y, m, n):
if m == 0 or n == 0:
return 0
if X[m-1] == Y[n-1]:
return 1 + lcs_recursive(X, Y, m-1, n-1)
else:
return max(lcs_recursive(X, Y, m-1, n), lcs_recursive(X, Y, m, n-1))
Memoized recursion adds a cache to store computed subproblems, converting exponential time to O(mn) while maintaining recursive structure. A hash table or 2D array stores results indexed by (i,j) pairs representing prefix lengths. Before computing each subproblem, check if the result exists in the cache. This top-down approach computes only the subproblems needed for the final answer, potentially skipping some table entries.
function lcs_memoized(X, Y, m, n, memo):
if m == 0 or n == 0:
return 0
if memo[m][n] is not computed:
if X[m-1] == Y[n-1]:
memo[m][n] = 1 + lcs_memoized(X, Y, m-1, n-1, memo)
else:
memo[m][n] = max(lcs_memoized(X, Y, m-1, n, memo),
lcs_memoized(X, Y, m, n-1, memo))
return memo[m][n]
Tabulation builds the solution bottom-up, filling a table iteratively from base cases to the final answer. This approach eliminates recursion overhead and allows space optimizations. The table fills row by row or column by column, ensuring each entry depends only on previously computed values. Tabulation guarantees O(mn) time with predictable memory access patterns, often running faster than memoization despite identical asymptotic complexity.
Space-optimized tabulation reduces memory from O(mn) to O(min(m,n)) by observing that computing row i only requires values from row i-1. Instead of maintaining the entire table, keep only two rows: the current and previous. Some implementations use a single array and update values carefully to avoid overwriting needed data. This optimization maintains O(mn) time while reducing space to O(n) for the shorter sequence.
The choice between approaches depends on requirements. Memoization works better when many subproblems may remain unnecessary, such as when early termination conditions exist. Tabulation offers better performance for cases requiring most table entries, easier space optimization, and avoids stack overflow on large inputs. Space-optimized tabulation suits memory-constrained environments but complicates subsequence reconstruction, which requires the full table for backtracking.
Ruby Implementation
Ruby implementations of LCS leverage the language's expressive syntax for clear, maintainable code. The standard approach uses 2D arrays for tabulation with straightforward indexing.
def lcs_length(x, y)
m, n = x.length, y.length
dp = Array.new(m + 1) { Array.new(n + 1, 0) }
(1..m).each do |i|
(1..n).each do |j|
if x[i - 1] == y[j - 1]
dp[i][j] = dp[i - 1][j - 1] + 1
else
dp[i][j] = [dp[i - 1][j], dp[i][j - 1]].max
end
end
end
dp[m][n]
end
lcs_length("AGGTAB", "GXTXAYB")
# => 4
The implementation initializes a table with m+1 rows and n+1 columns, where row 0 and column 0 represent empty sequence comparisons with value 0. The nested iteration fills the table row by row, comparing characters using zero-based indexing (x[i-1] and y[j-1]) while table indices start from 1. This offset simplifies base case handling.
Reconstructing the actual subsequence requires backtracking through the table, following the decisions that built each value:
def lcs_string(x, y)
m, n = x.length, y.length
dp = Array.new(m + 1) { Array.new(n + 1, 0) }
(1..m).each do |i|
(1..n).each do |j|
if x[i - 1] == y[j - 1]
dp[i][j] = dp[i - 1][j - 1] + 1
else
dp[i][j] = [dp[i - 1][j], dp[i][j - 1]].max
end
end
end
# Backtrack to build the LCS string
result = []
i, j = m, n
while i > 0 && j > 0
if x[i - 1] == y[j - 1]
result.unshift(x[i - 1])
i -= 1
j -= 1
elsif dp[i - 1][j] > dp[i][j - 1]
i -= 1
else
j -= 1
end
end
result.join
end
lcs_string("AGGTAB", "GXTXAYB")
# => "GTAB"
Backtracking starts at dp[m][n] and moves toward dp[0][0]. When characters match, include that character in the result and move diagonally. When they differ, move in the direction of the larger value. The result builds in reverse order, requiring unshift to prepend characters or collecting in reverse for later reversal.
Ruby's functional programming capabilities enable memoized recursion with clean syntax:
class LCSMemoized
def initialize(x, y)
@x = x
@y = y
@memo = {}
end
def compute(i = @x.length, j = @y.length)
return 0 if i == 0 || j == 0
key = [i, j]
return @memo[key] if @memo.key?(key)
@memo[key] = if @x[i - 1] == @y[j - 1]
1 + compute(i - 1, j - 1)
else
[compute(i - 1, j), compute(i, j - 1)].max
end
end
end
solver = LCSMemoized.new("AGGTAB", "GXTXAYB")
solver.compute
# => 4
The class encapsulates the sequences and memo hash, making the recursive method cleaner. The memoization key uses an array [i, j], which Ruby hashes correctly. The ternary-style if expression assigns the computed value directly to the memo entry.
Space-optimized tabulation reduces memory usage for large sequences:
def lcs_length_optimized(x, y)
# Ensure x is the shorter sequence
x, y = y, x if x.length > y.length
m, n = x.length, y.length
prev_row = Array.new(m + 1, 0)
curr_row = Array.new(m + 1, 0)
(1..n).each do |j|
(1..m).each do |i|
if x[i - 1] == y[j - 1]
curr_row[i] = prev_row[i - 1] + 1
else
curr_row[i] = [prev_row[i], curr_row[i - 1]].max
end
end
prev_row, curr_row = curr_row, prev_row
end
prev_row[m]
end
lcs_length_optimized("AGGTAB", "GXTXAYB")
# => 4
This implementation maintains only two rows instead of the full m×n table. After processing each row, it swaps the current and previous row references. The swap uses Ruby's parallel assignment for clean syntax. This reduces space from O(mn) to O(m), where m represents the shorter sequence length.
For sequences stored as arrays rather than strings, the comparison and character handling adapts naturally:
def lcs_array(arr1, arr2)
m, n = arr1.length, arr2.length
dp = Array.new(m + 1) { Array.new(n + 1, 0) }
(1..m).each do |i|
(1..n).each do |j|
if arr1[i - 1] == arr2[j - 1]
dp[i][j] = dp[i - 1][j - 1] + 1
else
dp[i][j] = [dp[i - 1][j], dp[i][j - 1]].max
end
end
end
# Reconstruct array LCS
result = []
i, j = m, n
while i > 0 && j > 0
if arr1[i - 1] == arr2[j - 1]
result.unshift(arr1[i - 1])
i -= 1
j -= 1
elsif dp[i - 1][j] > dp[i][j - 1]
i -= 1
else
j -= 1
end
end
result
end
lcs_array([1, 2, 3, 4, 5], [2, 3, 5, 7])
# => [2, 3, 5]
The array version works identically to the string version, returning an array of matching elements. Ruby's equality operator handles element comparison, supporting integers, strings, symbols, or custom objects that implement ==.
Performance Considerations
LCS computation has inherent time complexity O(mn) for sequences of length m and n. No known algorithm solves the general LCS problem faster than quadratic time. Space complexity varies from O(mn) for basic tabulation to O(min(m,n)) for optimized approaches.
The naive recursive approach exhibits exponential time O(2^(m+n)) due to redundant computation. Each call may branch into two recursive calls, creating a binary tree of depth m+n. The number of subproblems grows exponentially, making this approach practical only for sequences under 20-30 elements. A sequence pair of length 40 each could trigger trillions of recursive calls.
Dynamic programming reduces time to O(mn) by computing each subproblem once. For sequences of length 1000 each, this means 1 million table entries versus potentially 2^2000 recursive calls. The space requirement of O(mn) can become problematic for very long sequences. Two sequences of 100,000 elements each require a table with 10 billion entries, potentially exceeding available memory.
Space optimization reduces memory to O(min(m,n)) by maintaining only two rows (or columns) of the DP table. This trades the ability to reconstruct the actual subsequence for reduced memory usage. For length computation only, this optimization enables processing much longer sequences. Computing LCS length for two 1 million element sequences requires only 1 million memory cells instead of 1 trillion.
def benchmark_lcs_approaches(x, y)
require 'benchmark'
Benchmark.bm(20) do |bm|
bm.report("Tabulation:") do
lcs_length(x, y)
end
bm.report("Memoization:") do
solver = LCSMemoized.new(x, y)
solver.compute
end
bm.report("Space-optimized:") do
lcs_length_optimized(x, y)
end
end
end
# Example output for moderately sized inputs:
# Tabulation: 0.180000 0.000000 0.180000
# Memoization: 0.240000 0.010000 0.250000
# Space-optimized: 0.190000 0.000000 0.190000
Memoization typically runs slower than tabulation despite identical asymptotic complexity. The overhead of hash table operations, recursive function calls, and cache lookups adds constant factors. Tabulation uses simple array indexing with predictable memory access patterns, benefiting from CPU cache locality. The performance gap widens as sequence length increases.
Character comparison operations dominate execution time. For sequences with many matching characters, the algorithm performs more diagonal moves in the table. For completely dissimilar sequences, it performs more row/column moves. The distribution of matches and mismatches affects constant factors but not asymptotic complexity.
Memory access patterns impact performance significantly. Tabulation filling row by row exhibits good cache locality, accessing adjacent memory locations. Column-by-column filling can cause more cache misses depending on array layout in memory. Ruby's array implementation stores elements contiguously, favoring row-major access.
For very long sequences with high similarity, algorithms like Hunt-Szymanski can outperform dynamic programming, achieving O((r+n) log n) time where r represents the number of matched pairs. This approach suits scenarios like version control diffs where files have many identical lines. The algorithm becomes slower than DP when sequences have low similarity, making DP the safer general-purpose choice.
Parallelization offers limited benefits for LCS computation. The dependencies between table cells (each cell depends on three others) create serialization constraints. Diagonal computation allows some parallelization, processing cells on the same diagonal concurrently since they don't depend on each other. This approach complicates implementation significantly while providing modest speedups.
Practical Examples
Version control systems compute file differences by finding the LCS of file lines. Two versions of a file represent sequences of text lines, and the LCS identifies unchanged lines. Lines not in the LCS represent additions or deletions.
def compute_diff(original_lines, modified_lines)
lcs = lcs_array(original_lines, modified_lines)
lcs_set = Set.new(lcs.map.with_index { |line, idx| [line, idx] })
diff = []
lcs_idx = 0
orig_idx = 0
mod_idx = 0
while orig_idx < original_lines.length || mod_idx < modified_lines.length
if orig_idx < original_lines.length &&
lcs_idx < lcs.length &&
original_lines[orig_idx] == lcs[lcs_idx]
# Line unchanged
diff << [:unchanged, original_lines[orig_idx]]
orig_idx += 1
mod_idx += 1
lcs_idx += 1
elsif orig_idx < original_lines.length &&
(mod_idx >= modified_lines.length ||
original_lines[orig_idx] != lcs[lcs_idx])
# Line deleted
diff << [:deleted, original_lines[orig_idx]]
orig_idx += 1
else
# Line added
diff << [:added, modified_lines[mod_idx]]
mod_idx += 1
end
end
diff
end
original = ["line 1", "line 2", "line 3", "line 4"]
modified = ["line 1", "line 2a", "line 3", "line 5"]
compute_diff(original, modified)
# => [[:unchanged, "line 1"], [:deleted, "line 2"], [:added, "line 2a"],
# [:unchanged, "line 3"], [:deleted, "line 4"], [:added, "line 5"]]
This implementation produces a basic diff output marking each line as unchanged, added, or deleted. Production diff tools add features like context lines, hunks, and more sophisticated change detection, but the core algorithm relies on LCS computation.
DNA sequence alignment uses LCS to find common subsequences between genetic sequences. Bioinformatics applications identify conserved regions across species or detect mutations. Each sequence represents a string of bases (A, C, G, T).
def dna_alignment(seq1, seq2)
m, n = seq1.length, seq2.length
dp = Array.new(m + 1) { Array.new(n + 1, 0) }
(1..m).each do |i|
(1..n).each do |j|
if seq1[i - 1] == seq2[j - 1]
dp[i][j] = dp[i - 1][j - 1] + 1
else
dp[i][j] = [dp[i - 1][j], dp[i][j - 1]].max
end
end
end
# Reconstruct alignment
alignment1 = []
alignment2 = []
i, j = m, n
while i > 0 || j > 0
if i > 0 && j > 0 && seq1[i - 1] == seq2[j - 1]
alignment1.unshift(seq1[i - 1])
alignment2.unshift(seq2[j - 1])
i -= 1
j -= 1
elsif j > 0 && (i == 0 || dp[i][j - 1] >= dp[i - 1][j])
alignment1.unshift('-')
alignment2.unshift(seq2[j - 1])
j -= 1
else
alignment1.unshift(seq1[i - 1])
alignment2.unshift('-')
i -= 1
end
end
{
sequence1: alignment1.join,
sequence2: alignment2.join,
matches: alignment1.zip(alignment2).count { |a, b| a == b && a != '-' },
similarity: alignment1.zip(alignment2).count { |a, b| a == b && a != '-' }.to_f / [m, n].max
}
end
result = dna_alignment("AGGTCGA", "AGTTCG")
# => {:sequence1=>"AGGTCGA", :sequence2=>"AG-TTCG-", :matches=>5, :similarity=>0.71...}
This alignment visualization shows where sequences match and where gaps appear. The similarity metric indicates how closely related the sequences are. More sophisticated alignment algorithms like Smith-Waterman or Needleman-Wunsch extend LCS with gap penalties and scoring matrices.
Data synchronization between devices or systems uses LCS to identify minimal changes needed to transform one dataset into another. The algorithm determines which records to add, delete, or leave unchanged.
class DataSynchronizer
def initialize(local_records, remote_records)
@local = local_records
@remote = remote_records
end
def compute_sync_operations
# Find LCS of record IDs
local_ids = @local.map { |r| r[:id] }
remote_ids = @remote.map { |r| r[:id] }
lcs = lcs_array(local_ids, remote_ids)
lcs_set = Set.new(lcs)
operations = []
# Records to delete (in local but not in LCS)
local_ids.each do |id|
unless lcs_set.include?(id)
operations << { action: :delete, id: id }
end
end
# Records to add (in remote but not in LCS)
remote_ids.each do |id|
unless lcs_set.include?(id)
record = @remote.find { |r| r[:id] == id }
operations << { action: :add, record: record }
end
end
operations
end
end
local = [
{ id: 1, name: "Alice" },
{ id: 2, name: "Bob" },
{ id: 3, name: "Charlie" }
]
remote = [
{ id: 1, name: "Alice" },
{ id: 3, name: "Charlie" },
{ id: 4, name: "Diana" }
]
sync = DataSynchronizer.new(local, remote)
sync.compute_sync_operations
# => [{:action=>:delete, :id=>2}, {:action=>:add, :record=>{:id=>4, :name=>"Diana"}}]
This synchronization strategy minimizes changes by identifying records that already match. Only records not in the LCS require modification. This approach reduces data transfer and processing overhead in distributed systems.
Plagiarism detection compares document sentences or paragraphs to find copied content. The LCS length indicates how much content appears in both documents in the same order.
def detect_similarity(doc1_sentences, doc2_sentences)
lcs_length = lcs_array(doc1_sentences, doc2_sentences).length
max_length = [doc1_sentences.length, doc2_sentences.length].max
{
common_sentences: lcs_length,
similarity_percentage: (lcs_length.to_f / max_length * 100).round(2),
potential_plagiarism: lcs_length.to_f / max_length > 0.3
}
end
doc1 = ["Introduction to algorithms.", "Dynamic programming solves optimization problems.",
"LCS is a classic example.", "Implementation requires careful analysis."]
doc2 = ["Dynamic programming solves optimization problems.", "LCS is a classic example.",
"Many applications exist.", "Testing is important."]
detect_similarity(doc1, doc2)
# => {:common_sentences=>2, :similarity_percentage=>50.0, :potential_plagiarism=>true}
This basic plagiarism detector flags documents with high sequential overlap. Production systems add features like fuzzy matching, phrase-level comparison, and citation detection, but LCS provides the foundation for identifying ordered commonality.
Common Patterns
The standard tabulation pattern builds the DP table bottom-up, handling all edge cases through careful iteration bounds. This pattern applies to most LCS variations.
def standard_lcs_pattern(x, y)
m, n = x.length, y.length
dp = Array.new(m + 1) { Array.new(n + 1, 0) }
(1..m).each do |i|
(1..n).each do |j|
dp[i][j] = if x[i - 1] == y[j - 1]
dp[i - 1][j - 1] + 1
else
[dp[i - 1][j], dp[i][j - 1]].max
end
end
end
dp[m][n]
end
The path reconstruction pattern backtracks through the DP table to build the actual subsequence. This pattern requires the complete table and cannot work with space-optimized implementations.
def reconstruction_pattern(x, y, dp)
result = []
i, j = x.length, y.length
while i > 0 && j > 0
if x[i - 1] == y[j - 1]
result.unshift(x[i - 1])
i -= 1
j -= 1
elsif dp[i - 1][j] >= dp[i][j - 1]
i -= 1
else
j -= 1
end
end
result
end
The backtracking loop terminates when reaching row 0 or column 0. Movement direction depends on which neighbor cell has the larger value, with diagonal movement when characters match. Building the result with unshift maintains correct order without requiring reversal.
The space optimization pattern maintains only the current and previous rows, dramatically reducing memory for length-only computation.
def space_optimized_pattern(x, y)
m, n = x.length, y.length
prev = Array.new(n + 1, 0)
curr = Array.new(n + 1, 0)
m.times do |i|
n.times do |j|
curr[j + 1] = if x[i] == y[j]
prev[j] + 1
else
[prev[j + 1], curr[j]].max
end
end
prev, curr = curr, prev
end
prev[n]
end
After each row computation, swapping prev and curr reuses arrays instead of allocating new ones. This pattern reduces garbage collection pressure and memory overhead. The indexing shifts slightly to avoid index-by-one errors.
The memoization pattern caches recursive calls using a hash or 2D array. This pattern suits scenarios where not all subproblems require computation or when implementing within an existing recursive structure.
def memoization_pattern(x, y, i = nil, j = nil, memo = {})
i ||= x.length - 1
j ||= y.length - 1
return 0 if i < 0 || j < 0
return memo[[i, j]] if memo.key?([i, j])
memo[[i, j]] = if x[i] == y[j]
1 + memoization_pattern(x, y, i - 1, j - 1, memo)
else
[
memoization_pattern(x, y, i - 1, j, memo),
memoization_pattern(x, y, i, j - 1, memo)
].max
end
end
This pattern initializes indices to sequence endpoints and decrements during recursion. The memo hash uses arrays as keys, relying on Ruby's array equality. Default parameters simplify the initial call while allowing the memo to persist across recursive calls.
The early termination pattern detects when no longer subsequence exists without completing full computation. This pattern applies when checking if LCS length exceeds a threshold.
def lcs_exceeds_threshold?(x, y, threshold)
m, n = x.length, y.length
dp = Array.new(m + 1) { Array.new(n + 1, 0) }
(1..m).each do |i|
(1..n).each do |j|
dp[i][j] = if x[i - 1] == y[j - 1]
dp[i - 1][j - 1] + 1
else
[dp[i - 1][j], dp[i][j - 1]].max
end
return true if dp[i][j] > threshold
end
end
false
end
Checking the threshold after each cell update allows returning immediately when the condition satisfies. This optimization saves computation when the threshold is low and sequences have early matches.
Reference
Algorithm Complexity
| Metric | Naive Recursion | Memoization | Tabulation | Space-Optimized |
|---|---|---|---|---|
| Time Complexity | O(2^(m+n)) | O(mn) | O(mn) | O(mn) |
| Space Complexity | O(m+n) | O(mn) | O(mn) | O(min(m,n)) |
| Reconstruction | Yes | Yes | Yes | No |
| Cache Locality | Poor | Medium | Good | Good |
Core Recurrence Relations
| Case | Condition | Formula |
|---|---|---|
| Base Case | i = 0 or j = 0 | LCS = 0 |
| Match | X[i] = Y[j] | LCS[i][j] = LCS[i-1][j-1] + 1 |
| Mismatch | X[i] ≠ Y[j] | LCS[i][j] = max(LCS[i-1][j], LCS[i][j-1]) |
Implementation Decision Matrix
| Requirement | Recommended Approach | Rationale |
|---|---|---|
| Length only | Space-optimized tabulation | Minimal memory, fast |
| Reconstruct sequence | Standard tabulation | Requires full table for backtracking |
| Memory constrained | Space-optimized or memoization | Reduced space usage |
| Large sequences | Space-optimized tabulation | Best memory/performance balance |
| Partial computation | Memoization | Computes only needed subproblems |
| Simple implementation | Standard tabulation | Straightforward, predictable |
Common Variations
| Variation | Description | Complexity Change |
|---|---|---|
| Multiple sequences | LCS of 3+ sequences | O(m × n × p × ...) |
| Weighted LCS | Characters have different match values | Same time, different scoring |
| Constrained LCS | Additional matching constraints | Problem-dependent |
| Approximate LCS | Allow some mismatches | Depends on approximation |
Ruby-Specific Considerations
| Aspect | Recommendation | Example |
|---|---|---|
| Array initialization | Use Array.new with block | Array.new(n) { Array.new(m, 0) } |
| Character comparison | Use bracket notation for strings | x[i] == y[j] |
| Hash memoization | Use arrays as keys | memo[[i, j]] = value |
| Swapping variables | Use parallel assignment | prev, curr = curr, prev |
| Range iteration | Use Range#each for clarity | (1..m).each do end |
Backtracking Movements
| Condition | Movement | Action |
|---|---|---|
| X[i] = Y[j] | Diagonal (i-1, j-1) | Include character |
| DP[i-1][j] > DP[i][j-1] | Up (i-1, j) | Skip X[i] |
| DP[i][j-1] > DP[i-1][j] | Left (i, j-1) | Skip Y[j] |
| DP[i-1][j] = DP[i][j-1] | Either direction | Either sequence works |
Edge Cases
| Case | Handling | Expected Result |
|---|---|---|
| Empty sequences | Return 0 or empty | LCS length 0 |
| Identical sequences | Return the sequence | LCS equals input |
| No common elements | Return 0 or empty | LCS length 0 |
| Single character | Compare directly | 0 or 1 |
| One empty sequence | Return 0 or empty | LCS length 0 |
Performance Optimization Checklist
| Optimization | Impact | Trade-off |
|---|---|---|
| Space-optimize to two rows | O(mn) to O(min(m,n)) space | Cannot reconstruct sequence |
| Use shorter sequence for columns | Reduce space by factor of min/max | Requires length comparison |
| Memoize with hash | Avoid full table allocation | Hash overhead, unpredictable memory |
| Early termination | Reduce computation time | Problem-specific |
| Cache-aware iteration | Better CPU cache usage | Row-major access pattern |