CrackedRuby CrackedRuby

Longest Common Subsequence

Overview

The Longest Common Subsequence (LCS) problem finds the longest sequence of elements that appear in the same relative order within two input sequences, though not necessarily consecutively. Unlike substring matching where elements must be adjacent, subsequence matching allows gaps between matched elements.

Given sequences "ABCDGH" and "AEDFHR", the LCS is "ADH" with length 3. The characters A, D, and H appear in both sequences in the same order, though other characters appear between them. This differs from the longest common substring problem, which would find "A" as the only common consecutive sequence.

The LCS algorithm applies to any comparable sequence elements: strings, arrays of numbers, DNA bases, file lines in diff utilities, or object sequences. The problem has optimal substructure and overlapping subproblems, making it a canonical dynamic programming application.

The algorithm solves practical problems in bioinformatics (DNA sequence alignment), version control systems (computing file differences), data synchronization, and plagiarism detection. Computing LCS between two sequences of length m and n typically requires O(mn) time and space, though optimizations can reduce space complexity to O(min(m,n)).

Key Principles

The LCS problem exhibits two fundamental properties that enable dynamic programming solutions: optimal substructure and overlapping subproblems. These properties allow breaking the problem into smaller subproblems whose solutions combine to solve the original problem.

Optimal substructure means the LCS of two sequences builds from LCS solutions of their prefixes. If sequences X and Y have lengths m and n, and X[m] equals Y[n], then LCS(X,Y) equals LCS(X[1..m-1], Y[1..n-1]) plus the matching character. If X[m] differs from Y[n], then LCS(X,Y) equals the maximum of LCS(X[1..m-1], Y) and LCS(X, Y[1..n-1]).

Consider sequences "AGGTAB" and "GXTXAYB". When comparing the final characters 'B' and 'B', they match, so the LCS includes this character plus the LCS of "AGGTA" and "GXTXAY". When characters don't match, the algorithm explores both possibilities: excluding the last character from the first sequence or excluding it from the second sequence, taking whichever yields a longer result.

Overlapping subproblems occur because the same prefix comparisons repeat multiple times during recursive computation. Computing LCS("ABC", "AC") requires solving LCS("AB", "AC"), LCS("ABC", "A"), LCS("AB", "A"), and LCS("A", "A"). The subproblem LCS("AB", "A") appears in multiple recursive paths. Without memoization, the naive recursive solution recomputes identical subproblems exponentially many times.

The recurrence relation formalizes these principles:

LCS(X[1..m], Y[1..n]) = 
  0                                           if m = 0 or n = 0
  LCS(X[1..m-1], Y[1..n-1]) + 1              if X[m] = Y[n]
  max(LCS(X[1..m-1], Y[1..n]), LCS(X[1..m], Y[1..n-1]))  if X[m] ≠ Y[n]

Base cases handle empty sequences, which have LCS length zero. The recursive cases either include a matching character or skip a character from one sequence. The algorithm considers all valid character pairings through this recursive exploration.

Dynamic programming stores subproblem results in a table to avoid recomputation. A table L[i][j] stores the LCS length of X[1..i] and Y[1..j]. Building this table bottom-up from smaller to larger prefixes ensures each subproblem computes exactly once. The final answer appears in L[m][n].

The algorithm can compute just the LCS length or reconstruct the actual subsequence. Length computation requires only the table values. Reconstruction requires backtracking through the table, following the decisions that produced each value: when characters match, include that character; when they differ, follow the direction that gave the maximum value.

Implementation Approaches

Three primary approaches implement LCS computation: naive recursion, memoized recursion, and tabulation. Each offers different trade-offs between implementation complexity, performance, and space usage.

Naive recursion directly implements the recurrence relation without storing intermediate results. This approach has simple logic but exponential time complexity O(2^(m+n)) due to redundant subproblem computation. Each call may trigger two recursive calls, creating an exponential tree of computation. This approach remains impractical for sequences longer than 20-30 elements.

function lcs_recursive(X, Y, m, n):
  if m == 0 or n == 0:
    return 0
  if X[m-1] == Y[n-1]:
    return 1 + lcs_recursive(X, Y, m-1, n-1)
  else:
    return max(lcs_recursive(X, Y, m-1, n), lcs_recursive(X, Y, m, n-1))

Memoized recursion adds a cache to store computed subproblems, converting exponential time to O(mn) while maintaining recursive structure. A hash table or 2D array stores results indexed by (i,j) pairs representing prefix lengths. Before computing each subproblem, check if the result exists in the cache. This top-down approach computes only the subproblems needed for the final answer, potentially skipping some table entries.

function lcs_memoized(X, Y, m, n, memo):
  if m == 0 or n == 0:
    return 0
  if memo[m][n] is not computed:
    if X[m-1] == Y[n-1]:
      memo[m][n] = 1 + lcs_memoized(X, Y, m-1, n-1, memo)
    else:
      memo[m][n] = max(lcs_memoized(X, Y, m-1, n, memo),
                       lcs_memoized(X, Y, m, n-1, memo))
  return memo[m][n]

Tabulation builds the solution bottom-up, filling a table iteratively from base cases to the final answer. This approach eliminates recursion overhead and allows space optimizations. The table fills row by row or column by column, ensuring each entry depends only on previously computed values. Tabulation guarantees O(mn) time with predictable memory access patterns, often running faster than memoization despite identical asymptotic complexity.

Space-optimized tabulation reduces memory from O(mn) to O(min(m,n)) by observing that computing row i only requires values from row i-1. Instead of maintaining the entire table, keep only two rows: the current and previous. Some implementations use a single array and update values carefully to avoid overwriting needed data. This optimization maintains O(mn) time while reducing space to O(n) for the shorter sequence.

The choice between approaches depends on requirements. Memoization works better when many subproblems may remain unnecessary, such as when early termination conditions exist. Tabulation offers better performance for cases requiring most table entries, easier space optimization, and avoids stack overflow on large inputs. Space-optimized tabulation suits memory-constrained environments but complicates subsequence reconstruction, which requires the full table for backtracking.

Ruby Implementation

Ruby implementations of LCS leverage the language's expressive syntax for clear, maintainable code. The standard approach uses 2D arrays for tabulation with straightforward indexing.

def lcs_length(x, y)
  m, n = x.length, y.length
  dp = Array.new(m + 1) { Array.new(n + 1, 0) }
  
  (1..m).each do |i|
    (1..n).each do |j|
      if x[i - 1] == y[j - 1]
        dp[i][j] = dp[i - 1][j - 1] + 1
      else
        dp[i][j] = [dp[i - 1][j], dp[i][j - 1]].max
      end
    end
  end
  
  dp[m][n]
end

lcs_length("AGGTAB", "GXTXAYB")
# => 4

The implementation initializes a table with m+1 rows and n+1 columns, where row 0 and column 0 represent empty sequence comparisons with value 0. The nested iteration fills the table row by row, comparing characters using zero-based indexing (x[i-1] and y[j-1]) while table indices start from 1. This offset simplifies base case handling.

Reconstructing the actual subsequence requires backtracking through the table, following the decisions that built each value:

def lcs_string(x, y)
  m, n = x.length, y.length
  dp = Array.new(m + 1) { Array.new(n + 1, 0) }
  
  (1..m).each do |i|
    (1..n).each do |j|
      if x[i - 1] == y[j - 1]
        dp[i][j] = dp[i - 1][j - 1] + 1
      else
        dp[i][j] = [dp[i - 1][j], dp[i][j - 1]].max
      end
    end
  end
  
  # Backtrack to build the LCS string
  result = []
  i, j = m, n
  
  while i > 0 && j > 0
    if x[i - 1] == y[j - 1]
      result.unshift(x[i - 1])
      i -= 1
      j -= 1
    elsif dp[i - 1][j] > dp[i][j - 1]
      i -= 1
    else
      j -= 1
    end
  end
  
  result.join
end

lcs_string("AGGTAB", "GXTXAYB")
# => "GTAB"

Backtracking starts at dp[m][n] and moves toward dp[0][0]. When characters match, include that character in the result and move diagonally. When they differ, move in the direction of the larger value. The result builds in reverse order, requiring unshift to prepend characters or collecting in reverse for later reversal.

Ruby's functional programming capabilities enable memoized recursion with clean syntax:

class LCSMemoized
  def initialize(x, y)
    @x = x
    @y = y
    @memo = {}
  end
  
  def compute(i = @x.length, j = @y.length)
    return 0 if i == 0 || j == 0
    
    key = [i, j]
    return @memo[key] if @memo.key?(key)
    
    @memo[key] = if @x[i - 1] == @y[j - 1]
      1 + compute(i - 1, j - 1)
    else
      [compute(i - 1, j), compute(i, j - 1)].max
    end
  end
end

solver = LCSMemoized.new("AGGTAB", "GXTXAYB")
solver.compute
# => 4

The class encapsulates the sequences and memo hash, making the recursive method cleaner. The memoization key uses an array [i, j], which Ruby hashes correctly. The ternary-style if expression assigns the computed value directly to the memo entry.

Space-optimized tabulation reduces memory usage for large sequences:

def lcs_length_optimized(x, y)
  # Ensure x is the shorter sequence
  x, y = y, x if x.length > y.length
  
  m, n = x.length, y.length
  prev_row = Array.new(m + 1, 0)
  curr_row = Array.new(m + 1, 0)
  
  (1..n).each do |j|
    (1..m).each do |i|
      if x[i - 1] == y[j - 1]
        curr_row[i] = prev_row[i - 1] + 1
      else
        curr_row[i] = [prev_row[i], curr_row[i - 1]].max
      end
    end
    prev_row, curr_row = curr_row, prev_row
  end
  
  prev_row[m]
end

lcs_length_optimized("AGGTAB", "GXTXAYB")
# => 4

This implementation maintains only two rows instead of the full m×n table. After processing each row, it swaps the current and previous row references. The swap uses Ruby's parallel assignment for clean syntax. This reduces space from O(mn) to O(m), where m represents the shorter sequence length.

For sequences stored as arrays rather than strings, the comparison and character handling adapts naturally:

def lcs_array(arr1, arr2)
  m, n = arr1.length, arr2.length
  dp = Array.new(m + 1) { Array.new(n + 1, 0) }
  
  (1..m).each do |i|
    (1..n).each do |j|
      if arr1[i - 1] == arr2[j - 1]
        dp[i][j] = dp[i - 1][j - 1] + 1
      else
        dp[i][j] = [dp[i - 1][j], dp[i][j - 1]].max
      end
    end
  end
  
  # Reconstruct array LCS
  result = []
  i, j = m, n
  
  while i > 0 && j > 0
    if arr1[i - 1] == arr2[j - 1]
      result.unshift(arr1[i - 1])
      i -= 1
      j -= 1
    elsif dp[i - 1][j] > dp[i][j - 1]
      i -= 1
    else
      j -= 1
    end
  end
  
  result
end

lcs_array([1, 2, 3, 4, 5], [2, 3, 5, 7])
# => [2, 3, 5]

The array version works identically to the string version, returning an array of matching elements. Ruby's equality operator handles element comparison, supporting integers, strings, symbols, or custom objects that implement ==.

Performance Considerations

LCS computation has inherent time complexity O(mn) for sequences of length m and n. No known algorithm solves the general LCS problem faster than quadratic time. Space complexity varies from O(mn) for basic tabulation to O(min(m,n)) for optimized approaches.

The naive recursive approach exhibits exponential time O(2^(m+n)) due to redundant computation. Each call may branch into two recursive calls, creating a binary tree of depth m+n. The number of subproblems grows exponentially, making this approach practical only for sequences under 20-30 elements. A sequence pair of length 40 each could trigger trillions of recursive calls.

Dynamic programming reduces time to O(mn) by computing each subproblem once. For sequences of length 1000 each, this means 1 million table entries versus potentially 2^2000 recursive calls. The space requirement of O(mn) can become problematic for very long sequences. Two sequences of 100,000 elements each require a table with 10 billion entries, potentially exceeding available memory.

Space optimization reduces memory to O(min(m,n)) by maintaining only two rows (or columns) of the DP table. This trades the ability to reconstruct the actual subsequence for reduced memory usage. For length computation only, this optimization enables processing much longer sequences. Computing LCS length for two 1 million element sequences requires only 1 million memory cells instead of 1 trillion.

def benchmark_lcs_approaches(x, y)
  require 'benchmark'
  
  Benchmark.bm(20) do |bm|
    bm.report("Tabulation:") do
      lcs_length(x, y)
    end
    
    bm.report("Memoization:") do
      solver = LCSMemoized.new(x, y)
      solver.compute
    end
    
    bm.report("Space-optimized:") do
      lcs_length_optimized(x, y)
    end
  end
end

# Example output for moderately sized inputs:
# Tabulation:          0.180000   0.000000   0.180000
# Memoization:         0.240000   0.010000   0.250000  
# Space-optimized:     0.190000   0.000000   0.190000

Memoization typically runs slower than tabulation despite identical asymptotic complexity. The overhead of hash table operations, recursive function calls, and cache lookups adds constant factors. Tabulation uses simple array indexing with predictable memory access patterns, benefiting from CPU cache locality. The performance gap widens as sequence length increases.

Character comparison operations dominate execution time. For sequences with many matching characters, the algorithm performs more diagonal moves in the table. For completely dissimilar sequences, it performs more row/column moves. The distribution of matches and mismatches affects constant factors but not asymptotic complexity.

Memory access patterns impact performance significantly. Tabulation filling row by row exhibits good cache locality, accessing adjacent memory locations. Column-by-column filling can cause more cache misses depending on array layout in memory. Ruby's array implementation stores elements contiguously, favoring row-major access.

For very long sequences with high similarity, algorithms like Hunt-Szymanski can outperform dynamic programming, achieving O((r+n) log n) time where r represents the number of matched pairs. This approach suits scenarios like version control diffs where files have many identical lines. The algorithm becomes slower than DP when sequences have low similarity, making DP the safer general-purpose choice.

Parallelization offers limited benefits for LCS computation. The dependencies between table cells (each cell depends on three others) create serialization constraints. Diagonal computation allows some parallelization, processing cells on the same diagonal concurrently since they don't depend on each other. This approach complicates implementation significantly while providing modest speedups.

Practical Examples

Version control systems compute file differences by finding the LCS of file lines. Two versions of a file represent sequences of text lines, and the LCS identifies unchanged lines. Lines not in the LCS represent additions or deletions.

def compute_diff(original_lines, modified_lines)
  lcs = lcs_array(original_lines, modified_lines)
  lcs_set = Set.new(lcs.map.with_index { |line, idx| [line, idx] })
  
  diff = []
  lcs_idx = 0
  orig_idx = 0
  mod_idx = 0
  
  while orig_idx < original_lines.length || mod_idx < modified_lines.length
    if orig_idx < original_lines.length && 
       lcs_idx < lcs.length && 
       original_lines[orig_idx] == lcs[lcs_idx]
      # Line unchanged
      diff << [:unchanged, original_lines[orig_idx]]
      orig_idx += 1
      mod_idx += 1
      lcs_idx += 1
    elsif orig_idx < original_lines.length &&
          (mod_idx >= modified_lines.length ||
           original_lines[orig_idx] != lcs[lcs_idx])
      # Line deleted
      diff << [:deleted, original_lines[orig_idx]]
      orig_idx += 1
    else
      # Line added
      diff << [:added, modified_lines[mod_idx]]
      mod_idx += 1
    end
  end
  
  diff
end

original = ["line 1", "line 2", "line 3", "line 4"]
modified = ["line 1", "line 2a", "line 3", "line 5"]

compute_diff(original, modified)
# => [[:unchanged, "line 1"], [:deleted, "line 2"], [:added, "line 2a"],
#     [:unchanged, "line 3"], [:deleted, "line 4"], [:added, "line 5"]]

This implementation produces a basic diff output marking each line as unchanged, added, or deleted. Production diff tools add features like context lines, hunks, and more sophisticated change detection, but the core algorithm relies on LCS computation.

DNA sequence alignment uses LCS to find common subsequences between genetic sequences. Bioinformatics applications identify conserved regions across species or detect mutations. Each sequence represents a string of bases (A, C, G, T).

def dna_alignment(seq1, seq2)
  m, n = seq1.length, seq2.length
  dp = Array.new(m + 1) { Array.new(n + 1, 0) }
  
  (1..m).each do |i|
    (1..n).each do |j|
      if seq1[i - 1] == seq2[j - 1]
        dp[i][j] = dp[i - 1][j - 1] + 1
      else
        dp[i][j] = [dp[i - 1][j], dp[i][j - 1]].max
      end
    end
  end
  
  # Reconstruct alignment
  alignment1 = []
  alignment2 = []
  i, j = m, n
  
  while i > 0 || j > 0
    if i > 0 && j > 0 && seq1[i - 1] == seq2[j - 1]
      alignment1.unshift(seq1[i - 1])
      alignment2.unshift(seq2[j - 1])
      i -= 1
      j -= 1
    elsif j > 0 && (i == 0 || dp[i][j - 1] >= dp[i - 1][j])
      alignment1.unshift('-')
      alignment2.unshift(seq2[j - 1])
      j -= 1
    else
      alignment1.unshift(seq1[i - 1])
      alignment2.unshift('-')
      i -= 1
    end
  end
  
  {
    sequence1: alignment1.join,
    sequence2: alignment2.join,
    matches: alignment1.zip(alignment2).count { |a, b| a == b && a != '-' },
    similarity: alignment1.zip(alignment2).count { |a, b| a == b && a != '-' }.to_f / [m, n].max
  }
end

result = dna_alignment("AGGTCGA", "AGTTCG")
# => {:sequence1=>"AGGTCGA", :sequence2=>"AG-TTCG-", :matches=>5, :similarity=>0.71...}

This alignment visualization shows where sequences match and where gaps appear. The similarity metric indicates how closely related the sequences are. More sophisticated alignment algorithms like Smith-Waterman or Needleman-Wunsch extend LCS with gap penalties and scoring matrices.

Data synchronization between devices or systems uses LCS to identify minimal changes needed to transform one dataset into another. The algorithm determines which records to add, delete, or leave unchanged.

class DataSynchronizer
  def initialize(local_records, remote_records)
    @local = local_records
    @remote = remote_records
  end
  
  def compute_sync_operations
    # Find LCS of record IDs
    local_ids = @local.map { |r| r[:id] }
    remote_ids = @remote.map { |r| r[:id] }
    
    lcs = lcs_array(local_ids, remote_ids)
    lcs_set = Set.new(lcs)
    
    operations = []
    
    # Records to delete (in local but not in LCS)
    local_ids.each do |id|
      unless lcs_set.include?(id)
        operations << { action: :delete, id: id }
      end
    end
    
    # Records to add (in remote but not in LCS)
    remote_ids.each do |id|
      unless lcs_set.include?(id)
        record = @remote.find { |r| r[:id] == id }
        operations << { action: :add, record: record }
      end
    end
    
    operations
  end
end

local = [
  { id: 1, name: "Alice" },
  { id: 2, name: "Bob" },
  { id: 3, name: "Charlie" }
]

remote = [
  { id: 1, name: "Alice" },
  { id: 3, name: "Charlie" },
  { id: 4, name: "Diana" }
]

sync = DataSynchronizer.new(local, remote)
sync.compute_sync_operations
# => [{:action=>:delete, :id=>2}, {:action=>:add, :record=>{:id=>4, :name=>"Diana"}}]

This synchronization strategy minimizes changes by identifying records that already match. Only records not in the LCS require modification. This approach reduces data transfer and processing overhead in distributed systems.

Plagiarism detection compares document sentences or paragraphs to find copied content. The LCS length indicates how much content appears in both documents in the same order.

def detect_similarity(doc1_sentences, doc2_sentences)
  lcs_length = lcs_array(doc1_sentences, doc2_sentences).length
  max_length = [doc1_sentences.length, doc2_sentences.length].max
  
  {
    common_sentences: lcs_length,
    similarity_percentage: (lcs_length.to_f / max_length * 100).round(2),
    potential_plagiarism: lcs_length.to_f / max_length > 0.3
  }
end

doc1 = ["Introduction to algorithms.", "Dynamic programming solves optimization problems.", 
        "LCS is a classic example.", "Implementation requires careful analysis."]
doc2 = ["Dynamic programming solves optimization problems.", "LCS is a classic example.",
        "Many applications exist.", "Testing is important."]

detect_similarity(doc1, doc2)
# => {:common_sentences=>2, :similarity_percentage=>50.0, :potential_plagiarism=>true}

This basic plagiarism detector flags documents with high sequential overlap. Production systems add features like fuzzy matching, phrase-level comparison, and citation detection, but LCS provides the foundation for identifying ordered commonality.

Common Patterns

The standard tabulation pattern builds the DP table bottom-up, handling all edge cases through careful iteration bounds. This pattern applies to most LCS variations.

def standard_lcs_pattern(x, y)
  m, n = x.length, y.length
  dp = Array.new(m + 1) { Array.new(n + 1, 0) }
  
  (1..m).each do |i|
    (1..n).each do |j|
      dp[i][j] = if x[i - 1] == y[j - 1]
        dp[i - 1][j - 1] + 1
      else
        [dp[i - 1][j], dp[i][j - 1]].max
      end
    end
  end
  
  dp[m][n]
end

The path reconstruction pattern backtracks through the DP table to build the actual subsequence. This pattern requires the complete table and cannot work with space-optimized implementations.

def reconstruction_pattern(x, y, dp)
  result = []
  i, j = x.length, y.length
  
  while i > 0 && j > 0
    if x[i - 1] == y[j - 1]
      result.unshift(x[i - 1])
      i -= 1
      j -= 1
    elsif dp[i - 1][j] >= dp[i][j - 1]
      i -= 1
    else
      j -= 1
    end
  end
  
  result
end

The backtracking loop terminates when reaching row 0 or column 0. Movement direction depends on which neighbor cell has the larger value, with diagonal movement when characters match. Building the result with unshift maintains correct order without requiring reversal.

The space optimization pattern maintains only the current and previous rows, dramatically reducing memory for length-only computation.

def space_optimized_pattern(x, y)
  m, n = x.length, y.length
  prev = Array.new(n + 1, 0)
  curr = Array.new(n + 1, 0)
  
  m.times do |i|
    n.times do |j|
      curr[j + 1] = if x[i] == y[j]
        prev[j] + 1
      else
        [prev[j + 1], curr[j]].max
      end
    end
    prev, curr = curr, prev
  end
  
  prev[n]
end

After each row computation, swapping prev and curr reuses arrays instead of allocating new ones. This pattern reduces garbage collection pressure and memory overhead. The indexing shifts slightly to avoid index-by-one errors.

The memoization pattern caches recursive calls using a hash or 2D array. This pattern suits scenarios where not all subproblems require computation or when implementing within an existing recursive structure.

def memoization_pattern(x, y, i = nil, j = nil, memo = {})
  i ||= x.length - 1
  j ||= y.length - 1
  
  return 0 if i < 0 || j < 0
  return memo[[i, j]] if memo.key?([i, j])
  
  memo[[i, j]] = if x[i] == y[j]
    1 + memoization_pattern(x, y, i - 1, j - 1, memo)
  else
    [
      memoization_pattern(x, y, i - 1, j, memo),
      memoization_pattern(x, y, i, j - 1, memo)
    ].max
  end
end

This pattern initializes indices to sequence endpoints and decrements during recursion. The memo hash uses arrays as keys, relying on Ruby's array equality. Default parameters simplify the initial call while allowing the memo to persist across recursive calls.

The early termination pattern detects when no longer subsequence exists without completing full computation. This pattern applies when checking if LCS length exceeds a threshold.

def lcs_exceeds_threshold?(x, y, threshold)
  m, n = x.length, y.length
  dp = Array.new(m + 1) { Array.new(n + 1, 0) }
  
  (1..m).each do |i|
    (1..n).each do |j|
      dp[i][j] = if x[i - 1] == y[j - 1]
        dp[i - 1][j - 1] + 1
      else
        [dp[i - 1][j], dp[i][j - 1]].max
      end
      
      return true if dp[i][j] > threshold
    end
  end
  
  false
end

Checking the threshold after each cell update allows returning immediately when the condition satisfies. This optimization saves computation when the threshold is low and sequences have early matches.

Reference

Algorithm Complexity

Metric Naive Recursion Memoization Tabulation Space-Optimized
Time Complexity O(2^(m+n)) O(mn) O(mn) O(mn)
Space Complexity O(m+n) O(mn) O(mn) O(min(m,n))
Reconstruction Yes Yes Yes No
Cache Locality Poor Medium Good Good

Core Recurrence Relations

Case Condition Formula
Base Case i = 0 or j = 0 LCS = 0
Match X[i] = Y[j] LCS[i][j] = LCS[i-1][j-1] + 1
Mismatch X[i] ≠ Y[j] LCS[i][j] = max(LCS[i-1][j], LCS[i][j-1])

Implementation Decision Matrix

Requirement Recommended Approach Rationale
Length only Space-optimized tabulation Minimal memory, fast
Reconstruct sequence Standard tabulation Requires full table for backtracking
Memory constrained Space-optimized or memoization Reduced space usage
Large sequences Space-optimized tabulation Best memory/performance balance
Partial computation Memoization Computes only needed subproblems
Simple implementation Standard tabulation Straightforward, predictable

Common Variations

Variation Description Complexity Change
Multiple sequences LCS of 3+ sequences O(m × n × p × ...)
Weighted LCS Characters have different match values Same time, different scoring
Constrained LCS Additional matching constraints Problem-dependent
Approximate LCS Allow some mismatches Depends on approximation

Ruby-Specific Considerations

Aspect Recommendation Example
Array initialization Use Array.new with block Array.new(n) { Array.new(m, 0) }
Character comparison Use bracket notation for strings x[i] == y[j]
Hash memoization Use arrays as keys memo[[i, j]] = value
Swapping variables Use parallel assignment prev, curr = curr, prev
Range iteration Use Range#each for clarity (1..m).each do end

Backtracking Movements

Condition Movement Action
X[i] = Y[j] Diagonal (i-1, j-1) Include character
DP[i-1][j] > DP[i][j-1] Up (i-1, j) Skip X[i]
DP[i][j-1] > DP[i-1][j] Left (i, j-1) Skip Y[j]
DP[i-1][j] = DP[i][j-1] Either direction Either sequence works

Edge Cases

Case Handling Expected Result
Empty sequences Return 0 or empty LCS length 0
Identical sequences Return the sequence LCS equals input
No common elements Return 0 or empty LCS length 0
Single character Compare directly 0 or 1
One empty sequence Return 0 or empty LCS length 0

Performance Optimization Checklist

Optimization Impact Trade-off
Space-optimize to two rows O(mn) to O(min(m,n)) space Cannot reconstruct sequence
Use shorter sequence for columns Reduce space by factor of min/max Requires length comparison
Memoize with hash Avoid full table allocation Hash overhead, unpredictable memory
Early termination Reduce computation time Problem-specific
Cache-aware iteration Better CPU cache usage Row-major access pattern