CrackedRuby

Best, Average, and Worst Case

Overview

Best, average, and worst case analysis describes the performance characteristics of algorithms under different input conditions. Each case represents a distinct scenario: best case measures performance with the most favorable input, worst case examines the most unfavorable input, and average case evaluates expected performance across all possible inputs weighted by probability.

This analysis framework originated from the need to characterize algorithm behavior beyond single data points. An algorithm that performs well on sorted data may degrade on reverse-sorted data. Best, average, and worst case analysis captures this variability and provides bounds for performance expectations.

The three cases apply to both time complexity (operations required) and space complexity (memory consumed). Time complexity analysis dominates discussions because time often represents the primary performance bottleneck, though space analysis follows identical principles.
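Space-complexity cases can diverge just as time cases do. As a small sketch (an illustration added here, with `distinct_elements` as a hypothetical helper), the auxiliary memory of duplicate tracking depends on how many unique values the input contains, not merely on its length:

```ruby
# Auxiliary space depends on input content, not just input size.
def distinct_elements(array)
  seen = {}
  array.each { |element| seen[element] = true }
  seen.keys
end

all_same   = Array.new(1000, 7)  # best case: one hash key, O(1) auxiliary space
all_unique = (1..1000).to_a      # worst case: n hash keys, O(n) auxiliary space

distinct_elements(all_same).length    # => 1
distinct_elements(all_unique).length  # => 1000
```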

Consider a simple search through an unsorted array for a target value:

def linear_search(array, target)
  array.each_with_index do |element, index|
    return index if element == target
  end
  nil
end

array = [5, 2, 8, 1, 9]
linear_search(array, 5)  # => 0 (found immediately)
linear_search(array, 9)  # => 4 (found at end)
linear_search(array, 7)  # => nil (not found, searched entire array)

The first call represents best case (target at position 0), the last call represents worst case (target absent, requiring full traversal), and typical usage falls somewhere between these extremes.

Key Principles

Best Case Complexity represents the minimum resources an algorithm requires across all possible inputs. This scenario occurs when input characteristics align optimally with the algorithm's operation. Best case provides a lower bound but rarely reflects typical performance. Algorithms with excellent best case but poor average or worst case often mislead developers during initial evaluation.

Worst Case Complexity represents the maximum resources an algorithm requires. This scenario occurs when input characteristics create the most unfavorable conditions. Worst case analysis provides guarantees: performance never exceeds this bound. Systems requiring predictable response times prioritize worst case analysis because it establishes an upper limit for resource consumption.

Average Case Complexity represents expected performance across all possible inputs, weighted by their probability of occurrence. This analysis requires knowledge of input distribution, making it more complex than best or worst case. Average case most accurately predicts real-world performance when input characteristics match assumed distributions.

The relationship between cases varies by algorithm. Some algorithms exhibit identical complexity across all cases. Others show dramatic differences:

# Insertion sort on different inputs
def insertion_sort(array)
  (1...array.length).each do |i|
    key = array[i]
    j = i - 1
    while j >= 0 && array[j] > key
      array[j + 1] = array[j]
      j -= 1
    end
    array[j + 1] = key
  end
  array
end

# Best case: already sorted [1, 2, 3, 4, 5]
# Each element compares once, no shifts
# O(n) time

# Worst case: reverse sorted [5, 4, 3, 2, 1]
# Each element compares and shifts through all previous elements
# O(n²) time

# Average case: random order
# Each element compares/shifts through roughly half of previous elements
# O(n²) time

Amortized Analysis extends these concepts by considering sequences of operations rather than individual operations. An operation with poor worst case on individual calls may have excellent amortized complexity across multiple calls. Ruby's Array resizing demonstrates this: appending to a full array triggers reallocation (expensive), but the frequency of reallocation decreases as the array grows, resulting in O(1) amortized append time despite occasional O(n) worst case operations.
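The geometric-series argument behind this claim can be checked with a small counting model (a sketch of the doubling policy only; CRuby's actual capacity growth strategy differs in its constants):

```ruby
# Model n appends into an array that doubles its capacity when full,
# counting how many element copies the reallocations perform in total.
def copies_for_appends(n)
  capacity = 1
  size = 0
  copies = 0
  n.times do
    if size == capacity
      copies += size   # reallocation copies every existing element
      capacity *= 2
    end
    size += 1
  end
  copies
end

copies_for_appends(1000)  # => 1023 total copies (1 + 2 + 4 + ... + 512)
# Fewer than 2n copies for n appends, hence O(1) amortized per append
```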

Big O Notation typically expresses worst case complexity by convention. When developers state "quicksort is O(n log n)," they often mean average case, since quicksort's worst case reaches O(n²). This ambiguity requires explicit clarification in performance discussions.

Input characteristics determining case scenarios include:

  • Data ordering: Sorted, reverse sorted, random, partially sorted
  • Data distribution: Uniform, skewed, clustered, bimodal
  • Data size: Small, large, empty
  • Data content: Duplicates present, all unique, all identical
  • Data structure: Balanced trees, degenerate trees, cyclic graphs

The mathematical foundations distinguish between these cases through probability distributions. Best case has probability approaching 0 for large inputs (only specific arrangements satisfy optimal conditions). Worst case also has low probability but represents the guaranteed upper bound. Average case integrates over all possible inputs weighted by probability, requiring careful definition of the input space and probability model.
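For linear search over n elements with a uniformly distributed target that is present, the weighted average works out to (n + 1) / 2 comparisons. A quick empirical check of that model (an added sketch, assuming the uniform distribution):

```ruby
# Count comparisons linear search needs to find a given target.
def comparisons_to_find(array, target)
  array.each_with_index do |element, index|
    return index + 1 if element == target  # comparisons performed so far
  end
  array.length  # absent target: every element compared
end

n = 100
array = (1..n).to_a

# Average over every possible, equally likely, present target
average = array.sum { |target| comparisons_to_find(array, target) } / n.to_f

average        # => 50.5
(n + 1) / 2.0  # => 50.5, matching the analytical expectation
```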

Ruby Implementation

Ruby provides tools for empirical analysis of best, average, and worst case behavior. The Benchmark module measures execution time across different input scenarios:

require 'benchmark'

def analyze_cases(algorithm, n)
  # Best case: sorted array
  best_input = (1..n).to_a
  
  # Worst case: reverse sorted array
  worst_input = (1..n).to_a.reverse
  
  # Average case: randomized array
  average_input = (1..n).to_a.shuffle
  
  Benchmark.bm(10) do |x|
    x.report("best:") { algorithm.call(best_input.dup) }
    x.report("worst:") { algorithm.call(worst_input.dup) }
    x.report("average:") { algorithm.call(average_input.dup) }
  end
end

# Test insertion sort
analyze_cases(->(arr) { insertion_sort(arr) }, 1000)
#                 user     system      total        real
# best:      0.000234  0.000000   0.000234 (  0.000229)
# worst:     0.045621  0.000000   0.045621 (  0.045678)
# average:   0.023456  0.000000   0.023456 (  0.023489)

Ruby's standard library algorithms exhibit predictable case characteristics. CRuby's Array#sort uses a quicksort variant with insertion sort for small partitions (not Timsort, the merge-based hybrid used by Python and Java, which achieves O(n) best case on sorted data); its average case is O(n log n), and sorted input still tends to run faster in practice:

require 'benchmark'

n = 100_000

sorted = (1..n).to_a
random = (1..n).to_a.shuffle
reverse = (1..n).to_a.reverse

Benchmark.bm(15) do |x|
  x.report("sorted:") { sorted.dup.sort }
  x.report("random:") { random.dup.sort }
  x.report("reverse:") { reverse.dup.sort }
end
# sorted input typically runs faster in practice, though CRuby's quicksort
# has no explicit O(n) sorted-input detection

Hash lookups in Ruby demonstrate consistent O(1) average case across different load factors, though worst case can degrade to O(n) with hash collisions:

# Analyze hash lookup performance
def hash_lookup_analysis(size)
  hash = {}

  # Populate hash
  size.times { |i| hash[i] = i }

  # Best/Average case: good hash distribution
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  10_000.times { hash[rand(size)] }
  normal_time = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

  # Worst case: force every key into one bucket. Ordinary string keys do
  # not collide in practice (CRuby hashes strings with SipHash and a
  # per-process random seed, precisely to resist engineered collisions),
  # so a constant #hash on a custom key class simulates the degenerate case.
  collision_key = Struct.new(:id) do
    def hash
      0  # all keys share one bucket
    end

    def eql?(other)
      id == other.id
    end
  end

  collision_keys = Array.new(size) { |i| collision_key.new(i) }
  collision_hash = {}
  collision_keys.each { |k| collision_hash[k] = k.id }

  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  1_000.times { collision_hash[collision_keys.sample] }
  collision_time = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

  { normal: normal_time, collision: collision_time }
end

results = hash_lookup_analysis(1000)
# Normal case maintains O(1) behavior
# The single-bucket case degrades each lookup toward O(n)

Binary search implementation shows dramatic differences between cases:

def binary_search(array, target)
  left = 0
  right = array.length - 1
  
  while left <= right
    mid = (left + right) / 2
    
    return mid if array[mid] == target
    
    if array[mid] < target
      left = mid + 1
    else
      right = mid - 1
    end
  end
  
  nil
end

sorted_array = (1..1000).to_a

# Best case: target at the first midpoint probed
middle_element = sorted_array[499]  # mid = (0 + 999) / 2 = 499
binary_search(sorted_array, middle_element)  # Found in 1 comparison

# Worst case: target at extreme position or absent
binary_search(sorted_array, 1)     # Requires log₂(1000) ≈ 10 comparisons
binary_search(sorted_array, 1001)  # Requires log₂(1000) ≈ 10 comparisons

# Average case: random target position
# Expected log₂(n) - 1 comparisons

Ruby's Enumerable methods exhibit varying case characteristics. The find method demonstrates O(1) best case, O(n) worst case:

array = (1..10_000).to_a

# Best case: first element matches
Benchmark.measure { array.find { |x| x == 1 } }

# Worst case: last element matches or no match
Benchmark.measure { array.find { |x| x == 10_000 } }
Benchmark.measure { array.find { |x| x == 10_001 } }

Algorithms with consistent complexity across all cases include array access and hash set operations (average case):

# Array access: O(1) all cases
array = Array.new(1_000_000) { rand }
array[0]        # Best, average, worst all O(1)
array[500_000]  # Same performance
array[-1]       # Same performance

# Set membership: O(1) average case regardless of input
require 'set'
set = Set.new(1..1_000_000)
set.include?(1)        # O(1) average
set.include?(500_000)  # O(1) average
set.include?(1_000_000) # O(1) average

Practical Examples

Consider a function that finds duplicate elements in an array. Different implementations exhibit different case characteristics:

# Approach 1: Nested loops (brute force)
def find_duplicates_nested(array)
  duplicates = []
  
  (0...array.length).each do |i|
    ((i+1)...array.length).each do |j|
      if array[i] == array[j] && !duplicates.include?(array[i])
        duplicates << array[i]
      end
    end
  end
  
  duplicates
end

# Best case: No duplicates or all elements identical
# Still O(n²) comparisons required
no_dups = [1, 2, 3, 4, 5]
all_same = [1, 1, 1, 1, 1]

# Worst case: Many scattered duplicates
# O(n²) comparisons plus duplicate checks
worst = [1, 2, 1, 3, 2, 4, 3, 5, 4]

# Average case: O(n²) regardless of input

# Approach 2: Hash-based tracking
def find_duplicates_hash(array)
  seen = {}
  duplicates = {}

  array.each do |element|
    if seen[element]
      duplicates[element] = true  # hash insert keeps this branch O(1)
    else
      seen[element] = true
    end
  end

  duplicates.keys
end

# Best case: No duplicates
# O(n) time, O(n) space
no_dups = [1, 2, 3, 4, 5]

# Worst case: All duplicates
# O(n) time, O(n) space
all_dups = [1, 1, 1, 1, 1]

# Average case: O(n) time, O(n) space
# Consistent performance regardless of distribution

Quicksort demonstrates classic case variation based on pivot selection:

def quicksort(array)
  return array if array.length <= 1
  
  # Pivot selection determines case behavior
  pivot = array[0]  # First element as pivot
  
  less = array[1..-1].select { |x| x <= pivot }
  greater = array[1..-1].select { |x| x > pivot }
  
  quicksort(less) + [pivot] + quicksort(greater)
end

# Best case: Pivot divides array evenly each time
# Occurs with median pivot or random pivots on random data
# O(n log n) time
balanced = [5, 2, 8, 1, 9, 3, 7, 4, 6]

# Worst case: Pivot always minimum or maximum
# Occurs with first/last element pivot on sorted data
# O(n²) time
sorted = [1, 2, 3, 4, 5, 6, 7, 8, 9]
reverse = [9, 8, 7, 6, 5, 4, 3, 2, 1]

# Average case: Random pivot selection
# O(n log n) expected time
random_array = (1..1000).to_a.shuffle

String matching algorithms show input-dependent performance:

def naive_string_match(text, pattern)
  matches = []
  
  (0..text.length - pattern.length).each do |i|
    match = true
    pattern.length.times do |j|
      if text[i + j] != pattern[j]
        match = false
        break
      end
    end
    matches << i if match
  end
  
  matches
end

text = "a" * 10_000 + "b"
pattern_best = "b"      # Found quickly at end
pattern_worst = "a" * 99 + "b"  # Many partial matches before failure
pattern_avg = "abc"     # Typical pattern

# Best case: Pattern distinct from text
# Mismatches detected immediately: O(n)

# Worst case: Many partial matches
# Each position requires full pattern comparison: O(n*m)

# Average case: usually closer to O(n) on typical text, since most
# alignments fail within the first character or two; O(n*m) is the bound

Cache behavior affects practical case analysis. Sequential array access exhibits best case cache performance while random access shows worst case:

require 'benchmark'

array = Array.new(10_000_000) { rand }

# Best case: Sequential access (cache-friendly)
sequential_sum = 0
Benchmark.measure do
  array.each { |x| sequential_sum += x }
end
# => Fast due to cache prefetching

# Worst case: Random access (cache-unfriendly)
indices = (0...array.length).to_a.shuffle
random_sum = 0
Benchmark.measure do
  indices.each { |i| random_sum += array[i] }
end
# => Slower due to cache misses

Tree operations depend on tree balance. Searching a balanced binary search tree differs dramatically from searching a degenerate tree:

class Node
  attr_accessor :value, :left, :right
  
  def initialize(value)
    @value = value
    @left = nil
    @right = nil
  end
end

def search_tree(node, target)
  return nil if node.nil?
  return node if node.value == target
  
  if target < node.value
    search_tree(node.left, target)
  else
    search_tree(node.right, target)
  end
end

# Best case: Balanced tree
# O(log n) search time
balanced_root = Node.new(5)
balanced_root.left = Node.new(3)
balanced_root.right = Node.new(7)
balanced_root.left.left = Node.new(1)
balanced_root.left.right = Node.new(4)

# Worst case: Degenerate tree (linked list)
# O(n) search time
degenerate_root = Node.new(1)
current = degenerate_root
(2..1000).each do |i|
  current.right = Node.new(i)
  current = current.right
end

search_tree(balanced_root, 7)    # O(log n)
search_tree(degenerate_root, 1000)  # O(n)

Common Patterns

Early Exit Pattern: Algorithms that can terminate before processing all input exhibit best case performance when exit conditions trigger immediately:

def all_elements_positive?(array)
  array.each do |element|
    return false if element <= 0
  end
  true
end

# Best case: first element fails the check (0 is non-positive) - O(1)
[0, 1, 2, 3, 4]

# Worst case: all positive, or only the final element fails - O(n)
[1, 2, 3, 4, 5]
[1, 2, 3, 4, -1]

Preprocessing Pattern: Algorithms that invest in preprocessing to improve query performance exhibit poor initial case but excellent subsequent cases:

class IndexedSearcher
  def initialize(array)
    # Preprocessing: O(n) time and space
    @array = array
    @index = {}
    array.each_with_index do |element, i|
      @index[element] ||= []
      @index[element] << i
    end
  end
  
  def find_all(target)
    # Best, average, worst case: O(1) after preprocessing
    @index[target] || []
  end
end

searcher = IndexedSearcher.new([1, 2, 3, 2, 4, 2, 5])
searcher.find_all(2)  # => [1, 3, 5] in O(1) time

Adaptive Algorithm Pattern: Algorithms that adjust behavior based on input characteristics optimize for detected patterns:

def adaptive_sort(array)
  return array if array.length <= 1
  
  # Check if already sorted - O(n) scan
  sorted = array.each_cons(2).all? { |a, b| a <= b }
  return array if sorted  # Best case: O(n)
  
  # Check if reverse sorted - O(n) scan
  reverse_sorted = array.each_cons(2).all? { |a, b| a >= b }
  return array.reverse if reverse_sorted  # O(n)
  
  # Fall back to general sort - O(n log n)
  array.sort
end

adaptive_sort([1, 2, 3, 4, 5])      # O(n) best case
adaptive_sort([5, 4, 3, 2, 1])      # O(n) reverse case
adaptive_sort([3, 1, 4, 1, 5, 9])   # O(n log n) average case

Probabilistic Best Case Pattern: Randomized algorithms achieve expected best case through random choices:

def randomized_quicksort(array)
  return array if array.length <= 1
  
  # Random pivot selection avoids worst case on sorted input
  pivot_index = rand(array.length)
  pivot = array[pivot_index]
  
  less = array.select { |x| x < pivot }
  equal = array.select { |x| x == pivot }
  greater = array.select { |x| x > pivot }
  
  randomized_quicksort(less) + equal + randomized_quicksort(greater)
end

# Best case: Lucky pivot selections - O(n log n)
# Worst case: Unlucky pivots - O(n²)
# Average case: O(n log n) expected time
# Probability of worst case extremely low with random pivots

Batch Processing Pattern: Amortized analysis reveals that operations expensive individually become cheap when analyzed in batches:

class DynamicArray
  def initialize
    @array = Array.new(1)
    @size = 0
  end
  
  def append(element)
    # Worst case single operation: O(n) when resizing
    if @size >= @array.length
      new_array = Array.new(@array.length * 2)
      @array.each_with_index { |e, i| new_array[i] = e }
      @array = new_array
    end
    
    @array[@size] = element
    @size += 1
  end
  
  # Amortized analysis across n appends:
  # Total cost: n + (n/2 + n/4 + n/8 + ...) < 2n
  # Amortized cost per operation: O(1)
end

da = DynamicArray.new
1000.times { |i| da.append(i) }  # O(1) amortized per append

Input-Sensitive Pattern: Algorithms whose performance depends on specific input properties rather than just size:

def merge_sorted_arrays(arr1, arr2)
  result = []
  i = j = 0
  
  while i < arr1.length && j < arr2.length
    if arr1[i] <= arr2[j]
      result << arr1[i]
      i += 1
    else
      result << arr2[j]
      j += 1
    end
  end
  
  result + arr1[i..-1] + arr2[j..-1]
end

# Best case: One array exhausted early
# [1, 2, 3], [10, 11, 12] requires minimal comparisons

# Worst case: Elements interleaved
# [1, 3, 5, 7], [2, 4, 6, 8] requires all comparisons

# Performance depends on value distribution, not just length
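Instrumenting the merge with a comparison counter makes this input sensitivity concrete (an illustrative variant, with `merge_with_count` as a name introduced here):

```ruby
# Merge two sorted arrays while counting element comparisons.
def merge_with_count(arr1, arr2)
  result = []
  comparisons = 0
  i = j = 0

  while i < arr1.length && j < arr2.length
    comparisons += 1
    if arr1[i] <= arr2[j]
      result << arr1[i]
      i += 1
    else
      result << arr2[j]
      j += 1
    end
  end

  [result + arr1[i..-1] + arr2[j..-1], comparisons]
end

# Best case: one array exhausted immediately
merge_with_count([1, 2, 3], [10, 11, 12])    # => [[1, 2, 3, 10, 11, 12], 3]
# Worst case: fully interleaved values
merge_with_count([1, 3, 5, 7], [2, 4, 6, 8]) # => [[1, 2, 3, 4, 5, 6, 7, 8], 7]
```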

Performance Considerations

Understanding case complexity guides optimization decisions. Optimizing for best case rarely improves real-world performance because best case scenarios occur infrequently. Optimizing for worst case provides guarantees but may sacrifice average case performance. Most production systems target average case optimization while maintaining acceptable worst case bounds.

Profiling reveals actual case behavior in production. Ruby's benchmark-ips gem measures operations per second across different inputs:

require 'benchmark/ips'

def measure_sorting_cases
  n = 1000
  sorted = (1..n).to_a
  random = (1..n).to_a.shuffle
  reverse = (1..n).to_a.reverse
  
  Benchmark.ips do |x|
    x.report("sorted") { sorted.dup.sort }
    x.report("random") { random.dup.sort }
    x.report("reverse") { reverse.dup.sort }
    x.compare!
  end
end

# Results show relative performance across cases
# Identifies which cases dominate real workloads

Space complexity cases mirror time complexity analysis. Recursive algorithms consume stack space proportional to recursion depth:

def factorial_recursive(n)
  return 1 if n <= 1
  n * factorial_recursive(n - 1)
end

# Space complexity: O(n) for all cases
# Call stack depth equals n regardless of input characteristics

def factorial_iterative(n)
  result = 1
  (2..n).each { |i| result *= i }
  result
end

# Space complexity: O(1) for all cases
# No recursive calls, constant stack space

Cache-aware algorithm design considers memory access patterns. Matrix multiplication exhibits different cache behavior based on traversal order:

# Cache-unfriendly: Column-major access
def multiply_matrices_slow(a, b)
  result = Array.new(a.length) { Array.new(b[0].length, 0) }
  
  a.length.times do |i|
    b[0].length.times do |j|
      b.length.times do |k|
        result[i][j] += a[i][k] * b[k][j]
      end
    end
  end
  
  result
end

# Cache-friendly: Block-based access
def multiply_matrices_fast(a, b, block_size = 64)
  # Tiled multiplication for better cache utilization
  # Implementation omitted for brevity
  # Average case faster due to cache hits
end
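One possible body for the omitted tiled version (a sketch only; the best block_size depends on cache sizes, and in CRuby interpreter overhead often dwarfs cache effects):

```ruby
# Blocked (tiled) matrix multiplication: work proceeds tile by tile so the
# active portions of a, b, and result stay resident in cache while in use.
def multiply_matrices_fast(a, b, block_size = 64)
  rows = a.length
  cols = b[0].length
  inner = b.length
  result = Array.new(rows) { Array.new(cols, 0) }

  (0...rows).step(block_size) do |ii|
    (0...inner).step(block_size) do |kk|
      (0...cols).step(block_size) do |jj|
        # Multiply one tile; ranges are clipped at the matrix edges
        (ii...[ii + block_size, rows].min).each do |i|
          (kk...[kk + block_size, inner].min).each do |k|
            a_ik = a[i][k]  # hoist the loop-invariant load
            (jj...[jj + block_size, cols].min).each do |j|
              result[i][j] += a_ik * b[k][j]  # row-wise access to b and result
            end
          end
        end
      end
    end
  end

  result
end

multiply_matrices_fast([[1, 2], [3, 4]], [[5, 6], [7, 8]])
# => [[19, 22], [43, 50]]
```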

Parallel processing affects case analysis. Embarrassingly parallel algorithms maintain their complexity class but reduce wall-clock time:

require 'parallel'

def parallel_search(array, target)
  # Divide array into chunks for parallel processing; guard against a
  # zero chunk size when the array is smaller than the processor count
  chunk_size = [array.length / Parallel.processor_count, 1].max

  chunk_results = Parallel.map(array.each_slice(chunk_size).with_index.to_a) do |chunk, chunk_number|
    local_index = chunk.index(target)
    local_index && chunk_number * chunk_size + local_index  # convert to a global index
  end

  chunk_results.compact.first

  # Best case: Found in first chunk examined
  # Worst case: Not found, all chunks searched
  # Parallel execution reduces wall-clock time but not operation count
end

Database query optimization applies case analysis to index usage. Indexed lookups exhibit O(log n) time while full table scans require O(n) time:

# Simulated database query performance
class TableScanner
  def initialize(records)
    @records = records
    @indexed_fields = {}
  end
  
  def create_index(field)
    # Preprocessing: O(n log n)
    @indexed_fields[field] = @records.sort_by { |r| r[field] }
  end
  
  def query_with_index(field, value)
    # Best/Average/Worst: O(log n) with index
    indexed = @indexed_fields[field]
    return nil unless indexed
    
    # Binary search on indexed field
    left, right = 0, indexed.length - 1
    while left <= right
      mid = (left + right) / 2
      return indexed[mid] if indexed[mid][field] == value
      
      if indexed[mid][field] < value
        left = mid + 1
      else
        right = mid - 1
      end
    end
    nil
  end
  
  def query_without_index(field, value)
    # O(n) for all cases - full table scan
    @records.find { |r| r[field] == value }
  end
end

Network latency introduces practical case variation independent of algorithmic complexity. A distributed algorithm may exhibit O(1) computational complexity but variable latency based on network conditions, cache hits, or geographic distribution.

Reference

Complexity Classes by Case

Complexity  | Best Case Example                  | Worst Case Example                | Average Case Example
O(1)        | Array access at any index          | Array access at any index         | Hash table lookup
O(log n)    | Binary search finds middle element | Binary search target at extremes  | Binary search, random target
O(n)        | Linear search finds first element  | Linear search finds last element  | Linear search, random position
O(n log n)  | Merge sort on any input            | Merge sort on any input           | Quicksort with random pivots
O(n²)       | Selection sort on any input        | Bubble sort on reverse data       | Insertion sort on random data

Common Algorithm Case Characteristics

Algorithm         | Best Case  | Average Case | Worst Case
Binary Search     | O(1)       | O(log n)     | O(log n)
Linear Search     | O(1)       | O(n)         | O(n)
Insertion Sort    | O(n)       | O(n²)        | O(n²)
Merge Sort        | O(n log n) | O(n log n)   | O(n log n)
Quicksort         | O(n log n) | O(n log n)   | O(n²)
Heap Sort         | O(n log n) | O(n log n)   | O(n log n)
Bubble Sort       | O(n)       | O(n²)        | O(n²)
Selection Sort    | O(n²)      | O(n²)        | O(n²)
Hash Table Insert | O(1)       | O(1)         | O(n)
Hash Table Lookup | O(1)       | O(1)         | O(n)
BST Search        | O(log n)   | O(log n)     | O(n)
BST Insert        | O(log n)   | O(log n)     | O(n)

Ruby Data Structure Operations

Operation | Data Structure      | Best Case      | Average Case   | Worst Case
Access    | Array               | O(1)           | O(1)           | O(1)
Search    | Array               | O(1)           | O(n)           | O(n)
Insert    | Array (at end)      | O(1) amortized | O(1) amortized | O(n)
Insert    | Array (at position) | O(n)           | O(n)           | O(n)
Delete    | Array               | O(n)           | O(n)           | O(n)
Access    | Hash                | O(1)           | O(1)           | O(n)
Search    | Hash                | O(1)           | O(1)           | O(n)
Insert    | Hash                | O(1)           | O(1)           | O(n)
Delete    | Hash                | O(1)           | O(1)           | O(n)
Search    | Set                 | O(1)           | O(1)           | O(n)
Insert    | Set                 | O(1)           | O(1)           | O(n)

Input Characteristics Determining Cases

Characteristic         | Impact on Performance               | Examples
Data ordering          | Affects comparison-based algorithms | Sorted vs random vs reverse sorted
Element distribution   | Affects divide-and-conquer balance  | Uniform vs skewed distributions
Duplicate presence     | Affects hash table performance      | All unique vs many duplicates
Data size              | Affects cache behavior              | Small fits in cache vs large datasets
Search target location | Affects search termination          | Beginning vs end vs middle
Tree balance           | Affects tree operation depth        | Balanced vs degenerate trees
Hash distribution      | Affects collision frequency         | Good hash vs collision-prone

Case Analysis Decision Framework

Scenario                     | Prioritize             | Reasoning
Real-time systems            | Worst case             | Response time guarantees required
Batch processing             | Average case           | Total throughput matters most
Interactive applications     | Average case           | Typical user experience critical
Safety-critical systems      | Worst case             | Failure scenarios must be bounded
Competitive algorithms       | Average case           | Expected performance on typical inputs
Resource-constrained systems | Worst case             | Memory/time limits absolute
Caching systems              | Best case after warmup | Optimize common path after preprocessing

Benchmark Patterns

Pattern               | Ruby Implementation                  | Use Case
Multiple inputs       | Benchmark.bm with different datasets | Compare cases empirically
Statistical sampling  | Run algorithm on randomized inputs   | Estimate average case
Worst case generation | Construct pathological inputs        | Verify worst case bounds
Profiling             | ruby-prof or stackprof               | Identify bottlenecks per case
Memory tracking       | memory_profiler gem                  | Analyze space complexity cases