CrackedRuby

Overview

Memoization is an optimization technique that trades memory for speed by caching the results of function calls. When a memoized function executes with a particular set of arguments, it stores the result. Subsequent calls with the same arguments return the cached result instead of recalculating, eliminating redundant computation.

The technique derives its name from the Latin word "memorandum" (to be remembered). Computer scientist Donald Michie coined the term in 1968 to describe a specific form of caching applied to function results.

Memoization applies to pure functions—functions that always return the same output for the same input without side effects. The optimization makes the most impact when functions perform expensive computations, make external API calls, or execute recursive algorithms with overlapping subproblems.

# Without memoization - calculates every time
def fibonacci(n)
  return n if n <= 1
  fibonacci(n - 1) + fibonacci(n - 2)
end

fibonacci(35)  # Takes seconds to complete

# With memoization - caches results
def fibonacci_memo(n, cache = {})
  return n if n <= 1
  cache[n] ||= fibonacci_memo(n - 1, cache) + fibonacci_memo(n - 2, cache)
end

fibonacci_memo(35)  # Returns instantly

The performance difference between these implementations becomes dramatic as input size grows. The unmemoized version makes millions of redundant calculations, while the memoized version computes each unique value once.

Key Principles

Memoization operates on three fundamental principles: caching, lookup, and return. When a function executes, it first checks whether a cached result exists for the current arguments. If a cache hit occurs, the function returns the stored value immediately. On a cache miss, the function performs the computation, stores the result in the cache, and returns the value.

The caching mechanism requires a data structure that maps inputs to outputs. Hash tables provide optimal lookup performance with O(1) average-case complexity, making them the standard choice for memoization caches. The input arguments serve as keys, and the computed results serve as values.
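The three steps can be sketched as a small generic wrapper; `memoize_proc` is an illustrative helper name, not part of Ruby's standard library:

```ruby
# Generic memoizer following the lookup / compute / store sequence.
# memoize_proc is an illustrative name, not a standard Ruby method.
def memoize_proc(fn)
  cache = {}
  lambda do |*args|
    if cache.key?(args)             # lookup: a hit returns the stored value
      cache[args]
    else
      cache[args] = fn.call(*args)  # miss: compute, store, then return
    end
  end
end

calls = 0
slow_square = lambda { |n| calls += 1; n * n }
fast_square = memoize_proc(slow_square)

fast_square.call(12)  # computes: calls == 1
fast_square.call(12)  # cache hit: calls still 1
```

The argument array serves directly as the hash key, which is why the next section focuses on key generation.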

Cache Key Generation

Creating effective cache keys requires converting function arguments into a unique, hashable identifier. Simple cases use single arguments directly as keys. Complex cases combine multiple arguments into composite keys.

# Single argument - direct key
cache[n]

# Multiple arguments - composite key
cache[[x, y, z]]

# Named arguments - convert to sorted array or hash
cache[options.sort.to_h]

Function Purity Requirement

Memoization maintains correctness only for pure functions. A pure function satisfies two conditions: it produces identical output for identical input (referential transparency), and it causes no observable side effects. Functions that modify global state, perform I/O operations, or depend on external mutable state violate purity and produce incorrect results when memoized.

# Pure function - safe to memoize
def calculate_tax(amount, rate)
  amount * rate
end

# Impure function - unsafe to memoize
def get_current_price(product_id)
  api_client.fetch_price(product_id)  # External state changes
end

Space-Time Tradeoff

Memoization exemplifies the classic space-time tradeoff in computer science. The technique reduces execution time by consuming additional memory to store cached results. The tradeoff becomes favorable when computation cost exceeds storage cost and when the same inputs recur frequently.

The cache size grows proportionally to the number of unique input combinations encountered. Functions with large input domains or infrequent argument repetition may consume excessive memory without delivering performance benefits.

Ruby Implementation

Ruby provides multiple approaches to implementing memoization, from simple instance variable caching to sophisticated library-based solutions. The language's syntax and conventions make certain patterns particularly idiomatic.

The ||= Operator Pattern

The most common Ruby memoization pattern uses the conditional assignment operator (||=). This operator assigns a value to a variable only if the variable is nil or false.

class Calculator
  def expensive_operation
    @result ||= begin
      # Complex calculation
      sleep(2)
      42
    end
  end
end

calc = Calculator.new
calc.expensive_operation  # Takes 2 seconds
calc.expensive_operation  # Returns immediately

The pattern works by storing the result in an instance variable. On first execution, the instance variable is nil, so the calculation runs and assigns the result. Subsequent calls find the instance variable already set and return its value without re-executing the calculation block.

This pattern has a critical limitation: it cannot distinguish between a cached nil/false value and an uncached value. If the expensive operation legitimately returns nil or false, the memoization breaks.

# Broken memoization for nil/false results
def find_user(id)
  @user ||= database.find(id)  # Breaks if user doesn't exist (returns nil)
end

# First call: user not found, returns nil, doesn't cache
# Second call: @user still nil, queries database again
# Separate flaw: the single @user variable ignores the id argument, so once
# any user is cached, calls with a different id return that same user.

Hash-Based Memoization

Hash-based caching solves the nil/false problem by using explicit key presence checks. The pattern uses a hash to store results and checks whether the hash contains a key before computing.

class Calculator
  def initialize
    @cache = {}
  end

  def factorial(n)
    return @cache[n] if @cache.key?(n)
    
    @cache[n] = if n <= 1
      1
    else
      n * factorial(n - 1)
    end
  end
end

This approach correctly handles any return value, including nil and false. The key presence check distinguishes between "no cached value" and "cached nil value."

Multi-Argument Memoization

Methods accepting multiple arguments require composite cache keys. Ruby's array comparison makes array-based keys straightforward.

class GeometryCalculator
  def initialize
    @cache = {}
  end

  def distance(x1, y1, x2, y2)
    key = [x1, y1, x2, y2]
    return @cache[key] if @cache.key?(key)
    
    @cache[key] = Math.sqrt((x2 - x1)**2 + (y2 - y1)**2)
  end
end

Class-Level Memoization

Class methods and singleton methods require class instance variables to maintain separate caches.

class Configuration
  def self.settings
    @settings ||= begin
      # Load from file or API
      YAML.load_file('config.yml')
    end
  end
end

Class instance variables (@settings in this case) belong to the class object itself, not instances of the class. Each class maintains its own set of class instance variables.
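A quick demonstration of this scoping, with illustrative class names:

```ruby
# Each class object carries its own @cached_value; subclasses do not share it.
class Base
  def self.cached_value
    @cached_value ||= name  # Module#name returns the class's own name
  end
end

class Child < Base; end

Base.cached_value   # caches "Base" on the Base class object
Child.cached_value  # caches "Child" on Child, independently
```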

Thread-Safe Memoization

The basic ||= pattern is not thread-safe. Multiple threads executing simultaneously can trigger multiple calculations before any thread completes the caching.

require 'concurrent'

class ThreadSafeCalculator
  def initialize
    @cache = Concurrent::Map.new
  end

  def expensive_calculation(n)
    @cache.compute_if_absent(n) do
      sleep(1)  # Simulate expensive work
      n * 2
    end
  end
end

The concurrent-ruby gem provides thread-safe data structures. The Concurrent::Map#compute_if_absent method atomically checks for a key and computes the value only if absent, preventing race conditions.

Memoist Gem

The memoist gem provides a declarative syntax for memoization through method decoration.

require 'memoist'

class Calculator
  extend Memoist

  def fibonacci(n)
    return n if n <= 1
    fibonacci(n - 1) + fibonacci(n - 2)
  end
  memoize :fibonacci
end

The memoize declaration wraps the method with caching logic, separating the business logic from the optimization concern. The gem handles edge cases, thread safety options, and cache invalidation.

Practical Examples

Recursive Algorithm Optimization

The Fibonacci sequence calculation demonstrates memoization's impact on recursive algorithms with overlapping subproblems. The naive recursive implementation exhibits exponential time complexity due to redundant calculations.

class FibonacciCalculator
  def initialize
    @cache = {}
  end

  def calculate(n)
    return @cache[n] if @cache.key?(n)
    
    @cache[n] = if n <= 1
      n
    else
      calculate(n - 1) + calculate(n - 2)
    end
  end

  def cache_size
    @cache.size
  end
end

fib = FibonacciCalculator.new
fib.calculate(40)     # Returns 102334155
fib.cache_size        # Returns 41 (cached 0 through 40)

The memoized version reduces time complexity from O(2^n) to O(n) by computing each number once. For n=40, the unmemoized version makes over 300 million calls, while the memoized version makes only 41.

Database Query Caching

Applications frequently make repeated database queries for the same data within a request lifecycle. Memoization eliminates redundant queries.

class User
  def self.find_with_cache(id)
    @user_cache ||= {}
    return @user_cache[id] if @user_cache.key?(id)
    
    @user_cache[id] = find(id)
  end

  def roles
    @roles ||= database.query("SELECT * FROM roles WHERE user_id = ?", id)
  end

  def permissions
    @permissions ||= roles.flat_map(&:permissions).uniq
  end
end

The pattern prevents the N+1 query problem when accessing the same user multiple times. The cache persists for the class lifetime, but production systems typically need request-scoped caching to prevent stale data.
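One way to scope a cache to a request is a per-thread store that framework middleware resets between requests; `RequestCache` is an illustrative name here, not a framework API:

```ruby
# Per-thread cache that a web framework's middleware could reset per request.
class RequestCache
  def self.store
    Thread.current[:request_cache] ||= {}
  end

  def self.fetch(key)
    return store[key] if store.key?(key)
    store[key] = yield
  end

  def self.reset!
    Thread.current[:request_cache] = nil
  end
end

queries = 0
RequestCache.fetch(:current_user) { queries += 1; :alice }
RequestCache.fetch(:current_user) { queries += 1; :alice }  # cached, no query
RequestCache.reset!  # the next request starts with a cold cache
```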

API Response Caching

External API calls introduce latency and may have rate limits. Memoization reduces API calls for duplicate requests.

class WeatherService
  def initialize
    @cache = {}
    @cache_duration = 300  # 5 minutes
  end

  def current_temperature(city)
    cached = @cache[city]
    if cached && Time.now - cached[:timestamp] < @cache_duration
      return cached[:value]
    end

    temp = fetch_from_api(city)
    @cache[city] = { value: temp, timestamp: Time.now }
    temp
  end

  private

  def fetch_from_api(city)
    # HTTP request to weather API
    response = HTTP.get("https://api.weather.com/#{city}")
    JSON.parse(response.body)['temperature']
  end
end

This example adds time-based cache expiration to prevent serving stale data. The cache stores both the value and timestamp, invalidating entries older than the configured duration.

Complex Computation Results

Mathematical or analytical computations that take significant time benefit from memoization when called with identical parameters.

class StatisticsAnalyzer
  def initialize(dataset)
    @dataset = dataset
    @cache = {}
  end

  def percentile(p)
    key = [:percentile, p]
    return @cache[key] if @cache.key?(key)
    
    sorted = @dataset.sort
    index = (p / 100.0 * sorted.length).ceil - 1
    @cache[key] = sorted[index]
  end

  def moving_average(window_size)
    key = [:moving_average, window_size]
    return @cache[key] if @cache.key?(key)
    
    @cache[key] = @dataset.each_cons(window_size).map do |window|
      window.sum / window.size.to_f
    end
  end
end

analyzer = StatisticsAnalyzer.new((1..10_000).to_a)
analyzer.percentile(95)          # Computes once
analyzer.percentile(95)          # Returns cached
analyzer.moving_average(10)      # Computes once

Performance Considerations

Memoization delivers performance improvements only under specific conditions. The optimization's effectiveness depends on computation cost, cache hit rate, and memory constraints.

Time Complexity Analysis

Memoization converts repeated function calls from O(f(n)) to O(1), where f(n) represents the original function's time complexity. The total cost includes:

  • First call: O(f(n)) + O(1) cache write
  • Subsequent calls: O(1) cache lookup

The speedup factor equals the number of repeated calls for identical inputs. Functions called once per unique input gain no benefit and pay the overhead cost of cache management.
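The gap can be measured directly with the standard library's Benchmark module; exact timings vary by machine:

```ruby
require 'benchmark'

# Naive recursion: exponential number of calls
def fib_naive(n)
  return n if n <= 1
  fib_naive(n - 1) + fib_naive(n - 2)
end

# Memoized recursion: each value computed once
def fib_memo(n, cache = {})
  return n if n <= 1
  cache[n] ||= fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
end

naive_time = Benchmark.realtime { fib_naive(28) }
memo_time  = Benchmark.realtime { fib_memo(28) }
puts format('naive: %.4fs  memoized: %.6fs', naive_time, memo_time)
```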

Space Complexity Tradeoffs

The cache consumes memory proportional to the number of unique input combinations encountered. For a function with n possible inputs, the worst-case space complexity is O(n).

# Space grows with input domain
class PrimeChecker
  def initialize
    @cache = {}
  end

  def prime?(n)
    return @cache[n] if @cache.key?(n)
    
    @cache[n] = n > 1 && (2...n).none? { |i| n % i == 0 }
  end
end

checker = PrimeChecker.new
(1..1000000).each { |n| checker.prime?(n) }
# Cache now holds 1,000,000 entries

This cache growth can exhaust available memory for large input domains. Production systems need cache eviction policies or size limits.

Cache Hit Rate Impact

The performance benefit correlates directly with cache hit rate. High hit rates (>50%) justify memoization overhead. Low hit rates waste memory without improving performance.

# Measuring cache effectiveness
class MemoizedFunction
  attr_reader :hits, :misses

  def initialize
    @cache = {}
    @hits = 0
    @misses = 0
  end

  def compute(n)
    if @cache.key?(n)
      @hits += 1
      return @cache[n]
    end
    
    @misses += 1
    @cache[n] = expensive_operation(n)
  end

  def hit_rate
    total = @hits + @misses
    return 0.0 if total == 0
    @hits.to_f / total
  end
end

Monitoring hit rates in production reveals whether memoization provides value for specific methods.

Garbage Collection Pressure

Large caches increase memory pressure and trigger more frequent garbage collection cycles. The garbage collector must scan and mark cached objects, adding latency to the application.

Ruby's generational garbage collector promotes long-lived objects to the old generation, reducing scan frequency. Memoization caches typically survive multiple GC cycles and enter the old generation, but the initial promotion creates temporary pressure.

Cache Invalidation Strategies

Long-lived caches need invalidation mechanisms when underlying data changes. Stale data causes bugs when memoized results no longer match current state.

class UserProfile
  def initialize(user_id)
    @user_id = user_id
    @cache = {}
  end

  def display_name
    @cache[:display_name] ||= fetch_display_name
  end

  def clear_cache
    @cache.clear
  end

  def update_profile(attributes)
    database.update(@user_id, attributes)
    clear_cache  # Invalidate after mutation
  end
end

The clear_cache method removes all cached values, forcing fresh computation on next access. More sophisticated approaches use time-based expiration or dependency tracking.
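Cache-key versioning is one such sketch: bumping a version counter on mutation makes stale entries unreachable without clearing the hash. The class and method names below are illustrative:

```ruby
class Profile
  def initialize(name)
    @name = name
    @version = 0
    @cache = {}
  end

  def display_name
    # The version is part of the key, so old entries are never hit again
    @cache[[:display_name, @version]] ||= @name.capitalize
  end

  def rename!(name)
    @name = name
    @version += 1  # invalidates every existing key; pruning entries is separate
  end
end

profile = Profile.new('ada')
profile.display_name   # "Ada", cached under version 0
profile.rename!('bob')
profile.display_name   # "Bob", cached under version 1
```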

Common Patterns

Lazy Initialization Pattern

Memoization implements lazy initialization, deferring expensive object creation until first use.

class Application
  def database_connection
    @database_connection ||= DatabaseConnection.new(
      host: ENV['DB_HOST'],
      port: ENV['DB_PORT'],
      username: ENV['DB_USER']
    )
  end

  def logger
    @logger ||= Logger.new('application.log').tap do |log|
      log.level = Logger::INFO
      log.formatter = CustomFormatter.new
    end
  end
end

This pattern avoids creating objects that may never be used, reducing startup time and memory consumption.

Result Caching Pattern

Methods that perform the same calculation multiple times benefit from result caching at the method level.

class Report
  def initialize(data)
    @data = data
  end

  def total
    @total ||= @data.sum
  end

  def average
    @average ||= total / @data.size.to_f
  end

  def variance
    @variance ||= begin
      mean = average
      sum_squares = @data.sum { |x| (x - mean) ** 2 }
      sum_squares / @data.size.to_f
    end
  end
end

Each statistical measure computes once and caches the result. The variance calculation reuses the cached average, creating a dependency chain of memoized values.

Recursive Memoization Pattern

Recursive algorithms with overlapping subproblems achieve dramatic speedups through memoization.

require 'set'

class PathFinder
  def initialize(graph)
    @graph = graph
    @cache = {}
  end

  def shortest_path(from, to, visited = Set.new)
    key = [from, to, visited]
    return @cache[key] if @cache.key?(key)
    
    return 0 if from == to
    return Float::INFINITY if visited.include?(from)
    
    visited = visited.dup.add(from)
    
    @cache[key] = @graph.neighbors(from).map do |neighbor|
      1 + shortest_path(neighbor, to, visited)
    end.min || Float::INFINITY
  end
end

The cache key includes the visited set to distinguish between paths. Without memoization, the algorithm recalculates paths repeatedly.

Class Method Memoization Pattern

Singleton data loaded once per application benefits from class-level memoization.

class Configuration
  class << self
    def database_config
      @database_config ||= YAML.load_file('config/database.yml')
    end

    def api_credentials
      @api_credentials ||= {
        key: ENV['API_KEY'],
        secret: ENV['API_SECRET']
      }
    end
  end
end

Class instance variables store results that remain constant for the application lifetime. All instances share the same cached values.

Conditional Memoization Pattern

Some scenarios require conditional caching based on runtime context, such as environment or feature flags.

class DataProcessor
  def process(data)
    if Rails.env.production?
      @cached_result ||= expensive_computation(data)
    else
      expensive_computation(data)  # No caching in development
    end
  end
end

This pattern enables different caching behavior across environments, helping developers see changes immediately while production benefits from caching.

Common Pitfalls

Memoizing Mutable Objects

Caching mutable objects creates aliasing bugs when the cached object gets modified.

class UserList
  def active_users
    @active_users ||= User.where(active: true).to_a
  end
end

list = UserList.new
users = list.active_users
users << fake_user  # Modifies cached array

list.active_users   # Returns contaminated cache including fake_user

The cached array reference allows external code to modify the cache contents. Returning a duplicate or frozen object prevents modification.

def active_users
  @active_users ||= User.where(active: true).to_a.freeze
end

False and Nil Return Values

The ||= operator treats false and nil as absence of a cached value, causing cache misses on valid false/nil results.

class FeatureFlag
  def enabled?(feature)
    @cache ||= {}
    @cache[feature] ||= check_feature_status(feature)  # Bug!
  end

  private

  def check_feature_status(feature)
    # Expensive check that returns true/false
    expensive_api_call(feature)
  end
end

flag = FeatureFlag.new
flag.enabled?(:new_ui)   # Returns false, doesn't cache
flag.enabled?(:new_ui)   # Calls expensive_api_call again

Hash-based caching with explicit key checks solves this problem.

def enabled?(feature)
  @cache ||= {}
  return @cache[feature] if @cache.key?(feature)
  @cache[feature] = check_feature_status(feature)
end

Thread Safety Violations

Concurrent access to memoized methods without synchronization causes race conditions.

class Counter
  def increment
    @count ||= 0
    @count += 1
  end
end

counter = Counter.new
threads = 10.times.map { Thread.new { counter.increment } }
threads.each(&:join)
# Expected: 10, Actual: unpredictable due to race condition

Multiple threads read @count as nil simultaneously, each initializing it to 0 and losing increments. Thread-safe data structures or locks prevent data corruption.
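A standard-library Mutex makes the check-and-set atomic; this is a minimal sketch of the locking approach:

```ruby
class SafeCounter
  def initialize
    @lock = Mutex.new
    @count = nil
  end

  def increment
    @lock.synchronize do  # only one thread runs this block at a time
      @count ||= 0
      @count += 1
    end
  end

  attr_reader :count
end

counter = SafeCounter.new
threads = 10.times.map { Thread.new { counter.increment } }
threads.each(&:join)
counter.count  # reliably 10
```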

Memory Leaks from Unbounded Caches

Caches that grow without limits eventually consume all available memory.

class Analytics
  def self.track_event(user_id, event)
    @events ||= {}
    @events[user_id] ||= []
    @events[user_id] << event
  end
end

# After millions of events across thousands of users
# @events hash consumes gigabytes of memory

Production systems need cache size limits, time-based expiration, or LRU eviction policies.

class Analytics
  MAX_CACHE_SIZE = 10_000

  def self.track_event(user_id, event)
    @events ||= {}
    @events.shift if @events.size >= MAX_CACHE_SIZE
    @events[user_id] ||= []
    @events[user_id] << event
  end
end

Caching Impure Functions

Memoizing functions with side effects or external dependencies produces incorrect results.

class OrderProcessor
  def process_order(order_id)
    @cache ||= {}
    @cache[order_id] ||= begin
      charge_customer(order_id)      # Side effect!
      send_confirmation_email(order_id)  # Side effect!
      :success
    end
  end
end

processor = OrderProcessor.new
processor.process_order(123)  # Charges customer, sends email
processor.process_order(123)  # Returns cached :success, skips side effects

Memoization skips the function body on cache hits, including side effects. Only pure functions maintain correctness when memoized.

Incorrect Cache Keys for Complex Arguments

Using non-comparable objects or objects with changing state as cache keys causes cache misses for logically equivalent arguments.

class Search
  def find(query_params)
    @cache ||= {}
    @cache[query_params] ||= database.search(query_params)
  end
end

search = Search.new
params1 = { name: 'John', age: 30 }
params2 = { name: 'John', age: 30 }

search.find(params1)  # Computes and caches
search.find(params2)  # Cache miss! Different hash object

Different hash objects with identical contents have different object IDs, causing cache misses. Converting to a canonical form ensures cache hits.

def find(query_params)
  @cache ||= {}
  key = query_params.sort.to_h  # Canonical form
  @cache[key] ||= database.search(query_params)
end

Reference

Memoization Implementation Patterns

Pattern | Use Case | Thread Safe | Handles Nil/False
Instance variable with ||= | Simple single-value caching | No | No
Hash with key check | Multiple values, complex keys | No | Yes
Class instance variable | Shared class-level data | No | Depends on implementation
Concurrent::Map | Multi-threaded access | Yes | Yes
Memoist gem | Declarative method decoration | Configurable | Yes

Cache Key Strategies

Argument Type | Key Strategy | Example
Single primitive | Direct argument | cache[n]
Multiple primitives | Array composite | cache[[x, y]]
Hash arguments | Sorted or frozen hash | cache[args.sort.to_h]
Object arguments | Serialize or use ID | cache[obj.id]
Variable arguments | Argument array | cache[args]

Performance Characteristics

Operation | Complexity | Notes
Cache lookup | O(1) | Hash-based cache
Cache insert | O(1) | Hash-based cache
First call | O(f(n)) | Original function complexity
Cached call | O(1) | Direct hash lookup
Memory usage | O(u) | u = unique input combinations

Decision Framework

Condition | Use Memoization | Alternative
Function called repeatedly with same args | Yes | None needed
Expensive computation (>10ms) | Yes | Async processing
Pure function | Yes | Cannot memoize impure functions
Large input domain with low reuse | No | Precomputation or caching layer
Mutable return values | Only if frozen | Return duplicates
Real-time data requirements | No | Time-based expiration
Multi-threaded access | Yes, with thread-safe structures | Concurrent::Map

Common Methods to Memoize

Method Type | Example | Benefit
Database queries | User.find(id) | Eliminates duplicate queries
API calls | fetch_weather(city) | Reduces network latency
Recursive algorithms | fibonacci(n) | Exponential to linear time
Complex calculations | calculate_tax(amount) | Saves CPU cycles
File I/O | load_configuration | Avoids repeated disk access
Expensive object creation | build_connection | Reduces initialization overhead

Cache Invalidation Approaches

Strategy | Implementation | Use Case
Manual clearing | cache.clear | User-triggered updates
Time-based expiration | Check timestamp on read | Stale data tolerance
Observer pattern | Invalidate on model change | Event-driven systems
Cache key versioning | Include version in key | Gradual migration
Lazy invalidation | Validate on access | Infrequent changes
Cache-aside | Application manages cache | Full control needed

Thread Safety Options

Approach | Overhead | Consistency
Mutex synchronization | High | Strong
Concurrent::Map | Medium | Strong
Thread-local storage | Low | Per-thread
Immutable caching | Low | Strong with frozen values
Atomic operations | Low | Strong for simple cases

Memory Management

Technique | Benefit | Implementation
Cache size limit | Bounds memory usage | Remove oldest entry on overflow
TTL expiration | Automatic cleanup | Store timestamp with value
Weak references | GC can reclaim entries | WeakRef from the standard library
Periodic pruning | Reduces memory pressure | Background task clears old entries
LRU eviction | Removes least-used entries | Track access order or timestamps