Overview
Memoization is an optimization technique that trades memory for speed by caching the results of function calls. When a memoized function executes with a particular set of arguments, it stores the result. Subsequent calls with the same arguments return the cached result instead of recalculating, eliminating redundant computation.
The technique derives its name from the Latin word "memorandum" (to be remembered). Computer scientist Donald Michie coined the term in 1968 to describe a specific form of caching applied to function results.
Memoization applies to pure functions—functions that always return the same output for the same input without side effects. The optimization makes the most impact when functions perform expensive computations, make external API calls, or execute recursive algorithms with overlapping subproblems.
# Without memoization - calculates every time
def fibonacci(n)
  return n if n <= 1
  fibonacci(n - 1) + fibonacci(n - 2)
end

fibonacci(35) # Takes seconds to complete

# With memoization - caches results
def fibonacci_memo(n, cache = {})
  return n if n <= 1
  cache[n] ||= fibonacci_memo(n - 1, cache) + fibonacci_memo(n - 2, cache)
end

fibonacci_memo(35) # Returns instantly
The performance difference between these implementations becomes dramatic as input size grows. The unmemoized version makes millions of redundant calculations, while the memoized version computes each unique value once.
Key Principles
Memoization operates on three fundamental principles: caching, lookup, and return. When a function executes, it first checks whether a cached result exists for the current arguments. If a cache hit occurs, the function returns the stored value immediately. On a cache miss, the function performs the computation, stores the result in the cache, and returns the value.
The caching mechanism requires a data structure that maps inputs to outputs. Hash tables provide optimal lookup performance with O(1) average-case complexity, making them the standard choice for memoization caches. The input arguments serve as keys, and the computed results serve as values.
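The check-compute-store cycle can be sketched as a generic wrapper around any pure function. This is a minimal illustration; the `memoize` helper and `square` example are invented here, not part of any library:

```ruby
# Generic memoization: look up by arguments, compute on a miss, store, return.
def memoize(&fn)
  cache = {}
  lambda do |*args|
    return cache[args] if cache.key?(args) # cache hit: O(1) lookup
    cache[args] = fn.call(*args)           # cache miss: compute and store
  end
end

calls = 0
square = memoize { |n| calls += 1; n * n }

square.call(4) # => 16 (computed; calls is 1)
square.call(4) # => 16 (served from cache; calls is still 1)
```

Using the full argument array as the hash key makes the same wrapper work for any arity.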
Cache Key Generation
Creating effective cache keys requires converting function arguments into a unique, hashable identifier. Simple cases use single arguments directly as keys. Complex cases combine multiple arguments into composite keys.
# Single argument - direct key
cache[n]
# Multiple arguments - composite key
cache[[x, y, z]]
# Named arguments - convert to sorted array or hash
cache[options.sort.to_h]
Function Purity Requirement
Memoization maintains correctness only for pure functions. A pure function satisfies two conditions: it produces identical output for identical input (referential transparency), and it causes no observable side effects. Functions that modify global state, perform I/O operations, or depend on external mutable state violate purity and produce incorrect results when memoized.
# Pure function - safe to memoize
def calculate_tax(amount, rate)
  amount * rate
end

# Impure function - unsafe to memoize
def get_current_price(product_id)
  api_client.fetch_price(product_id) # External state changes
end
Space-Time Tradeoff
Memoization exemplifies the classic space-time tradeoff in computer science. The technique reduces execution time by consuming additional memory to store cached results. The tradeoff becomes favorable when computation cost exceeds storage cost and when the same inputs recur frequently.
The cache size grows proportionally to the number of unique input combinations encountered. Functions with large input domains or infrequent argument repetition may consume excessive memory without delivering performance benefits.
Ruby Implementation
Ruby provides multiple approaches to implementing memoization, from simple instance variable caching to sophisticated library-based solutions. The language's syntax and conventions make certain patterns particularly idiomatic.
The ||= Operator Pattern
The most common Ruby memoization pattern uses the conditional assignment operator (||=). This operator assigns a value to a variable only if the variable is nil or false.
class Calculator
  def expensive_operation
    @result ||= begin
      # Complex calculation
      sleep(2)
      42
    end
  end
end

calc = Calculator.new
calc.expensive_operation # Takes 2 seconds
calc.expensive_operation # Returns immediately
The pattern works by storing the result in an instance variable. On first execution, the instance variable is nil, so the calculation runs and assigns the result. Subsequent calls find the instance variable already set and return its value without re-executing the calculation block.
This pattern has a critical limitation: it cannot distinguish between a cached nil/false value and an uncached value. If the expensive operation legitimately returns nil or false, the memoization breaks.
# Broken memoization for nil/false results
def find_user(id)
  @user ||= database.find(id) # Breaks if user doesn't exist (returns nil)
end

# First call: user not found, returns nil, doesn't cache
# Second call: @user still nil, queries database again
# (This pattern also ignores id: once a user is cached, calls
# with a different id return the wrong user.)
Hash-Based Memoization
Hash-based caching solves the nil/false problem by using explicit key presence checks. The pattern uses a hash to store results and checks whether the hash contains a key before computing.
class Calculator
  def initialize
    @cache = {}
  end

  def factorial(n)
    return @cache[n] if @cache.key?(n)

    @cache[n] = if n <= 1
      1
    else
      n * factorial(n - 1)
    end
  end
end
This approach correctly handles any return value, including nil and false. The key presence check distinguishes between "no cached value" and "cached nil value."
Multi-Argument Memoization
Methods accepting multiple arguments require composite cache keys. Ruby's array comparison makes array-based keys straightforward.
class GeometryCalculator
  def initialize
    @cache = {}
  end

  def distance(x1, y1, x2, y2)
    key = [x1, y1, x2, y2]
    return @cache[key] if @cache.key?(key)

    @cache[key] = Math.sqrt((x2 - x1)**2 + (y2 - y1)**2)
  end
end
Class-Level Memoization
Class methods and singleton methods require class instance variables to maintain separate caches.
class Configuration
  def self.settings
    @settings ||= begin
      # Load from file or API
      YAML.load_file('config.yml')
    end
  end
end
Class instance variables (@settings in this case) belong to the class object itself, not instances of the class. Each class maintains its own set of class instance variables.
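A short sketch of that separation, using hypothetical class names: a subclass calling the inherited method gets its own cache, because `self` inside the class method (and therefore the owner of the instance variable) is the subclass.

```ruby
class BaseConfig
  def self.settings
    # `name` is the receiving class's own name, so each class
    # caches a distinct hash in its own @settings variable.
    @settings ||= { loaded_by: name }
  end
end

class ChildConfig < BaseConfig; end

BaseConfig.settings  # => { loaded_by: "BaseConfig" }
ChildConfig.settings # => { loaded_by: "ChildConfig" } (separate cache)
```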
Thread-Safe Memoization
The basic ||= pattern is not thread-safe. Multiple threads executing simultaneously can trigger multiple calculations before any thread completes the caching.
require 'concurrent'

class ThreadSafeCalculator
  def initialize
    @cache = Concurrent::Map.new
  end

  def expensive_calculation(n)
    @cache.compute_if_absent(n) do
      sleep(1) # Simulate expensive work
      n * 2
    end
  end
end

The concurrent-ruby gem provides thread-safe data structures. The Concurrent::Map#compute_if_absent method atomically checks for a key and runs the block only when the key is absent, preventing race conditions. (The similar-looking fetch_or_store performs the fetch and the store as two separate steps and is not atomic.)
Memoist Gem
The memoist gem provides a declarative syntax for memoization through method decoration.
require 'memoist'

class Calculator
  extend Memoist

  def fibonacci(n)
    return n if n <= 1
    fibonacci(n - 1) + fibonacci(n - 2)
  end
  memoize :fibonacci
end
The memoize declaration wraps the method with caching logic, separating the business logic from the optimization concern. The gem handles edge cases, thread safety options, and cache invalidation.
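The wrapping itself can be approximated by hand. The sketch below shows the general technique — rename the original method, then define a caching wrapper under the old name — and is a simplified illustration, not the gem's actual implementation; `SimpleMemoize` and `MemoCalculator` are invented names:

```ruby
module SimpleMemoize
  def memoize(name)
    original = "_unmemoized_#{name}"
    alias_method original, name
    define_method(name) do |*args|
      @_memo ||= {}
      key = [name, args]
      return @_memo[key] if @_memo.key?(key)
      @_memo[key] = send(original, *args)
    end
  end
end

class MemoCalculator
  extend SimpleMemoize

  def fibonacci(n)
    return n if n <= 1
    fibonacci(n - 1) + fibonacci(n - 2) # recursive calls hit the wrapper too
  end
  memoize :fibonacci
end

MemoCalculator.new.fibonacci(30) # => 832040, each value computed once
```

Because the recursive calls dispatch through the redefined method, even self-recursive methods gain caching without touching their bodies.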
Practical Examples
Recursive Algorithm Optimization
The Fibonacci sequence calculation demonstrates memoization's impact on recursive algorithms with overlapping subproblems. The naive recursive implementation exhibits exponential time complexity due to redundant calculations.
class FibonacciCalculator
  def initialize
    @cache = {}
  end

  def calculate(n)
    return @cache[n] if @cache.key?(n)

    @cache[n] = if n <= 1
      n
    else
      calculate(n - 1) + calculate(n - 2)
    end
  end

  def cache_size
    @cache.size
  end
end

fib = FibonacciCalculator.new
fib.calculate(40) # Returns 102334155
fib.cache_size # Returns 41 (cached 0 through 40)
The memoized version reduces time complexity from O(2^n) to O(n) by computing each number once. For n=40, the unmemoized version makes over 200 million calls, while the memoized version makes only 41.
Database Query Caching
Applications frequently make repeated database queries for the same data within a request lifecycle. Memoization eliminates redundant queries.
class User
  def self.find_with_cache(id)
    @user_cache ||= {}
    return @user_cache[id] if @user_cache.key?(id)

    @user_cache[id] = find(id)
  end

  def roles
    @roles ||= database.query("SELECT * FROM roles WHERE user_id = ?", id)
  end

  def permissions
    @permissions ||= roles.flat_map(&:permissions).uniq
  end
end
The pattern eliminates redundant queries when the same user is looked up repeatedly within a request. The cache persists for the class lifetime, however, so production systems typically need request-scoped caching to prevent stale data.
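One common remedy is a request-scoped cache that is explicitly reset at request boundaries. A sketch under stated assumptions: `clear_request_cache` is a hypothetical hook to call from middleware, and `find` is stubbed out here so the example is self-contained.

```ruby
class User
  class << self
    def find_with_cache(id)
      request_cache[id] ||= find(id)
    end

    # Call from (for example) Rack middleware at the end of each request
    # so cached rows never outlive the request that loaded them.
    def clear_request_cache
      @request_cache = {}
    end

    private

    def request_cache
      @request_cache ||= {}
    end

    # Stand-in for the real database lookup, for illustration only.
    def find(id)
      { id: id }
    end
  end
end

first = User.find_with_cache(1)
User.find_with_cache(1).equal?(first) # => true, same object, no second query
User.clear_request_cache              # next lookup queries afresh
```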
API Response Caching
External API calls introduce latency and may have rate limits. Memoization reduces API calls for duplicate requests.
class WeatherService
  def initialize
    @cache = {}
    @cache_duration = 300 # 5 minutes
  end

  def current_temperature(city)
    cached = @cache[city]
    if cached && Time.now - cached[:timestamp] < @cache_duration
      return cached[:value]
    end

    temp = fetch_from_api(city)
    @cache[city] = { value: temp, timestamp: Time.now }
    temp
  end

  private

  def fetch_from_api(city)
    # HTTP request to weather API
    response = HTTP.get("https://api.weather.com/#{city}")
    JSON.parse(response.body)['temperature']
  end
end
This example adds time-based cache expiration to prevent serving stale data. The cache stores both the value and timestamp, invalidating entries older than the configured duration.
Complex Computation Results
Mathematical or analytical computations that take significant time benefit from memoization when called with identical parameters.
class StatisticsAnalyzer
  def initialize(dataset)
    @dataset = dataset
    @cache = {}
  end

  def percentile(p)
    key = [:percentile, p]
    return @cache[key] if @cache.key?(key)

    sorted = @dataset.sort
    index = (p / 100.0 * sorted.length).ceil - 1
    @cache[key] = sorted[index]
  end

  def moving_average(window_size)
    key = [:moving_average, window_size]
    return @cache[key] if @cache.key?(key)

    @cache[key] = @dataset.each_cons(window_size).map do |window|
      window.sum / window.size.to_f
    end
  end
end

analyzer = StatisticsAnalyzer.new((1..10_000).to_a)
analyzer.percentile(95) # Computes once
analyzer.percentile(95) # Returns cached
analyzer.moving_average(10) # Computes once
Performance Considerations
Memoization delivers performance improvements only under specific conditions. The optimization's effectiveness depends on computation cost, cache hit rate, and memory constraints.
Time Complexity Analysis
Memoization converts repeated function calls from O(f(n)) to O(1), where f(n) represents the original function's time complexity. The total cost includes:
- First call: O(f(n)) + O(1) cache write
- Subsequent calls: O(1) cache lookup
The speedup factor equals the number of repeated calls for identical inputs. Functions called once per unique input gain no benefit and pay the overhead cost of cache management.
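Counting calls rather than wall-clock time makes this concrete. The naive count below follows from the recurrence calls(n) = calls(n-1) + calls(n-2) + 1; the global counters are just instrumentation for the example:

```ruby
$naive_calls = 0
def naive_fib(n)
  $naive_calls += 1
  return n if n <= 1
  naive_fib(n - 1) + naive_fib(n - 2)
end

$memo_calls = 0
def memo_fib(n, cache = {})
  $memo_calls += 1
  return n if n <= 1
  cache[n] ||= memo_fib(n - 1, cache) + memo_fib(n - 2, cache)
end

naive_fib(25)
memo_fib(25)
$naive_calls # => 242785 invocations
$memo_calls  # => 49 invocations (one per computed value plus hits)
```

The ratio grows exponentially with n, which is why the technique matters most for recursive algorithms with overlapping subproblems.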
Space Complexity Tradeoffs
The cache consumes memory proportional to the number of unique input combinations encountered. For a function with n possible inputs, the worst-case space complexity is O(n).
# Space grows with input domain
class PrimeChecker
  def initialize
    @cache = {}
  end

  def prime?(n)
    return @cache[n] if @cache.key?(n)
    return @cache[n] = false if n < 2

    @cache[n] = (2...n).none? { |i| n % i == 0 }
  end
end

checker = PrimeChecker.new
(1..1_000_000).each { |n| checker.prime?(n) }
# Cache now holds 1,000,000 entries
This cache growth can exhaust available memory for large input domains. Production systems need cache eviction policies or size limits.
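A minimal LRU-style bound can exploit the fact that Ruby hashes preserve insertion order: re-insert a key on every hit so the hash's first key is always the least recently used. This is a sketch, not production-ready code, and `BoundedPrimeChecker` is an invented name:

```ruby
class BoundedPrimeChecker
  def initialize(max_size)
    @max_size = max_size
    @cache = {}
  end

  def prime?(n)
    if @cache.key?(n)
      # Delete and re-insert so this key moves to the newest position.
      return @cache[n] = @cache.delete(n)
    end

    # Evict the least recently used entry (the hash's first key) when full.
    @cache.delete(@cache.first[0]) if @cache.size >= @max_size
    @cache[n] = n > 1 && (2..Math.sqrt(n)).none? { |i| (n % i).zero? }
  end

  def cache_size
    @cache.size
  end
end

checker = BoundedPrimeChecker.new(1000)
(1..10_000).each { |n| checker.prime?(n) }
checker.cache_size # => 1000, regardless of input volume
```

Production systems usually reach for a dedicated LRU library or an external cache, but the delete-and-reinsert trick shows the idea in a few lines.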
Cache Hit Rate Impact
The performance benefit correlates directly with cache hit rate. As a rough rule of thumb, hit rates above 50% justify memoization overhead, while low hit rates waste memory without improving performance.
# Measuring cache effectiveness
class MemoizedFunction
  attr_reader :hits, :misses

  def initialize
    @cache = {}
    @hits = 0
    @misses = 0
  end

  def compute(n)
    if @cache.key?(n)
      @hits += 1
      return @cache[n]
    end

    @misses += 1
    @cache[n] = expensive_operation(n)
  end

  def hit_rate
    total = @hits + @misses
    return 0.0 if total == 0

    @hits.to_f / total
  end
end
Monitoring hit rates in production reveals whether memoization provides value for specific methods.
Garbage Collection Pressure
Large caches increase memory pressure and trigger more frequent garbage collection cycles. The garbage collector must scan and mark cached objects, adding latency to the application.
Ruby's generational garbage collector promotes long-lived objects to the old generation, reducing scan frequency. Memoization caches typically survive multiple GC cycles and enter the old generation, but the initial promotion creates temporary pressure.
Cache Invalidation Strategies
Long-lived caches need invalidation mechanisms when underlying data changes. Stale data causes bugs when memoized results no longer match current state.
class UserProfile
  def initialize(user_id)
    @user_id = user_id
    @cache = {}
  end

  def display_name
    @cache[:display_name] ||= fetch_display_name
  end

  def clear_cache
    @cache.clear
  end

  def update_profile(attributes)
    database.update(@user_id, attributes)
    clear_cache # Invalidate after mutation
  end
end
The clear_cache method removes all cached values, forcing fresh computation on next access. More sophisticated approaches use time-based expiration or dependency tracking.
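Cache key versioning illustrates dependency tracking in miniature: mutations bump a version counter that is part of every key, so stale entries are simply never looked up again. A sketch with invented names (`VersionedProfile`, `update_profile!`) and a stubbed fetch so the example runs standalone:

```ruby
class VersionedProfile
  def initialize
    @cache = {}
    @version = 0
  end

  def display_name
    @cache[[:display_name, @version]] ||= fetch_display_name
  end

  def update_profile!
    # ...persist changes here, then invalidate by bumping the version.
    @version += 1
  end

  private

  # Stand-in for a real lookup; returns a new value each time it runs.
  def fetch_display_name
    @fetch_count = (@fetch_count || 0) + 1
    "Name v#{@fetch_count}"
  end
end

profile = VersionedProfile.new
profile.display_name    # => "Name v1" (computed)
profile.display_name    # => "Name v1" (cached)
profile.update_profile!
profile.display_name    # => "Name v2" (old key abandoned)
```

Note that abandoned entries linger in memory until the cache itself is cleared or bounded, which is the usual tradeoff of version-key invalidation.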
Common Patterns
Lazy Initialization Pattern
Memoization implements lazy initialization, deferring expensive object creation until first use.
class Application
  def database_connection
    @database_connection ||= DatabaseConnection.new(
      host: ENV['DB_HOST'],
      port: ENV['DB_PORT'],
      username: ENV['DB_USER']
    )
  end

  def logger
    @logger ||= Logger.new('application.log').tap do |log|
      log.level = Logger::INFO
      log.formatter = CustomFormatter.new
    end
  end
end
This pattern avoids creating objects that may never be used, reducing startup time and memory consumption.
Result Caching Pattern
Methods that perform the same calculation multiple times benefit from result caching at the method level.
class Report
  def initialize(data)
    @data = data
  end

  def total
    @total ||= @data.sum
  end

  def average
    @average ||= total / @data.size.to_f
  end

  def variance
    @variance ||= begin
      mean = average
      sum_squares = @data.sum { |x| (x - mean) ** 2 }
      sum_squares / @data.size.to_f
    end
  end
end
Each statistical measure computes once and caches the result. The variance calculation reuses the cached average, creating a dependency chain of memoized values.
Recursive Memoization Pattern
Recursive algorithms with overlapping subproblems achieve dramatic speedups through memoization.
require 'set'

class PathFinder
  def initialize(graph)
    @graph = graph
    @cache = {}
  end

  def shortest_path(from, to, visited = Set.new)
    key = [from, to, visited]
    return @cache[key] if @cache.key?(key)
    return 0 if from == to
    return Float::INFINITY if visited.include?(from)

    visited = visited.dup.add(from)
    @cache[key] = @graph.neighbors(from).map do |neighbor|
      1 + shortest_path(neighbor, to, visited)
    end.min || Float::INFINITY
  end
end
The cache key includes the visited set so that results for different partial paths do not collide; hits therefore occur only when the search reaches an identical (from, to, visited) state again. Without memoization, the algorithm recalculates those repeated states from scratch.
Class Method Memoization Pattern
Singleton data loaded once per application benefits from class-level memoization.
class Configuration
  class << self
    def database_config
      @database_config ||= YAML.load_file('config/database.yml')
    end

    def api_credentials
      @api_credentials ||= {
        key: ENV['API_KEY'],
        secret: ENV['API_SECRET']
      }
    end
  end
end
Class instance variables store results that remain constant for the application lifetime. All instances share the same cached values.
Conditional Memoization Pattern
Some scenarios require conditional caching based on runtime context, such as environment or feature flags.
class DataProcessor
  def process(data)
    if Rails.env.production?
      @cache ||= {}
      @cache[data] ||= expensive_computation(data)
    else
      expensive_computation(data) # No caching in development
    end
  end
end
This pattern enables different caching behavior across environments, helping developers see changes immediately while production benefits from caching.
Common Pitfalls
Memoizing Mutable Objects
Caching mutable objects creates aliasing bugs when the cached object gets modified.
class UserList
  def active_users
    @active_users ||= User.where(active: true).to_a
  end
end

list = UserList.new
users = list.active_users
users << fake_user # Modifies cached array
list.active_users # Returns contaminated cache including fake_user
The cached array reference allows external code to modify the cache contents. Returning a duplicate or frozen object prevents modification.
def active_users
  @active_users ||= User.where(active: true).to_a.freeze
end
False and Nil Return Values
The ||= operator treats false and nil as absence of a cached value, causing cache misses on valid false/nil results.
class FeatureFlag
  def enabled?(feature)
    @cache ||= {}
    @cache[feature] ||= check_feature_status(feature) # Bug!
  end

  private

  def check_feature_status(feature)
    # Expensive check that returns true/false
    expensive_api_call(feature)
  end
end

flag = FeatureFlag.new
flag.enabled?(:new_ui) # Returns false, doesn't cache
flag.enabled?(:new_ui) # Calls expensive_api_call again
Hash-based caching with explicit key checks solves this problem.
def enabled?(feature)
  @cache ||= {}
  return @cache[feature] if @cache.key?(feature)

  @cache[feature] = check_feature_status(feature)
end
Thread Safety Violations
Concurrent access to memoized methods without synchronization causes race conditions.
class Counter
  def increment
    @count ||= 0
    @count += 1
  end
end

counter = Counter.new
threads = 10.times.map { Thread.new { counter.increment } }
threads.each(&:join)
# Expected: 10, Actual: unpredictable due to race condition
Multiple threads read @count as nil simultaneously, each initializing it to 0 and losing increments. Thread-safe data structures or locks prevent data corruption.
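The simplest fix wraps the read-check-write sequence in a Mutex so initialization and increment happen atomically, at the cost of serializing access; `SafeCounter` is a hypothetical corrected version of the class above:

```ruby
class SafeCounter
  def initialize
    @mutex = Mutex.new
  end

  def increment
    @mutex.synchronize do
      @count ||= 0 # check and initialize while holding the lock
      @count += 1
    end
  end

  def count
    @mutex.synchronize { @count }
  end
end

counter = SafeCounter.new
threads = 10.times.map { Thread.new { counter.increment } }
threads.each(&:join)
counter.count # => 10, every increment retained
```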
Memory Leaks from Unbounded Caches
Caches that grow without limits eventually consume all available memory.
class Analytics
  def self.track_event(user_id, event)
    @events ||= {}
    @events[user_id] ||= []
    @events[user_id] << event
  end
end
# After millions of events across thousands of users
# @events hash consumes gigabytes of memory
Production systems need cache size limits, time-based expiration, or LRU eviction policies.
class Analytics
  MAX_CACHE_SIZE = 10_000

  def self.track_event(user_id, event)
    @events ||= {}
    # Hash#shift removes the oldest-inserted pair (FIFO eviction)
    @events.shift if @events.size >= MAX_CACHE_SIZE
    @events[user_id] ||= []
    @events[user_id] << event
  end
end
Caching Impure Functions
Memoizing functions with side effects or external dependencies produces incorrect results.
class OrderProcessor
  def process_order(order_id)
    @cache ||= {}
    @cache[order_id] ||= begin
      charge_customer(order_id) # Side effect!
      send_confirmation_email(order_id) # Side effect!
      :success
    end
  end
end

processor = OrderProcessor.new
processor.process_order(123) # Charges customer, sends email
processor.process_order(123) # Returns cached :success, skips side effects
Memoization skips the function body on cache hits, including side effects. Only pure functions maintain correctness when memoized.
Incorrect Cache Keys for Complex Arguments
Using objects without value-based equality as cache keys causes cache misses for logically equivalent arguments. Ruby's core Hash, Array, and String compare by content, but plain objects inherit identity-based hash and eql? from Object.
class SearchQuery
  attr_reader :name, :age

  def initialize(name, age)
    @name = name
    @age = age
  end
end

class Search
  def find(query)
    @cache ||= {}
    @cache[query] ||= database.search(query)
  end
end

search = Search.new
query1 = SearchQuery.new('John', 30)
query2 = SearchQuery.new('John', 30)
search.find(query1) # Computes and caches
search.find(query2) # Cache miss! Equal contents, different object identity
Logically equal instances hash to different keys because Object#hash is based on object identity. Converting the arguments to a canonical, value-compared form ensures cache hits.
def find(query)
  @cache ||= {}
  key = [query.name, query.age] # Canonical form, compared by value
  @cache[key] ||= database.search(query)
end
Reference
Memoization Implementation Patterns
| Pattern | Use Case | Thread Safe | Handles Nil/False |
|---|---|---|---|
| Instance variable with or-equals | Simple single-value caching | No | No |
| Hash with key check | Multiple values, complex keys | No | Yes |
| Class instance variable | Shared class-level data | No | Depends on implementation |
| Concurrent Map | Multi-threaded access | Yes | Yes |
| Memoist gem | Declarative method decoration | Configurable | Yes |
Cache Key Strategies
| Argument Type | Key Strategy | Example |
|---|---|---|
| Single primitive | Direct argument | cache[n] |
| Multiple primitives | Array composite | cache[[x, y]] |
| Hash arguments | Sorted hash or frozen hash | cache[args.sort.to_h] |
| Object arguments | Serialize or use ID | cache[obj.id] |
| Variable arguments | Argument array | cache[args] |
Performance Characteristics
| Operation | Complexity | Notes |
|---|---|---|
| Cache lookup | O(1) | Hash-based cache |
| Cache insert | O(1) | Hash-based cache |
| First call | O(f(n)) | Original function complexity |
| Cached call | O(1) | Direct hash lookup |
| Memory usage | O(u) | u = unique input combinations |
Decision Framework
| Condition | Use Memoization | Alternative |
|---|---|---|
| Function called repeatedly with same args | Yes | None needed |
| Expensive computation (>10ms) | Yes | Async processing |
| Pure function | Yes | Cannot memoize impure |
| Large input domain with low reuse | No | Precomputation or caching layer |
| Mutable return values | No unless frozen | Return duplicates |
| Real-time data requirements | No | Time-based expiration |
| Multi-threaded access | Yes with thread-safe structures | Concurrent Map |
Common Methods to Memoize
| Method Type | Example | Benefit |
|---|---|---|
| Database queries | User.find(id) | Eliminates duplicate queries |
| API calls | fetch_weather(city) | Reduces network latency |
| Recursive algorithms | fibonacci(n) | Exponential to linear time |
| Complex calculations | calculate_tax(amount) | Saves CPU cycles |
| File I/O | load_configuration | Avoids disk access |
| Expensive object creation | build_connection | Reduces initialization overhead |
Cache Invalidation Approaches
| Strategy | Implementation | Use Case |
|---|---|---|
| Manual clearing | cache.clear | User-triggered updates |
| Time-based expiration | Check timestamp on read | Stale data tolerance |
| Observer pattern | Invalidate on model change | Event-driven systems |
| Cache key versioning | Include version in key | Gradual migration |
| Lazy invalidation | Validate on access | Infrequent changes |
| Cache-aside | Application manages cache | Full control needed |
Thread Safety Options
| Approach | Overhead | Consistency |
|---|---|---|
| Mutex synchronization | High | Strong |
| Concurrent Map | Medium | Strong |
| Thread-local storage | Low | Per-thread |
| Immutable caching | Low | Strong with frozen values |
| Atomic operations | Low | Strong for simple cases |
Memory Management
| Technique | Benefit | Implementation |
|---|---|---|
| Cache size limit | Bounds memory usage | Remove oldest on overflow |
| TTL expiration | Automatic cleanup | Store timestamp with value |
| Weak references | GC can reclaim | WeakRef from standard library |
| Periodic pruning | Reduce memory pressure | Background task clears old entries |
| LRU eviction | Remove least used | Track access timestamps |