
CrackedRuby

Lazy Evaluation

A comprehensive guide to lazy evaluation in Ruby using Enumerator::Lazy for memory-efficient data processing and infinite sequences.


Overview

Lazy evaluation defers computation until results are actually needed, contrasting with Ruby's default eager evaluation where operations execute immediately. Ruby implements lazy evaluation through the Enumerator::Lazy class, which wraps enumerable objects and chains operations without processing data until a terminal operation forces evaluation.

The primary mechanism involves converting any enumerable to a lazy enumerator using the lazy method, then chaining operations like map, select, and reject. These operations return new lazy enumerators rather than processing data immediately. Only when a terminal method like to_a, first, or each executes does Ruby perform the accumulated transformations.

# Eager evaluation - processes immediately
[1, 2, 3, 4, 5].map { |x| x * 2 }.select { |x| x > 4 }
# => [6, 8, 10]

# Lazy evaluation - defers processing
[1, 2, 3, 4, 5].lazy.map { |x| x * 2 }.select { |x| x > 4 }
# => #<Enumerator::Lazy: ...>

Lazy evaluation excels with large datasets, infinite sequences, and scenarios where early termination saves processing time. Ruby creates a pipeline of transformations that processes elements one at a time through the entire chain, rather than creating intermediate arrays between each operation.
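The sketch below shows the one-element-at-a-time pipeline. It logs each stage call for an eager version and a lazy version of the same chain. The log arrays are only for illustration; what matters is the order of the entries.

```ruby
# Record the order in which each stage sees each element
eager_log = []
[1, 2, 3].map { |x| eager_log << "map #{x}"; x * 10 }
         .select { |x| eager_log << "select #{x}"; true }

lazy_log = []
[1, 2, 3].lazy
         .map { |x| lazy_log << "map #{x}"; x * 10 }
         .select { |x| lazy_log << "select #{x}"; true }
         .to_a

p eager_log
# => ["map 1", "map 2", "map 3", "select 10", "select 20", "select 30"]
p lazy_log
# => ["map 1", "select 10", "map 2", "select 20", "map 3", "select 30"]
```

The eager chain finishes the whole map before select starts. The lazy chain passes each element through both stages before it pulls the next one.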

The Enumerator::Lazy class supports most enumerable methods including map, select, reject, take, drop, flat_map, and zip. Each method returns another lazy enumerator, maintaining the deferred execution pattern until a terminal operation triggers evaluation.
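Here is a small illustration of such a deferred chain. It combines reject, drop, and take on an infinite range, and nothing is computed until to_a runs.

```ruby
evens = (1..Float::INFINITY).lazy
                            .reject(&:odd?)   # 2, 4, 6, 8, ...
                            .drop(2)          # skip 2 and 4
                            .take(3)          # still lazy at this point
p evens.class   # => Enumerator::Lazy
p evens.to_a    # => [6, 8, 10]
```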

Basic Usage

Converting any enumerable to lazy evaluation requires calling the lazy method, which returns an Enumerator::Lazy object. This lazy enumerator supports method chaining while deferring actual computation.

# Creating a lazy enumerator
numbers = (1..1000).lazy
# => #<Enumerator::Lazy: 1..1000>

# Chaining transformations
result = numbers.map { |x| x * 2 }
                .select { |x| x.even? }
                .take(5)
# => #<Enumerator::Lazy: ...>

# Force evaluation with terminal operation
result.to_a
# => [2, 4, 6, 8, 10]

The take method proves particularly valuable with lazy evaluation, allowing extraction of specific numbers of results without processing the entire collection. This pattern works effectively with infinite ranges or large datasets.
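Note one detail here. On a lazy enumerator, take(n) returns another Enumerator::Lazy, while first(n) is terminal and returns an Array immediately. A short sketch:

```ruby
squares = (1..Float::INFINITY).lazy.map { |n| n * n }

limited = squares.take(4)    # another Enumerator::Lazy; nothing computed yet
p limited.to_a               # => [1, 4, 9, 16]

p squares.first(4)           # first(n) is terminal: => [1, 4, 9, 16]
p squares.first              # without an argument, one element: => 1
```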

# Processing only needed elements
fibonacci = Enumerator.new do |yielder|
  a, b = 1, 1
  loop do
    yielder << a
    a, b = b, a + b
  end
end

fibonacci.lazy
         .select { |n| n.even? }
         .take(5)
         .to_a
# => [2, 8, 34, 144, 610]

Multiple transformation methods chain together seamlessly, with each operation adding to the pipeline without executing immediately. The lazy enumerator maintains the sequence of operations internally and applies them during evaluation.
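Each call returns a new lazy enumerator and leaves the receiver unchanged. You can therefore store a partially built chain and extend it in more than one direction. The sketch below uses illustrative variable names:

```ruby
doubled = (1..Float::INFINITY).lazy.map { |x| x * 2 }

# Each extension builds a new, independent pipeline; doubled itself is unchanged
multiples_of_four = doubled.select { |x| (x % 4).zero? }
others            = doubled.reject { |x| (x % 4).zero? }

p multiples_of_four.first(3)  # => [4, 8, 12]
p others.first(3)             # => [2, 6, 10]
p doubled.first(3)            # => [2, 4, 6]
```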

# Complex transformation chain
data = %w[apple banana cherry date elderberry]
result = data.lazy
             .map(&:upcase)
             .select { |word| word.length > 4 }
             .map { |word| word.reverse }
             .take(3)
             .to_a
# => ["ELPPA", "ANANAB", "YRREHC"]

Lazy enumerators work with any enumerable object, including arrays, hashes, ranges, and custom enumerators. Elements arrive exactly as the source's each method yields them. The lazy wrapper changes only when the work happens, not what is produced.
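For a hash, the lazy enumerator yields [key, value] pairs, and block parameters can destructure them. Here is a minimal sketch with made-up inventory data:

```ruby
stock = { apples: 3, pears: 0, plums: 7 }

in_stock = stock.lazy
                .reject { |_fruit, count| count.zero? }    # pairs are destructured
                .map { |fruit, count| "#{fruit}: #{count}" }
                .to_a
p in_stock   # => ["apples: 3", "plums: 7"]
```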

Advanced Usage

Complex lazy evaluation patterns emerge when you combine multiple enumerators, create custom lazy operations, and integrate with Ruby's broader enumerable ecosystem. The flat_map method flattens nested structures lazily. Hierarchical data can then be processed without first building a fully flattened intermediate array.

# Lazy processing of nested data
nested_data = [
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9]
]

result = nested_data.lazy
                   .flat_map(&:itself)
                   .map { |x| x ** 2 }
                   .select { |x| x > 10 }
                   .take(4)
                   .to_a
# => [16, 25, 36, 49]
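flat_map is lazy too, so it can expand the elements of an infinite source. It builds only the inner arrays that the terminal operation actually needs. A small sketch:

```ruby
# Each number n expands to n copies of itself; only the needed inner arrays are built
expanded = (1..Float::INFINITY).lazy.flat_map { |n| [n] * n }
p expanded.first(6)   # => [1, 2, 2, 3, 3, 3]
```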

The zip method combines multiple lazy enumerators into arrays of corresponding elements. This pattern works well for walking several data streams in lockstep without loading everything into memory.

# Combining multiple lazy enumerators
names = %w[alice bob charlie dave eve].lazy
ages = (20..30).lazy
scores = [85, 92, 78, 96, 88].lazy

combined = names.zip(ages, scores)
                .map { |name, age, score| { name: name, age: age, score: score } }
                .select { |person| person[:score] > 80 }
                .take(3)
                .to_a
# => [{:name=>"alice", :age=>20, :score=>85}, 
#     {:name=>"bob", :age=>21, :score=>92}, 
#     {:name=>"dave", :age=>23, :score=>96}]
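As with Enumerable#zip, the receiver determines the length of the result. Missing partner elements become nil, and surplus elements in the arguments are ignored. A quick sketch:

```ruby
short_partner = %w[a b c].lazy.zip([1, 2]).to_a
p short_partner   # => [["a", 1], ["b", 2], ["c", nil]]

long_partner = [1, 2].lazy.zip(%w[a b c]).to_a
p long_partner    # => [[1, "a"], [2, "b"]]
```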

You can build custom lazy operations with Enumerator::Lazy.new(source) { |yielder, *values| ... }. The block runs once for each element of the source and receives a yielder along with the element. Whatever the block pushes into the yielder (with << or yield) flows on to the next stage, so one element can produce zero, one, or many outputs.

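# Custom lazy operation: keep each element with the given probability
module LazyExtensions
  def lazy_sample(probability)
    Enumerator::Lazy.new(self) do |yielder, value|
      yielder << value if rand < probability
    end
  end
end

# lazy_sample is called on a lazy enumerator, so mix it into Enumerator::Lazy
# (this reopens a core class; a refinement would limit the scope of the change)
class Enumerator::Lazy
  include LazyExtensions
end

# Using custom lazy operation (keeps roughly 10% of elements)
large_dataset = (1..10_000).to_a
sample = large_dataset.lazy
                      .lazy_sample(0.1)
                      .map { |x| x * 2 }
                      .select { |x| x > 1000 }
                      .take(10)
                      .to_a
# => ten random values greater than 1000 (different on every run)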
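Adding methods to a core class affects every object of that class in the process. A lighter-weight alternative is a plain method that wraps its source with Enumerator::Lazy.new. The repeat_each helper below is hypothetical and exists only for illustration. Because its block pushes several values into the yielder, one input element produces multiple outputs.

```ruby
# Hypothetical helper: repeat each element of a lazy source `times` times.
# A plain method, so no core class has to be reopened.
def repeat_each(source, times)
  Enumerator::Lazy.new(source) do |yielder, value|
    times.times { yielder << value }
  end
end

doubled_up = repeat_each((1..Float::INFINITY).lazy, 2)
p doubled_up.first(5)                      # => [1, 1, 2, 2, 3]
p doubled_up.map { |x| x * 10 }.first(3)   # => [10, 10, 20]
```

The result is an ordinary Enumerator::Lazy, so later stages such as map chain onto it as usual.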

Lazy enumerators compose with regular enumerators and other Ruby objects implementing enumerable protocols. The cycle method creates infinite repeating sequences that work naturally with lazy evaluation patterns.

# Infinite cycling with lazy evaluation
colors = %w[red green blue].cycle.lazy
numbers = (1..Float::INFINITY).lazy

colored_numbers = numbers.zip(colors)
                         .map { |num, color| "#{color}_#{num}" }
                         .select { |item| item.include?('red') }
                         .take(5)
                         .to_a
# => ["red_1", "red_4", "red_7", "red_10", "red_13"]

Performance & Memory

Lazy evaluation provides significant memory advantages when processing large datasets by avoiding intermediate array creation. Each operation in a lazy chain processes elements individually rather than creating full intermediate collections, reducing memory footprint substantially.

# Timing comparison (lazy also avoids allocating the intermediate arrays)
require 'benchmark'

large_range = (1..1_000_000)

# Eager evaluation creates intermediate arrays
eager_time = Benchmark.realtime do
  large_range.map { |x| x * 2 }
             .select { |x| x > 1_000_000 }
             .first(10)
end

# Lazy evaluation processes elements individually
lazy_time = Benchmark.realtime do
  large_range.lazy
             .map { |x| x * 2 }
             .select { |x| x > 1_000_000 }
             .first(10)
end

puts "Eager: #{eager_time}s, Lazy: #{lazy_time}s"
# Results vary by Ruby version and machine. The lazy chain still examines
# 500,010 elements here (the first match is x = 500_001), so the gap is often
# small, and per-element overhead can even make the lazy version slower
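Wall-clock timings are noisy. A deterministic way to compare the two styles is to count how often a stage runs. This sketch uses a smaller range with the same shape of chain; the calls hash is purely illustrative.

```ruby
calls = { eager: 0, lazy: 0 }

(1..1_000).map { |x| calls[:eager] += 1; x * 2 }
          .select { |x| x > 1_000 }
          .first(10)

(1..1_000).lazy
          .map { |x| calls[:lazy] += 1; x * 2 }
          .select { |x| x > 1_000 }
          .first(10)

# calls[:eager] is 1000: every element was mapped before select ran.
# calls[:lazy] is 510: mapping stopped at x = 510, the 10th match.
p calls
```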

The memory efficiency is most visible in file processing. File.foreach without a block returns an enumerator that reads one line at a time. Calling lazy on it keeps the rest of the chain streaming as well, so you can process files larger than available RAM without loading them into memory.

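# Memory-efficient file processing
# parse_log_entry is an application-specific helper (not shown) that is
# assumed to return a hash with a :timestamp key
def process_large_log(filename)
  cutoff = Time.now - 86_400  # 24 hours ago
  File.foreach(filename).lazy
      .map(&:strip)
      .reject(&:empty?)
      .select { |line| line.include?('ERROR') }
      .map { |line| parse_log_entry(line) }
      .select { |entry| entry[:timestamp] > cutoff }
      .take(100)
      .to_a
end

# Memory stays roughly flat regardless of file size: only the current line
# and at most 100 results are held at once
error_entries = process_large_log('/var/log/application.log')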
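The following sketch is a self-contained version of the same streaming idea. It uses Ruby's standard tempfile library to create a small sample log with invented contents. The lines_read counter shows that the pipeline stops pulling lines once it has found two matches.

```ruby
require 'tempfile'

# Write a small sample log to a temporary file (stand-in for a real log)
log = Tempfile.new('app_log')
log.write(<<~LOG)
  INFO  boot
  ERROR disk full

  ERROR timeout
  INFO  shutdown
  ERROR late failure
LOG
log.flush

lines_read = 0
errors = File.foreach(log.path).lazy
             .map { |line| lines_read += 1; line.strip }
             .reject(&:empty?)
             .select { |line| line.start_with?('ERROR') }
             .first(2)

p errors       # => ["ERROR disk full", "ERROR timeout"]
p lines_read   # => 4 (the last two lines never entered the pipeline)
log.close!     # close and delete the temporary file
```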

Early termination shows lazy evaluation's computational efficiency. Operations like first, take, or detect can often be satisfied without processing the entire collection. In that case the lazy chain stops pulling elements from the source as soon as the request is met.

# Early termination efficiency
# Simple trial-division primality test shared by both versions
def prime?(n)
  return false if n < 2
  (2..Integer.sqrt(n)).none? { |d| (n % d).zero? }
end

def find_prime_lazy(limit)
  (2..Float::INFINITY).lazy
                      .select { |n| prime?(n) }
                      .take(limit)
                      .to_a
end

def find_prime_eager(limit)
  (2..1_000_000).select { |n| prime?(n) }
                .first(limit)
end

# Lazy version stops at the 100th prime
# Eager version checks all numbers up to 1,000,000
primes = find_prime_lazy(100)
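detect (alias find) is another terminal operation that stops early. In this sketch, the checked counter is an illustrative addition that shows how many elements were actually processed.

```ruby
checked = 0
first_big_square = (1..Float::INFINITY).lazy
                                       .map { |n| checked += 1; n * n }
                                       .detect { |sq| sq > 50 }

p first_big_square  # => 64
p checked           # => 8 (stopped at n = 8, since 7 * 7 = 49)
```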

However, lazy evaluation introduces computational overhead for each element processed through the pipeline. When processing small collections entirely, eager evaluation often performs better due to reduced method call overhead and simpler execution paths.

# Performance crossover point (measure on your own workload)
small_data = (1..100).to_a
large_data = (1..100_000).to_a

# Small data: eager is often as fast or faster (less per-element overhead)
small_eager = small_data.map { |x| x * 3 }.select(&:even?).first(10)
small_lazy = small_data.lazy.map { |x| x * 3 }.select(&:even?).first(10)

# Large data: lazy stops after about 20 elements,
# while eager maps and filters all 100,000 first
large_eager = large_data.map { |x| x * 3 }.select(&:even?).first(10)
large_lazy = large_data.lazy.map { |x| x * 3 }.select(&:even?).first(10)

Lazy evaluation performs best with expensive operations, sparse filtering conditions, or scenarios that need only partial results from large datasets. The benefits tend to grow with data size and with the cost of each stage, especially when early termination leaves most of the input unprocessed.

Common Pitfalls

Lazy evaluation defers execution until terminal operations force evaluation, creating debugging challenges when operations don't behave as expected. The lazy enumerator itself doesn't execute transformations, making intermediate inspection difficult without triggering evaluation.

# Debugging lazy operations
data = [1, 2, 3, 4, 5]
lazy_result = data.lazy.map { |x| puts "Processing #{x}"; x * 2 }

# No output yet - transformations not executed
puts lazy_result.class
# => Enumerator::Lazy

# Output appears only during evaluation
lazy_result.to_a
# Processing 1
# Processing 2
# Processing 3
# Processing 4
# Processing 5
# => [2, 4, 6, 8, 10]
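When a chain misbehaves, inspect can help without triggering any evaluation. The string it returns lists the source and each stage in order. A sketch follows; the exact formatting may differ slightly between Ruby versions.

```ruby
pipeline = (1..3).lazy.map { |x| x * 2 }.select(&:even?)
puts pipeline.inspect
# => #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator::Lazy: 1..3>:map>:select>
```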

Side effects within lazy operations execute during evaluation rather than chain construction, potentially causing confusion about when effects occur. This behavior differs significantly from eager evaluation where side effects happen immediately.

# Side effect timing confusion
counter = 0

lazy_chain = (1..5).lazy.map do |x|
  counter += 1
  puts "Counter: #{counter}"
  x * 2
end

puts "Chain created, counter: #{counter}"
# => Chain created, counter: 0

# Side effects execute during evaluation
first_three = lazy_chain.take(3).to_a
# Counter: 1
# Counter: 2  
# Counter: 3
puts "After taking 3, counter: #{counter}"
# => After taking 3, counter: 3

Each enumeration of the same lazy enumerator re-executes every transformation, whereas an array stores its elements once they are computed. Enumerating a lazy chain repeatedly therefore repeats both the work and any side effects.

# Multiple enumeration re-execution
expensive_operation = proc { |x| sleep(0.1); x ** 2 }
lazy_squares = (1..5).lazy.map(&expensive_operation)

# First enumeration takes ~0.5 seconds
first_pass = lazy_squares.to_a
# => [1, 4, 9, 16, 25]

# Second enumeration re-executes, another ~0.5 seconds
second_pass = lazy_squares.to_a  
# => [1, 4, 9, 16, 25]

# If the results are needed more than once, materialize them once and reuse the Array
cached_squares = first_pass   # already an Array: no recomputation
third_pass = cached_squares   # Instant
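Here is a deterministic version of the same point. It counts block calls instead of sleeping; the calls counter is illustrative.

```ruby
calls = 0
squares = (1..5).lazy.map { |x| calls += 1; x ** 2 }

squares.to_a
squares.to_a
p calls          # => 10: both passes ran the block for every element

cached = squares.to_a   # one more pass brings calls to 15 ...
cached.sum
cached.max
p calls          # => 15: ... and reusing the Array adds no further calls
```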

Infinite sequences combined with certain operations can create infinite loops if not properly terminated. Methods like to_a or each without limiting operations will attempt to process infinite sequences completely.

# Infinite sequence pitfall
infinite_numbers = (1..Float::INFINITY).lazy

# This will run forever
# infinite_numbers.to_a  # DON'T DO THIS

# Always limit infinite sequences
safe_result = infinite_numbers.select(&:even?)
                              .take(10)
                              .to_a
# => [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
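When the stopping point is a condition rather than a count, take_while ends an infinite sequence as soon as the condition first fails. Note that take and first help only if enough elements eventually match; the commented-out line below would never finish. A sketch:

```ruby
squares_below_50 = (1..Float::INFINITY).lazy
                                       .map { |n| n * n }
                                       .take_while { |sq| sq < 50 }
                                       .to_a
p squares_below_50   # => [1, 4, 9, 16, 25, 36, 49]

# Never finishes: only 9 elements satisfy n < 10, so select keeps scanning for a 10th
# (1..Float::INFINITY).lazy.select { |n| n < 10 }.first(10)
```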

Lazy evaluation doesn't automatically parallelize operations despite the deferred execution model. Each element still processes sequentially through the transformation pipeline, and thread safety concerns apply when using shared state within transformation blocks.

# Sequential processing misconception
# (heavy_computation and threshold are placeholders for your own code)
large_data = (1..1_000_000).lazy

# Still processes sequentially, not in parallel
result = large_data.map { |x| heavy_computation(x) }
                   .select { |x| x > threshold }
                   .take(100)
                   .to_a

# For parallel processing, reach for a gem such as parallel
# require 'parallel'
# result = Parallel.map(large_data.first(10_000)) { |x| heavy_computation(x) }

Reference

Core Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| lazy | None | Enumerator::Lazy | Converts an enumerable to a lazy enumerator |
| map | &block | Enumerator::Lazy | Transforms each element lazily |
| select | &block | Enumerator::Lazy | Keeps elements matching the condition |
| reject | &block | Enumerator::Lazy | Keeps elements not matching the condition |
| take | n (Integer) | Enumerator::Lazy | Takes the first n elements |
| drop | n (Integer) | Enumerator::Lazy | Skips the first n elements |
| flat_map | &block | Enumerator::Lazy | Maps and flattens results |
| zip | *others | Enumerator::Lazy | Combines with other enumerators |

Transformation Methods

| Method | Parameters | Returns | Description |
|---|---|---|---|
| collect | &block | Enumerator::Lazy | Alias for map |
| find_all | &block | Enumerator::Lazy | Alias for select |
| grep | pattern | Enumerator::Lazy | Keeps elements where pattern === element |
| grep_v | pattern | Enumerator::Lazy | Keeps elements where pattern === element is false |
| uniq | &block (optional) | Enumerator::Lazy | Removes duplicates (compared by block result if given) |
| slice_before | pattern or &block | Enumerator::Lazy | Starts a new chunk before each matching element |
| slice_after | pattern or &block | Enumerator::Lazy | Ends the current chunk after each matching element |
| slice_when | &block (adjacent pair) | Enumerator::Lazy | Starts a new chunk between adjacent elements when the block returns true |

Terminal Operations

| Method | Parameters | Returns | Description |
|---|---|---|---|
| to_a | None | Array | Forces evaluation, returns an array |
| force | None | Array | Alias for to_a |
| first | n (Integer, optional) | Object or Array | Returns the first element, or an array of the first n |
| find / detect | &block | Object or nil | Returns the first matching element and stops |
| each | &block | Object | Iterates through elements |
| reduce | initial (optional), &block | Object | Reduces elements to a single value |
| inject | initial (optional), &block | Object | Alias for reduce |

Utility Methods

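| Method | Parameters | Returns | Description |
|---|---|---|---|
| lazy | None | Enumerator::Lazy | Returns the receiver (already lazy) |
| eager | None | Enumerator | Returns a non-lazy enumerator (Ruby 2.7+) |
| size | None | Integer or nil | Returns the size if known without evaluating (nil after select, reject, etc.) |
| count | &block (optional) | Integer | Counts elements (matching the condition); forces evaluation |
| include? | object | Boolean | Checks whether object is present; stops at the first match |
| member? | object | Boolean | Alias for include? |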
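Here is a short sketch of size and eager in practice. Enumerator::Lazy#eager requires Ruby 2.7 or later. After calling it, map and other Enumerable methods return arrays again.

```ruby
doubled = (1..10).lazy.map { |x| x * 2 }
p doubled.size                       # => 10 (map keeps a known size)
p doubled.select(&:even?).size       # => nil (filtering makes the size unknown)

back_to_eager = doubled.eager        # a plain Enumerator, no longer lazy
p back_to_eager.map { |x| x + 1 }.first(3)   # map now returns an Array: => [3, 5, 7]
```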

Construction Patterns

# From enumerable
array.lazy                    # Convert array to lazy
range.lazy                    # Convert range to lazy  
hash.lazy                     # Convert hash to lazy

# From enumerator
enum.lazy                         # Convert enumerator to lazy
Enumerator.new { |y| y << 1 }.lazy  # Custom enumerator to lazy

# Custom lazy enumerator (source is any object that responds to each)
Enumerator::Lazy.new(source) do |yielder, *values|
  # Custom transformation logic; push zero or more results downstream
  yielder << transformed_value
end

Performance Characteristics

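| Scenario | Memory Usage | CPU Usage | Recommendation |
|---|---|---|---|
| Small collections (< 1,000 elements) | Slightly higher (pipeline objects) | Higher per-element overhead | Prefer eager evaluation |
| Large collections (> 10,000 elements) | No intermediate arrays | Lower when only partial results are needed | Prefer lazy evaluation |
| Infinite sequences | Constant | On demand | Lazy is required |
| Early termination | Minimal | Only what is needed | Use lazy with take/first |
| Multiple iterations | Recomputed per iteration | Recomputed per iteration | Cache with to_a if needed |
| File processing | Constant (line by line) | Streaming | Use lazy evaluation |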