CrackedRuby - Lambda Architecture

Overview

Lambda Architecture addresses the challenge of building distributed data processing systems that provide both accurate batch computations and real-time query responses. The architecture separates data processing into distinct layers optimized for their specific characteristics: a batch layer for complete and accurate results, a speed layer for low-latency approximate results, and a serving layer that merges outputs from both.

The architecture emerged from the need to balance consistency and availability in distributed systems while maintaining query performance. Traditional batch processing systems achieve high accuracy but suffer from processing delays measured in hours or days. Pure streaming systems provide low latency but struggle with correctness guarantees and fault tolerance. Lambda Architecture reconciles these competing requirements through a dual-pipeline approach.

The three-layer structure processes all data through both batch and streaming paths. The batch layer continuously recomputes comprehensive views from the complete dataset, providing eventual consistency and error correction. The speed layer processes only recent data to fill the temporal gap between batch processing runs. The serving layer indexes batch views for fast random access while merging them with real-time updates from the speed layer.

This separation enables systems to handle diverse query patterns across different timescales. Analytics queries requiring historical accuracy access the batch layer. Monitoring dashboards needing current values query the speed layer. User-facing applications combine both layers to provide complete and current results.

[Raw Data] → Batch Layer → Batch Views ↘
                                         Serving Layer → Query Results
[Raw Data] → Speed Layer → Real-time Views ↗

The architecture assumes immutable data, append-only storage, and the ability to recompute results from raw inputs. These assumptions enable fault tolerance through recomputation rather than complex coordination protocols. When errors occur in either layer, the batch layer eventually corrects them during its next complete processing cycle.

Key Principles

Lambda Architecture operates on several foundational principles that define its behavior and characteristics.

Immutability of data forms the core assumption. All incoming data appends to an immutable master dataset without updates or deletions. This immutability simplifies reasoning about system state and enables recomputation as the primary fault tolerance mechanism. When processing errors occur, the system reprocesses the original immutable data rather than attempting to fix corrupted state.
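A minimal sketch of the append-only idea, assuming a hypothetical in-memory `MasterDataset` class (not part of any library). Records are stored as frozen JSON lines and parsed fresh on every read, so callers cannot alter what was written:

```ruby
require 'json'

# Hypothetical append-only store; a real master dataset would live in
# durable storage such as HDFS or S3 rather than in memory.
class MasterDataset
  include Enumerable

  def initialize
    @lines = []
  end

  # Appends a serialized, frozen copy of the event. There is no update
  # or delete method by design.
  def append(event)
    @lines << JSON.generate(event).freeze
    self
  end

  # Yields a freshly parsed copy of each record.
  def each
    return enum_for(:each) unless block_given?

    @lines.each { |line| yield JSON.parse(line) }
  end
end

dataset = MasterDataset.new
dataset.append('user_id' => 'u1', 'event_type' => 'click', 'value' => 1)
dataset.append('user_id' => 'u1', 'event_type' => 'click', 'value' => 2)

# Mutating a record handed out by the reader leaves the stored data intact.
dataset.first['value'] = 999
total = dataset.sum { |event| event['value'] }
```

Because reads always return copies, any later recomputation sees exactly the events that were originally appended.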

Separate batch and speed layers handle different temporal characteristics of data processing. The batch layer processes the entire dataset periodically, computing comprehensive views with complete accuracy. Processing runs take hours but guarantee correctness. The speed layer processes only data that arrived since the last batch processing run, computing incremental views with low latency measured in seconds. Results may be approximate but provide current information.

The batch layer relies on recomputation rather than incremental updates. Each batch processing run starts from the complete raw dataset and recomputes all views from scratch. This approach eliminates the complexity of incremental updates and the possibility of accumulating errors. While computationally expensive, modern distributed computing frameworks make full recomputation feasible for many workloads.

Human fault tolerance refers to the ability to correct logic errors in processing code. When developers discover bugs in batch processing logic, they fix the code and rerun batch processing. The batch layer recomputes all views with corrected logic, eventually replacing incorrect results. This differs from traditional database systems where logic errors permanently corrupt data.
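The following sketch illustrates recomputation as the recovery path, using made-up sample events and a hypothetical `recompute_view` helper. A view is first built with buggy logic, then rebuilt from the same untouched raw data after the logic is corrected:

```ruby
# Sample raw events standing in for the immutable master dataset.
RAW_EVENTS = [
  { user_id: 'u1', event_type: 'purchase', value: 10.0 },
  { user_id: 'u1', event_type: 'refund',   value: 4.0 },
  { user_id: 'u2', event_type: 'purchase', value: 7.5 }
].freeze

# Rebuilds the whole view from raw events. No previous view state is
# consulted, so earlier mistakes cannot leak into the new result.
def recompute_view(events)
  events.each_with_object(Hash.new(0.0)) do |event, view|
    view[event[:user_id]] += yield(event)
  end
end

# Buggy logic: refunds are added instead of subtracted.
buggy_view = recompute_view(RAW_EVENTS) { |event| event[:value] }

# Corrected logic, rerun over the same raw data.
fixed_view = recompute_view(RAW_EVENTS) do |event|
  event[:event_type] == 'refund' ? -event[:value] : event[:value]
end
```

Here `buggy_view['u1']` is 14.0 while `fixed_view['u1']` is 6.0; nothing had to be patched in place to get from one to the other.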

Query merging combines results from batch and speed layers at read time. The serving layer merges batch views (complete but slightly stale) with speed layer views (current but incremental) to produce final query results. This merge operation must handle overlapping time ranges and potential inconsistencies between layers.
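One way to handle the overlap is to record how far the last batch run got and ignore speed-layer data from before that point. The sketch below assumes a hypothetical `processed_up_to` field (epoch seconds, exclusive) stored with the batch view; the field name and the sample numbers are illustrative only:

```ruby
# Batch view: totals per event type, covering events with timestamps
# strictly below processed_up_to.
batch_view = {
  totals: { 'click' => 100, 'view' => 40 },
  processed_up_to: 1_000
}

# Speed view: incremental counts bucketed by event timestamp.
speed_view = [
  { event_type: 'click', timestamp: 990,   count: 5 }, # already in batch totals
  { event_type: 'click', timestamp: 1_010, count: 3 },
  { event_type: 'share', timestamp: 1_020, count: 1 }
]

# Adds only the speed buckets the batch run has not absorbed yet, which
# avoids double counting events present in both layers.
def merge_views(batch_view, speed_view)
  cutoff = batch_view[:processed_up_to]
  merged = batch_view[:totals].dup

  speed_view.each do |bucket|
    next if bucket[:timestamp] < cutoff

    type = bucket[:event_type]
    merged[type] = merged.fetch(type, 0) + bucket[:count]
  end

  merged
end

merged = merge_views(batch_view, speed_view)
```

With this data the merged click count is 103 rather than 108, because the bucket at timestamp 990 is already part of the batch total.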

Eventual consistency defines the system's consistency model. The speed layer may provide approximate or incomplete results for recent data. The batch layer eventually processes all data and produces correct comprehensive views. Applications tolerate temporary inconsistencies in exchange for low latency.

The batch layer implementation typically uses distributed computing frameworks designed for large-scale data processing. These frameworks distribute computation across clusters of commodity hardware, achieving horizontal scalability through data partitioning. Processing logic expresses computation as transformations over immutable datasets.

The speed layer requires different characteristics: low-latency processing of individual events or small batches, support for incremental computation, and the ability to handle out-of-order data. Stream processing frameworks address these requirements through windowing operations, state management, and event-time processing.

Design Considerations

Lambda Architecture introduces significant complexity in exchange for specific benefits. Understanding when to adopt this architecture requires evaluating trade-offs against alternative approaches.

Consistency requirements determine Lambda Architecture's suitability. Applications needing strong consistency for recent data should consider alternative architectures. Lambda Architecture provides eventual consistency with a temporal gap between speed and batch layer results. Financial systems requiring immediate consistency for all transactions typically need different architectures. Analytics systems that tolerate temporary inconsistencies for recent data while requiring accuracy for historical analysis align well with Lambda Architecture.

Query latency expectations affect architectural decisions. The speed layer adds complexity specifically to reduce query latency for recent data. Applications where users tolerate batch processing delays (daily reports, historical analytics) may not benefit from the speed layer. Real-time dashboards, monitoring systems, and user-facing applications showing current activity justify the additional complexity.

Development and operational costs increase substantially with Lambda Architecture. Teams maintain separate codebases for batch and speed layers, often implementing similar logic in different frameworks. Operations teams manage multiple distributed systems with different operational characteristics. Organizations must weigh these costs against the value of low-latency queries over current data.

Data volume and velocity impact architectural choices. Lambda Architecture handles high data volumes through horizontal scaling in the batch layer. High-velocity data streams benefit from the speed layer's incremental processing. Systems with moderate data volumes and velocities may find simpler architectures sufficient. Extremely high velocities may push the speed layer's capabilities, requiring careful resource allocation.

Reprocessing requirements favor Lambda Architecture when historical data needs reanalysis with updated logic. The immutable master dataset enables full reprocessing without data migration. Systems where business logic evolves frequently and historical data must reflect new logic benefit from recomputation capabilities. Static analytical logic with infrequent changes reduces this advantage.

Kappa Architecture represents an alternative that eliminates the batch layer, processing all data through a single streaming pipeline. This simplification reduces operational complexity and eliminates code duplication. Kappa Architecture suits systems where stream processing frameworks can handle complete historical reprocessing, the speed layer alone meets accuracy requirements, and development resources are constrained. Lambda Architecture provides better separation of concerns when batch and streaming characteristics differ significantly.

Data freshness tolerance varies by use case. Lambda Architecture's speed layer typically maintains recent data (minutes to hours) while the batch layer handles historical data. Applications requiring sub-second freshness may need pure streaming architectures. Those tolerating hourly or daily freshness might not need a speed layer at all.

# Decision matrix for Lambda Architecture adoption
class ArchitectureDecision
  def recommend_lambda?(requirements)
    score = 0
    
    # Strong indicators for Lambda
    score += 2 if requirements[:data_volume] == :large
    score += 2 if requirements[:query_latency] == :low
    score += 2 if requirements[:reprocessing_needed] == :frequent
    score += 1 if requirements[:eventual_consistency] == :acceptable
    
    # Indicators against Lambda
    score -= 2 if requirements[:strong_consistency] == :required
    score -= 2 if requirements[:operational_complexity] == :limited
    score -= 1 if requirements[:team_size] == :small
    
    score > 3
  end
end

The architectural decision should account for team expertise. Lambda Architecture requires skills in both batch processing frameworks (MapReduce, Spark) and stream processing systems (Storm, Flink, Kafka Streams). Teams experienced in only one paradigm face a steeper learning curve.

Implementation Approaches

Lambda Architecture implementations vary based on technology choices, scale requirements, and organizational constraints. Several distinct approaches address different operational contexts.

Full-scale distributed implementation deploys separate clusters for batch and speed layers using enterprise-grade frameworks. The batch layer runs on Hadoop or Spark clusters processing terabytes or petabytes of data. The speed layer operates on Storm, Flink, or Kafka Streams clusters handling thousands of events per second. The serving layer uses distributed databases like Cassandra or HBase for indexed access to batch views, with in-memory stores like Redis for speed layer results.

This approach maximizes throughput and fault tolerance through horizontal scaling. Batch processing distributes across hundreds or thousands of nodes. Speed layer processing partitions across multiple stream processors. The serving layer replicates data across availability zones. Operations teams manage complex distributed systems with sophisticated monitoring, alerting, and deployment automation.

Organizations adopt this approach when data volumes exceed single-machine capabilities, query loads require distributed serving infrastructure, and operational expertise exists for managing multiple clusters. The implementation cost includes hardware or cloud infrastructure, specialized personnel, and ongoing operational overhead.

Hybrid cloud implementation splits components across different infrastructure providers based on cost and performance characteristics. Batch processing might run on commodity hardware or spot instances due to its fault-tolerant nature. Speed layer processing requires more reliable infrastructure with better SLAs. The serving layer might use managed database services to reduce operational burden.

# Configuration for hybrid deployment
class HybridArchitecture
  def initialize
    @batch_config = {
      provider: 'aws_spot',
      instance_type: 'compute_optimized',
      scaling: 'horizontal',
      cost_priority: 'minimize'
    }
    
    @speed_config = {
      provider: 'aws_on_demand',
      instance_type: 'memory_optimized',
      scaling: 'vertical_then_horizontal',
      latency_priority: 'minimize'
    }
    
    @serving_config = {
      provider: 'managed_service',
      database: 'dynamodb',
      caching: 'elasticache',
      availability: 'multi_region'
    }
  end
end

Simplified small-scale implementation reduces complexity for moderate data volumes. The batch layer might run on a single powerful server using in-memory processing frameworks. The speed layer could use lightweight stream processors or even cron jobs processing recent data incrementally. The serving layer might combine batch and speed results in a single database with appropriate indexing.

This approach suits organizations with limited operational resources, data volumes measured in gigabytes or low terabytes, and query loads manageable by single-server databases. Implementation complexity decreases substantially while maintaining Lambda Architecture's conceptual benefits.

Serverless implementation eliminates infrastructure management by using cloud-native services. Batch processing runs on AWS Lambda, Google Cloud Functions, or Azure Functions triggered periodically. Speed layer processing uses cloud stream processing services. The serving layer uses managed databases with auto-scaling capabilities.

# Serverless batch processor trigger
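require 'aws-sdk-lambda'
require 'json'
require 'time' # provides Time#iso8601

class BatchTrigger
  def schedule_processing(data_partition)
    # Named client rather than lambda to avoid shadowing Kernel#lambda
    client = Aws::Lambda::Client.new
    
    client.invoke(
      function_name: 'batch_processor',
      invocation_type: 'Event', # asynchronous invocation
      payload: {
        partition: data_partition,
        timestamp: Time.now.utc.iso8601
      }.to_json
    )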
  end
end

This approach minimizes operational overhead and allows fine-grained cost optimization. Organizations pay only for actual computation time. The implementation works well for variable workloads with unpredictable patterns. Limitations include function execution time limits, memory constraints, and cold start latencies.

Micro-batch compromise implements both layers using the same streaming framework with different window sizes. Large windows (hours) provide batch-like processing. Small windows (seconds) provide real-time processing. This reduces code duplication while maintaining separate logical layers.

Spark Streaming's micro-batch architecture naturally fits this approach. Developers implement processing logic once, then configure different batch intervals for batch and speed layers. The serving layer merges results from different window sizes. This compromise sacrifices some theoretical separation but gains practical simplicity.
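The idea of writing the logic once and varying only the window size can be sketched in plain Ruby. The `aggregate` function and the sample events are hypothetical and stand in for a job that a real deployment would run in a streaming framework:

```ruby
# One aggregation function shared by both logical layers. Each event is
# assigned to the window that contains its timestamp (epoch seconds).
def aggregate(events, window_seconds)
  events.each_with_object(Hash.new(0)) do |event, totals|
    window = event[:timestamp] - (event[:timestamp] % window_seconds)
    totals[window] += event[:value]
  end
end

events = [
  { timestamp: 0,     value: 1 },
  { timestamp: 5,     value: 2 },
  { timestamp: 3_700, value: 4 }
]

batch_like = aggregate(events, 3_600) # hour-sized windows
speed_like = aggregate(events, 10)    # ten-second windows
```

Both results come from the same code path, so a logic fix applies to both layers at once; only the window configuration differs.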

Ruby Implementation

Ruby's ecosystem includes tools for implementing Lambda Architecture components, though the language sees less adoption for large-scale batch processing compared to JVM languages. Ruby's strengths in API development and data transformation make it suitable for specific Lambda Architecture roles.

Batch layer processing in Ruby typically uses smaller-scale approaches or interfaces with existing batch processing systems. The Hadoop Streaming API allows Ruby scripts to process data in Hadoop clusters.

#!/usr/bin/env ruby
# Hadoop streaming mapper
STDIN.each_line do |line|
  user_id, timestamp, event_type, value = line.chomp.split(',')
  
  # Emit key-value pairs for reducer
  puts "#{user_id}\t#{event_type}:#{value}"
end

#!/usr/bin/env ruby
# Hadoop streaming reducer
require 'json'

current_key = nil
aggregates = Hash.new { |h, k| h[k] = [] }

STDIN.each_line do |line|
  key, value = line.chomp.split("\t")
  
  if key != current_key && current_key
    # Output aggregated results for previous key
    result = {
      user_id: current_key,
      events: aggregates
    }
    puts JSON.generate(result)
    aggregates.clear
  end
  
  current_key = key
  event_type, event_value = value.split(':')
  aggregates[event_type] << event_value.to_f
end

# Output final key
if current_key
  result = { user_id: current_key, events: aggregates }
  puts JSON.generate(result)
end

For smaller batch processing, Ruby excels at orchestrating workflows and data transformations. The parallel gem enables concurrent processing on multi-core systems.

require 'parallel'
require 'json'

class BatchProcessor
  def process_partition(partition_files)
    Parallel.map(partition_files, in_processes: 8) do |file|
      process_file(file)
    end
  end
  
  private
  
  def process_file(file)
    aggregates = Hash.new(0)
    
    File.foreach(file) do |line|
      data = JSON.parse(line)
      key = extract_key(data)
      aggregates[key] += data['value'].to_f
    end
    
    aggregates
  end
  
  def extract_key(data)
    "#{data['user_id']}:#{data['event_type']}"
  end
end

Speed layer processing suits Ruby better given the language's performance characteristics. The bunny gem provides RabbitMQ integration for message stream processing.

require 'bunny'
require 'json'
require 'redis'

class SpeedLayerProcessor
  def initialize
    @redis = Redis.new
    @connection = Bunny.new
    @connection.start
    @channel = @connection.create_channel
  end
  
  def process_stream(queue_name)
    queue = @channel.queue(queue_name, durable: true)
    
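    # manual_ack is required for the explicit ack below; without it the
    # broker auto-acknowledges and the ack call fails the channel
    queue.subscribe(block: true, manual_ack: true) do |delivery_info, properties, body|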
      event = JSON.parse(body)
      process_event(event)
      @channel.ack(delivery_info.delivery_tag)
    end
  end
  
  private
  
  def process_event(event)
    key = "speed:#{event['user_id']}:#{event['event_type']}"
    
    # Increment counter with expiration
    @redis.multi do |redis|
      redis.incrbyfloat(key, event['value'].to_f)
      redis.expire(key, 3600) # Keep for 1 hour
    end
  end
end

The ruby-kafka gem (loaded with require 'kafka') integrates with Apache Kafka for higher-throughput streaming scenarios.

require 'kafka'
require 'json'

class KafkaSpeedLayer
  def initialize(brokers)
    @kafka = Kafka.new(brokers)
    @consumer = @kafka.consumer(group_id: 'speed-layer')
  end
  
  def process_topic(topic)
    @consumer.subscribe(topic)
    
    @consumer.each_message do |message|
      event = JSON.parse(message.value)
      
      # Process with windowing, keyed on the message's creation time
      window_key = calculate_window(message.create_time.to_i)
      update_window(window_key, event)
    end
  end
  
  private
  
  def calculate_window(timestamp)
    window_size = 300 # 5-minute windows
    (timestamp / window_size) * window_size
  end
  
  def update_window(window_key, event)
    # Update windowed aggregates
    key = "window:#{window_key}:#{event['user_id']}"
    # Implementation depends on storage backend
  end
end

Serving layer implementation leverages Ruby web frameworks. A Sinatra or Rails application provides the query API that merges batch and speed layer results.

require 'sinatra/base'
require 'json'
require 'redis'
require 'cassandra'

class ServingLayer < Sinatra::Base
  configure do
    set :redis, Redis.new
    set :cassandra, Cassandra.cluster.connect('analytics')
  end
  
  get '/user/:user_id/stats' do
    user_id = params[:user_id]
    
    # Query batch layer (Cassandra)
    batch_results = query_batch_layer(user_id)
    
    # Query speed layer (Redis)
    speed_results = query_speed_layer(user_id)
    
    # Merge results
    merged = merge_layers(batch_results, speed_results)
    
    # Serialize explicitly; the json helper would require sinatra-contrib
    content_type :json
    merged.to_json
  end
  
  private
  
  def query_batch_layer(user_id)
    statement = settings.cassandra.prepare(
      'SELECT event_type, SUM(value) as total 
       FROM user_aggregates 
       WHERE user_id = ? 
       GROUP BY event_type'
    )
    
    results = settings.cassandra.execute(statement, arguments: [user_id])
    
    results.each_with_object({}) do |row, hash|
      hash[row['event_type']] = row['total']
    end
  end
  
  def query_speed_layer(user_id)
    # SCAN iterates incrementally; KEYS would block Redis on large keyspaces
    keys = settings.redis.scan_each(match: "speed:#{user_id}:*")
    
    keys.each_with_object({}) do |key, hash|
      event_type = key.split(':').last
      value = settings.redis.get(key).to_f
      hash[event_type] = value
    end
  end
  
  def merge_layers(batch, speed)
    merged = batch.dup
    
    speed.each do |event_type, value|
      merged[event_type] = (merged[event_type] || 0) + value
    end
    
    merged
  end
end

Data ingestion pipeline in Ruby handles incoming data distribution to both batch and speed layers.

require 'kafka'
require 's3'
require 'json'         # Hash#to_json
require 'securerandom' # SecureRandom.uuid

class DataIngestionPipeline
  def initialize(kafka_brokers, s3_bucket)
    @kafka = Kafka.new(kafka_brokers)
    @producer = @kafka.producer
    @s3 = S3::Service.new(access_key_id: ENV['AWS_ACCESS_KEY'],
                          secret_access_key: ENV['AWS_SECRET_KEY'])
    @bucket = @s3.buckets.find(s3_bucket)
  end
  
  def ingest(event)
    timestamp = Time.now.utc
    
    # Send to speed layer (Kafka)
    @producer.produce(
      event.to_json,
      topic: 'events',
      partition_key: event['user_id']
    )
    
    # Append to batch layer (S3)
    append_to_batch_storage(event, timestamp)
    
    @producer.deliver_messages
  end
  
  private
  
  def append_to_batch_storage(event, timestamp)
    partition_path = "events/year=#{timestamp.year}/" \
                    "month=#{timestamp.month}/" \
                    "day=#{timestamp.day}/" \
                    "hour=#{timestamp.hour}/" \
                    "#{timestamp.to_i}-#{SecureRandom.uuid}.json"
    
    object = @bucket.objects.build(partition_path)
    object.content = event.to_json
    object.save
  end
end

Performance Considerations

Lambda Architecture's performance characteristics span multiple dimensions: batch throughput, stream latency, query response time, and resource utilization. Each layer exhibits different performance properties requiring distinct optimization approaches.

Batch layer throughput determines how quickly comprehensive views refresh. Processing time grows with dataset size but should scale sub-linearly through parallelization. A well-implemented batch layer processes terabytes of data in hours rather than days.

Data partitioning strategy significantly impacts batch performance. Partitioning by time (year, month, day) enables incremental processing where only new partitions require computation. Partitioning by key (user ID, region) distributes computation evenly across nodes. Hybrid partitioning schemes combine temporal and key-based approaches.

# Partition strategy impacts batch performance
require 'date'
require 'digest'

class PartitionStrategy
  def time_based_partitions(start_date, end_date)
    (start_date..end_date).map do |date|
      "data/year=#{date.year}/month=#{date.month}/day=#{date.day}"
    end
  end
  
  def key_based_partitions(key, num_partitions)
    "data/partition=#{partition_id(key, num_partitions)}"
  end
  
  def hybrid_partitions(date, key, num_partitions)
    "data/year=#{date.year}/month=#{date.month}/day=#{date.day}/" \
      "partition=#{partition_id(key, num_partitions)}"
  end
  
  private
  
  # Hashes the key so that partition assignment is stable across runs
  def partition_id(key, num_partitions)
    Digest::MD5.hexdigest(key.to_s).to_i(16) % num_partitions
  end
end

Compression reduces storage and I/O costs while increasing CPU utilization. Modern codecs like Snappy or LZ4 balance compression ratios with CPU overhead. Column-oriented formats like Parquet achieve high compression for analytical workloads through encoding schemes optimized for repeated values.
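The storage-versus-CPU trade-off can be observed with Ruby's standard library. The sketch below uses Zlib at its fastest setting purely as a stand-in, since Snappy and LZ4 need third-party gems; the sample rows are made up:

```ruby
require 'json'
require 'zlib'

# Repetitive sample rows, similar in shape to event logs.
rows = Array.new(1_000) do |i|
  { 'region' => 'eu-west', 'status' => 'ok', 'seq' => i }
end
raw = rows.map { |row| JSON.generate(row) }.join("\n")

# BEST_SPEED favors low CPU cost over compression ratio, which is the
# same trade-off that Snappy and LZ4 aim for.
compressed = Zlib::Deflate.deflate(raw, Zlib::BEST_SPEED)
restored = Zlib::Inflate.inflate(compressed)

ratio = raw.bytesize.to_f / compressed.bytesize
puts format('raw=%d bytes, compressed=%d bytes, ratio=%.1f',
            raw.bytesize, compressed.bytesize, ratio)
```

Repeated field names and values are what make this data compress well, which is also why columnar formats that group similar values together tend to do better on analytical datasets.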

Speed layer latency measures the delay between event arrival and query availability. Micro-batch streaming (Spark Streaming) introduces latency of at least one batch interval, typically seconds. True streaming systems (Flink, Storm) process events individually with millisecond latencies.

State management dominates speed layer performance. Systems maintaining large amounts of state (windowed aggregates, session data) require efficient state backends. In-memory state provides lowest latency but limits scale. Disk-backed state (RocksDB) enables larger state sizes with higher latency. Distributed state stores sacrifice latency for horizontal scaling.

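# State management strategies for speed layer
# InMemoryBackend, RocksDBBackend, and RedisBackend are placeholders for
# application-defined classes that respond to get(key) and put(key, value)
class StateManager
  def initialize(backend: :memory)
    @backend = create_backend(backend)
  end
  
  def update_window(window_key, event)
    current = @backend.get(window_key) || {}
    updated = merge_event(current, event)
    @backend.put(window_key, updated)
  end
  
  private
  
  def create_backend(type)
    case type
    when :memory
      InMemoryBackend.new # Fast, limited capacity
    when :rocksdb
      RocksDBBackend.new # Larger capacity, slower
    when :redis
      RedisBackend.new # Distributed, network overhead
    else
      raise ArgumentError, "unknown state backend: #{type}"
    end
  end
  
  # One possible merge: add the event's value to the running total
  # for its event type
  def merge_event(current, event)
    type = event['event_type']
    current.merge(type => current.fetch(type, 0.0) + event['value'].to_f)
  end
end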

Windowing granularity affects speed layer throughput. Fine-grained windows (1-second) generate more state updates. Coarse-grained windows (5-minute) reduce update frequency but increase staleness. Sliding windows require more computation than tumbling windows due to overlapping calculations.
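The extra work that sliding windows cause shows up in how many windows each event touches. A minimal sketch, assuming integer epoch-second timestamps and a window size that is a multiple of the slide; both helper names are hypothetical:

```ruby
# A tumbling window assigns each event to exactly one window.
def tumbling_windows(timestamp, size)
  [timestamp - (timestamp % size)]
end

# A sliding window assigns each event to every window that overlaps it,
# that is size / slide windows per event. Window starts may be negative
# for timestamps smaller than the window size.
def sliding_windows(timestamp, size, slide)
  last_start = timestamp - (timestamp % slide)
  first_start = last_start - size + slide
  (first_start..last_start).step(slide).to_a
end

tumbling = tumbling_windows(125, 60)    # one state update
sliding  = sliding_windows(125, 60, 20) # three state updates
```

For an event at second 125, the tumbling case updates only the window starting at 120, while the sliding case updates the windows starting at 80, 100, and 120.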

Query performance depends on serving layer implementation. Indexed databases (Cassandra, HBase) provide sub-second lookups for point queries. Range queries perform worse, requiring multiple index lookups. Pre-aggregated views trade write-time computation for read-time performance.

The merge operation combining batch and speed results adds overhead to every query. Minimizing merge complexity improves query latency. Simple addition (sum of counters) merges efficiently. Complex operations (median, percentile) require more computation.

# Query merge optimization
class QueryMerger
  # Fast merge for simple aggregates
  def merge_sums(batch_sum, speed_sum)
    batch_sum + speed_sum
  end
  
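  # Slower merge for complex aggregates: 95th percentile over the
  # combined raw values, using the nearest-rank method
  def merge_percentiles(batch_values, speed_values)
    combined = (batch_values + speed_values).sort
    return nil if combined.empty?
    
    rank = (combined.length * 0.95).ceil
    combined[rank - 1]
  end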
end

Caching frequently accessed queries reduces load on both batch and speed layers. A caching layer (Redis, Memcached) in front of the serving layer handles repeated queries with sub-millisecond to low-millisecond latencies. Cache invalidation strategies must account for batch layer updates and speed layer freshness requirements.
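One possible invalidation scheme ties each cache entry to the batch view version it was computed from and bounds its age with a TTL for speed-layer freshness. The `QueryCache` class below is a hypothetical in-process sketch; a shared cache such as Redis would follow the same logic:

```ruby
class QueryCache
  # clock is injectable so that expiry can be exercised without sleeping.
  def initialize(ttl_seconds:, clock: -> { Time.now.to_f })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}
  end

  # Returns the cached value when it was computed against the current
  # batch version and is younger than the TTL; otherwise recomputes it
  # by calling the block and stores the result.
  def fetch(key, batch_version:)
    now = @clock.call
    entry = @entries[key]
    fresh = entry &&
            entry[:batch_version] == batch_version &&
            now - entry[:stored_at] < @ttl
    return entry[:value] if fresh

    value = yield
    @entries[key] = { value: value, batch_version: batch_version, stored_at: now }
    value
  end
end

now = 0.0
cache = QueryCache.new(ttl_seconds: 30, clock: -> { now })
computations = 0
lookup = -> { computations += 1 }

cache.fetch('user:1', batch_version: 1, &lookup) # miss, computes
cache.fetch('user:1', batch_version: 1, &lookup) # hit
now = 31.0
cache.fetch('user:1', batch_version: 1, &lookup) # TTL expired, recomputes
cache.fetch('user:1', batch_version: 2, &lookup) # new batch view, recomputes
```

The TTL caps how stale speed-layer contributions can get, while the version check drops entries as soon as a new batch view is published.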

Resource utilization patterns differ across layers. The batch layer exhibits periodic high utilization during processing runs and low utilization between runs. Elastic scaling matches compute resources to processing schedules. The speed layer requires sustained resource allocation to handle continuous event streams. The serving layer experiences variable load following query patterns.

Cost optimization strategies exploit these patterns. Batch processing uses spot instances or preemptible VMs for cost reduction. Speed layer processing requires reliable instances with SLA guarantees. The serving layer benefits from auto-scaling groups responding to query load variations.

Data locality optimizations reduce network transfer costs. Keeping batch processing close to storage minimizes data movement: with HDFS, processing frameworks schedule tasks on nodes containing the required data partitions, and with object stores such as S3, running compute in the same region as the bucket limits transfer latency and cost. The speed layer processes events near their source, reducing ingestion latency.

Real-World Applications

Lambda Architecture deployments span diverse industries and use cases, each adapting the architecture to specific requirements and constraints.

Real-time analytics platforms represent the canonical Lambda Architecture application. Companies process clickstream data from web and mobile applications, computing user engagement metrics, conversion funnels, and behavioral cohorts. The batch layer reprocesses complete history to maintain accurate lifetime value calculations and cross-session analysis. The speed layer provides current session metrics and real-time dashboard updates.

A social media platform processes billions of events daily. The batch layer runs nightly Spark jobs computing user follower counts, post engagement rates, and content recommendations from the complete dataset. The speed layer maintains current counts using Kafka Streams, updating within seconds as users interact. The serving layer merges historical trends with current activity for profile pages and analytics dashboards.

# Real-time analytics query merging
class AnalyticsPlatform
  def user_metrics(user_id)
    # Batch metrics: complete history, refreshed nightly
    batch_metrics = cassandra.execute(
      'SELECT * FROM user_metrics WHERE user_id = ?',
      arguments: [user_id]
    ).first
    
    # Speed metrics: last 24 hours, updated in real-time
    speed_metrics = redis.hgetall("metrics:#{user_id}:realtime")
    
    {
      total_posts: batch_metrics['total_posts'] + speed_metrics['recent_posts'].to_i,
      total_likes: batch_metrics['total_likes'] + speed_metrics['recent_likes'].to_i,
      followers: batch_metrics['followers'], # Updated nightly
      current_session_time: speed_metrics['session_time'].to_i
    }
  end
end

Fraud detection systems use Lambda Architecture to balance detection accuracy with response time. The batch layer trains machine learning models on complete transaction history, identifying patterns across months or years. The speed layer applies lightweight rules to incoming transactions, flagging suspicious activity within milliseconds. The serving layer combines model scores with real-time rule violations for comprehensive risk assessment.

Financial institutions process millions of transactions hourly. The batch layer executes complex graph analysis detecting fraud rings and multi-account schemes. The speed layer evaluates individual transactions against velocity rules and blocklists. Both layers feed a serving layer that determines whether to approve, decline, or review transactions.

IoT sensor networks generate continuous data streams requiring both real-time monitoring and historical analysis. The batch layer processes complete sensor histories to identify long-term trends, equipment degradation patterns, and seasonal effects. The speed layer monitors current readings for anomaly detection and threshold violations. The serving layer provides operational dashboards showing both real-time status and historical context.

A manufacturing facility instruments thousands of sensors across production equipment. The batch layer analyzes vibration patterns, temperature cycles, and performance metrics to predict maintenance needs. The speed layer alerts operators immediately when sensors exceed safety thresholds. Engineers access the serving layer for dashboards combining current equipment status with maintenance predictions.

Content recommendation engines balance personalization accuracy with freshness. The batch layer computes collaborative filtering models from complete user interaction histories. These models identify long-term preferences and cross-user patterns. The speed layer incorporates recent user actions—clicks, purchases, ratings—to adjust recommendations immediately. The serving layer combines stable base recommendations with real-time preference updates.

Video streaming platforms process viewing histories for millions of users. The batch layer runs matrix factorization algorithms overnight, generating base recommendations. The speed layer tracks current viewing sessions, adjusting recommendations based on what users watch now. The serving layer merges both for homepage recommendation lists that reflect both stable preferences and current interests.

Log aggregation and analysis platforms ingest application logs, metrics, and events from distributed systems. The batch layer indexes complete logs in searchable formats, enabling complex historical queries. The speed layer maintains recent logs in memory for fast access to current system state. The serving layer provides unified search across both layers, automatically routing queries based on time ranges.

Operations teams query logs to debug production issues. Recent logs (last hour) come from the speed layer with sub-second response times. Historical queries (last week) access the batch layer with seconds of latency. The serving layer presents a unified interface, automatically determining which layer to query based on requested time ranges.

# Log query routing based on time range
class LogServingLayer
  SPEED_LAYER_THRESHOLD = 3600 # 1 hour
  
  def query_logs(filters)
    # Data newer than the cutoff lives in the speed layer
    cutoff = Time.now.utc - SPEED_LAYER_THRESHOLD
    
    if filters[:start_time] >= cutoff
      # All recent data, use speed layer only
      query_speed_layer(filters)
    elsif filters[:end_time] < cutoff
      # All historical data, use batch layer only
      query_batch_layer(filters)
    else
      # Spans both layers, query and merge
      speed_results = query_speed_layer(
        filters.merge(start_time: cutoff)
      )
      batch_results = query_batch_layer(
        filters.merge(end_time: cutoff)
      )
      merge_results(batch_results, speed_results)
    end
  end
end
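The router above calls `merge_results` without defining it. A minimal sketch follows, assuming each log entry is a Hash with `:id` and `:timestamp` keys (a hypothetical shape) and that the batch copy should win when both layers return the same entry.

```ruby
# Merge log entries from both layers into one time-ordered list.
# Assumes entries are Hashes with :id and :timestamp keys (hypothetical shape).
def merge_results(batch_results, speed_results)
  (batch_results + speed_results)
    .uniq { |entry| entry[:id] }            # keeps the first occurrence: the batch copy
    .sort_by { |entry| entry[:timestamp] }
end

batch_results = [
  { id: 1, timestamp: Time.at(100), message: "service started" },
  { id: 2, timestamp: Time.at(200), message: "cache warmed" }
]
speed_results = [
  { id: 2, timestamp: Time.at(200), message: "cache warmed" },
  { id: 3, timestamp: Time.at(150), message: "config reloaded" }
]

merge_results(batch_results, speed_results).each do |entry|
  puts "#{entry[:timestamp].to_i} #{entry[:message]}"
end
```

Deduplicating by id matters when the two sub-queries overlap at the cutoff, since an entry stamped exactly at the boundary could otherwise appear twice.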

Reference

Architecture Components

Component	Responsibility	Update Frequency	Consistency
Batch Layer	Complete recomputation from raw data	Hours to days	Strongly consistent
Speed Layer	Incremental computation on recent data	Seconds to minutes	Eventually consistent
Serving Layer	Index batch views, merge with speed views	Continuous	Mixed consistency
Master Dataset	Immutable append-only raw data storage	Real-time appends	Strongly consistent

Layer Characteristics

Characteristic	Batch Layer	Speed Layer
Data Volume	Complete dataset	Recent data only
Latency	High (hours)	Low (seconds)
Throughput	Very high	Moderate
Fault Tolerance	Recomputation	State checkpointing
Complexity	Moderate	High
Resource Usage	Periodic spikes	Sustained

Processing Patterns

Pattern	Batch Implementation	Speed Implementation	Use Case
Aggregation	MapReduce, Spark	Windowed streams	Counters, sums
Joins	Hash joins, sort-merge	Stream-stream, stream-table	Data enrichment
Filtering	Predicate pushdown	Event filtering	Data selection
Grouping	Shuffle operations	Key-based partitioning	Category analysis
Sorting	Distributed sort	Per-window sorting	Rankings, top-N

Technology Stack Options

Layer	JVM Ecosystem	Alternative Options	Ruby Integration
Batch	Hadoop, Spark, Hive	Dask, Ray	Hadoop Streaming, JRuby Spark
Speed	Storm, Flink, Kafka Streams	Pulsar, Akka Streams	ruby-kafka, bunny
Serving	Cassandra, HBase, Druid	PostgreSQL, MongoDB, DynamoDB	cassandra-driver, mongo, aws-sdk
Message Queue	Kafka, Pulsar	RabbitMQ, Amazon Kinesis	ruby-kafka, bunny, aws-sdk-kinesis
Storage	HDFS, S3, Azure Blob	Google Cloud Storage, MinIO	aws-sdk-s3, azure-storage-blob

Query Merge Strategies

Aggregate Type	Merge Operation	Complexity	Accuracy Trade-off
Sum	Addition	O(1)	None
Count	Addition	O(1)	None
Average	Weighted average	O(1)	None
Min/Max	Comparison	O(1)	None
Median	Re-sort	O(n log n)	Approximation possible
Percentile	Re-compute	O(n log n)	Approximation recommended
Distinct Count	HyperLogLog union	O(1)	Approximate
Set Union	Set merge	O(n)	None
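The constant-time rows in the table depend on each layer storing mergeable partial state rather than finished results. In particular, an average can only be merged exactly if both layers keep the underlying sum and count. The sketch below illustrates this with a hypothetical `PartialAggregate` struct. The field names are assumptions made for this example.

```ruby
# Partial aggregate as either layer might store it (hypothetical shape).
# Keeping total and samples separately is what makes the average mergeable.
PartialAggregate = Struct.new(:total, :samples, :low, :high) do
  def average
    samples.zero? ? nil : total.to_f / samples
  end
end

# Merge a batch view and a speed view covering disjoint time ranges.
def merge_aggregates(batch, speed)
  PartialAggregate.new(
    batch.total + speed.total,            # Sum: addition
    batch.samples + speed.samples,        # Count: addition
    [batch.low, speed.low].compact.min,   # Min: comparison
    [batch.high, speed.high].compact.max  # Max: comparison
  )
end

batch_view = PartialAggregate.new(40, 4, 5, 15)   # average 10.0 over 4 samples
speed_view = PartialAggregate.new(20, 1, 20, 20)  # average 20.0 over 1 sample

merged = merge_aggregates(batch_view, speed_view)
puts merged.average # weighted by sample count, not the mean of the two averages
```

Averaging the two layer averages directly would give 15.0 here, while the merged value weighted by sample count is 12.0. Median and percentile have no such compact partial state, which is why the table lists them as recomputed or approximated.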

Performance Tuning Parameters

Parameter	Typical Range	Impact	Trade-off
Batch Interval	1-24 hours	Freshness vs resource usage	Longer intervals reduce cost
Speed Window Size	1-300 seconds	Latency vs accuracy	Smaller windows increase overhead
Partition Count	10-1000	Parallelism vs overhead	More partitions increase coordination
Replication Factor	2-5	Fault tolerance vs cost	Higher replication improves availability
Cache TTL	1-300 seconds	Freshness vs load	Longer TTL reduces backend queries

Data Partitioning Schemes

Scheme	Batch Layer	Speed Layer	Query Pattern
Time-based	Year/month/day	Tumbling windows	Range queries
Hash-based	Hash(key) mod N	Partition key	Point lookups
Range-based	Key ranges	Key groups	Range scans
Composite	Time + hash	Time + partition	Mixed queries
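The composite scheme can be sketched as a date-based path plus a hash bucket for the batch layer, and a tumbling-window start time for the speed layer. The helper names, the path layout, and `PARTITION_COUNT` are assumptions for illustration. CRC32 is used because Ruby's built-in `String#hash` is seeded per process and would not produce stable partitions across runs.

```ruby
require "zlib"

PARTITION_COUNT = 16 # hypothetical; see the tuning table for typical ranges

# Hash-based: stable bucket for a key, identical across processes and hosts.
def hash_partition(key, partitions = PARTITION_COUNT)
  Zlib.crc32(key.to_s) % partitions
end

# Composite (time + hash): batch-layer storage path for an event.
def batch_path(key, time, partitions = PARTITION_COUNT)
  t = time.getutc
  format("year=%04d/month=%02d/day=%02d/part=%03d",
         t.year, t.month, t.day, hash_partition(key, partitions))
end

# Time-based (speed layer): start of the tumbling window containing the event.
def tumbling_window_start(time, size_seconds)
  epoch = time.to_i
  Time.at(epoch - (epoch % size_seconds)).utc
end

event_time = Time.utc(2024, 3, 5, 12, 0, 47)
puts batch_path("user-42", event_time)
puts tumbling_window_start(event_time, 30)
```

Range queries can then prune by date prefix, and point lookups for a key only need to read one `part=` bucket per day.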

Failure Scenarios

Failure Type	Batch Layer Response	Speed Layer Response	Recovery Time
Node failure	Reschedule task	Reprocess from checkpoint	Minutes
Network partition	Continue with available nodes	Buffer events	Seconds to minutes
Logic bug	Reprocess with fixed code	Deploy fix, recompute windows	Hours to days
Data corruption	Recompute from source	Clear state, reprocess	Hours
Storage failure	Replica failover	Checkpoint restoration	Minutes

Cost Optimization Strategies

Strategy	Batch Layer	Speed Layer	Serving Layer
Resource Type	Spot instances	On-demand instances	Reserved instances
Scaling	Elastic scaling	Auto-scaling	Dynamic scaling
Compression	High ratio codecs	Low latency codecs	Query-optimized formats
Data Lifecycle	Archive to cold storage	Short retention	Tiered storage
Reprocessing	Incremental when possible	Stateless when possible	Cache frequently accessed
