CrackedRuby - Polyglot Persistence

Overview

Polyglot persistence describes an architectural pattern where applications use multiple specialized database technologies rather than a single general-purpose database. Each data store handles specific types of data and access patterns according to its strengths. A typical implementation might combine a relational database for transactional data, a document store for flexible schemas, a key-value store for caching, and a graph database for relationship-heavy data.

The term emerged from the polyglot programming movement, which advocates using multiple programming languages within a system. Martin Fowler and Pramod Sadalage popularized polyglot persistence in the early 2010s as NoSQL databases matured and became production-ready. The pattern addresses limitations of the "one size fits all" database approach that dominated previous decades.

Traditional applications typically standardize on a single database technology and force all data into that model. This creates impedance mismatches where the database's capabilities don't align with specific data requirements. A relational database handles hierarchical document structures awkwardly. A document database performs poorly for highly connected graph data. A traditional SQL database lacks the performance characteristics needed for high-volume caching.

Polyglot persistence solves these mismatches by selecting appropriate storage technologies for each data type. An e-commerce application might store product catalogs in a document database for flexible attributes, order transactions in a relational database for ACID guarantees, session data in a memory cache for fast access, and product recommendations in a graph database for relationship queries. Each technology operates at peak efficiency for its designated purpose.

The pattern requires careful boundary definition between data stores. Applications must determine which data belongs in which system and how to maintain consistency across boundaries. This architectural complexity trades off against the performance gains and modeling improvements from specialized databases.

# Example: E-commerce application with multiple data stores
class Product
  include Mongoid::Document  # Document database for flexible schema
  field :name, type: String
  field :attributes, type: Hash
end

class Order
  # ActiveRecord for relational integrity
  belongs_to :user
  has_many :line_items
end

class RecommendationGraph
  # Neo4j for relationship queries
  include Neo4j::ActiveNode
  has_many :out, :recommends, model_class: :Product
end

class SessionCache
  # Redis for fast key-value access
  def self.get(session_id)
    REDIS.get("session:#{session_id}")
  end
end

Key Principles

Bounded Contexts and Data Ownership

Each database owns specific data domains with clear boundaries. The concept derives from Domain-Driven Design where bounded contexts define areas of responsibility. A product catalog context might own product data in MongoDB while an order processing context owns transactional data in PostgreSQL. These boundaries prevent overlapping ownership and clarify which system serves as the source of truth for each data type.

Data ownership extends beyond storage to include all operations on that data. The owning context provides the canonical interface for accessing and modifying its data. Other contexts access this data through well-defined APIs rather than direct database queries. This encapsulation allows the owning context to change its internal storage technology without affecting consumers.

Data Model Alignment

Each database type optimizes for specific data models and access patterns. Relational databases excel at structured data with strong relationships and complex queries. Document databases handle semi-structured data with varying schemas. Key-value stores provide fast simple lookups. Graph databases traverse complex relationships efficiently. Column-family stores optimize for analytical queries across large datasets.

Selecting storage technology starts with analyzing the data's natural structure and primary access patterns. User profile data with flexible attributes fits document storage. Shopping cart data requiring atomic updates needs transactional storage. Product view counts requiring high-write throughput suit column-family storage. Social network connections demanding relationship traversal need graph storage.

Consistency Models

Polyglot persistence often requires accepting eventual consistency between data stores. Strong consistency across distributed systems requires coordination that limits performance and availability. Many polyglot architectures adopt eventual consistency where updates propagate asynchronously between systems and temporary inconsistencies are acceptable.

Event-driven architectures enable eventual consistency by publishing changes as events that other systems consume. When an order completes in the transactional database, an event triggers updates to the search index, cache invalidation, and analytics store. Each system processes events at its own pace, reaching consistency eventually rather than immediately.

Some data requires strong consistency despite using multiple stores. Critical operations like financial transactions might use distributed transaction protocols or saga patterns to maintain consistency. The decision between strong and eventual consistency depends on business requirements rather than technical convenience.

Technology Selection Criteria

Choosing databases involves evaluating multiple factors: data model fit, query patterns, consistency requirements, scalability needs, operational complexity, and team expertise. The data model should match the database's native structure without significant transformation. Query patterns should align with the database's optimization strengths. Consistency requirements must fit within the database's guarantees.

Operational considerations often outweigh technical features. A technically superior database that the team cannot operate effectively provides less value than a familiar technology with adequate capabilities. Infrastructure compatibility, monitoring tools, backup procedures, and disaster recovery processes affect the total cost of ownership.

Inter-Store Communication

Systems using polyglot persistence need mechanisms for sharing data between stores. Direct database queries create tight coupling and violate bounded context principles. API-based communication through services provides better encapsulation but adds latency and complexity. Event streaming enables loose coupling and temporal decoupling but requires eventual consistency acceptance.

# Event-driven synchronization between stores
class OrderService
  def complete_order(order_id)
    order = Order.find(order_id)
    order.update!(status: 'completed')
    
    # Publish event for other systems
    EventBus.publish('order.completed', {
      order_id: order.id,
      user_id: order.user_id,
      total: order.total,
      items: order.line_items.map(&:product_id)
    })
  end
end

class SearchIndexUpdater
  def handle_order_completed(event)
    # Update Elasticsearch with purchase data
    product_ids = event[:items]
    products = Product.where(id: product_ids)
    products.each do |product|
      SearchIndex.update_popularity(product.id)
    end
  end
end

Design Considerations

Complexity Cost Analysis

Polyglot persistence adds significant operational and developmental complexity. Each additional database technology requires installation, configuration, monitoring, backup, and maintenance procedures. Development teams need expertise across multiple database systems. Troubleshooting issues spans multiple technologies. Schema migrations affect multiple systems. Disaster recovery procedures grow more complex.

This complexity cost must offset with concrete benefits. Applications gain value from polyglot persistence when specific data access patterns perform significantly better with specialized databases, when data models fit poorly into a single database type, or when scalability requirements exceed what a single database can provide. Small applications with simple data requirements rarely benefit from the added complexity.

The break-even point typically occurs in medium to large applications with distinct data domains. An application with clear separation between transactional data, search indexes, caching layers, and analytics stores benefits from specialized databases for each domain. Applications with homogeneous data models and simple access patterns should avoid unnecessary complexity.

Operational Overhead

Running multiple database technologies multiplies operational responsibilities. Each database needs monitoring for performance metrics, capacity planning, backup verification, security patching, and upgrade procedures. Database administrators need expertise across technologies or organizations need specialists for each system. On-call rotation must cover multiple database types.

Cloud-managed database services reduce operational overhead by handling infrastructure management, backups, patches, and basic monitoring. However, managed services introduce dependencies on cloud providers, limit configuration options, and add per-service costs. Organizations must balance operational simplicity against control and cost.

Infrastructure as code and automation reduce operational burden by codifying database configurations, backup procedures, and deployment processes. Treating infrastructure as code enables consistent deployments, simplifies disaster recovery, and reduces manual errors. Investment in automation pays dividends as the number of databases grows.

Data Consistency Trade-offs

Distributed data stores introduce consistency challenges absent in single-database systems. Strong consistency across databases requires distributed transactions or two-phase commit protocols that limit availability and performance. Most polyglot architectures accept eventual consistency for better availability and performance.

Eventual consistency requires applications to handle temporarily inconsistent data. A user might see old cached data before updates propagate. Search results might lag behind database changes. Analytics dashboards might show stale metrics. These inconsistencies are acceptable in many contexts but problematic for critical operations.

Critical data paths need strong consistency mechanisms. Financial transactions cannot accept eventual consistency for balance updates. Inventory management must prevent overselling through strong consistency. User authentication requires consistent credential verification. These requirements constrain where polyglot persistence applies or require additional coordination mechanisms.

Query Capabilities

Distributing data across databases limits query capabilities. Cross-database joins become impossible without extracting data from multiple sources and joining in application code. Transactions cannot span databases without distributed transaction protocols. Foreign key constraints don't enforce referential integrity across stores.

Applications must design data models to minimize cross-database queries. Denormalization duplicates data to avoid joins. Materialized views precompute aggregations. Caching stores frequently accessed combinations. These techniques trade storage space and update complexity for query performance.

Some queries remain impractical across database boundaries. Complex analytical queries spanning multiple data stores might require a separate data warehouse that consolidates data periodically. Real-time dashboards might need specialized systems that consume change streams from multiple sources.

Migration and Evolution

Systems using polyglot persistence face complex migration challenges. Database schema changes must coordinate across technologies. Adding new databases requires data migration from existing stores. Changing database technology involves migrating data while maintaining service availability.

Backward compatibility becomes critical during migrations. Applications must support both old and new database schemas during transition periods. Dual-write strategies maintain data in multiple stores temporarily. Feature flags enable gradual rollout of database changes.

Implementation Approaches

Microservices with Database per Service

The microservices pattern assigns each service its own database, implementing strong service boundaries through data isolation. Each service selects database technology matching its requirements without affecting other services. An inventory service might use PostgreSQL, a product catalog service MongoDB, a recommendations service Neo4j, and a session service Redis.

Service boundaries prevent direct database access from other services. All data access occurs through service APIs. This encapsulation allows services to change internal storage without external impact. Services own their data models and enforce business rules consistently.

Event-driven communication coordinates data changes across services. When a service modifies its data, it publishes events that other services consume. A completed order in the order service triggers inventory updates, shipping notifications, and analytics updates through events. This eventual consistency model scales well but requires careful event design.

Data duplication occurs naturally in this pattern. Services maintain local copies of data they frequently access. A shipping service caches product dimensions from the product catalog. A recommendation service stores user preferences from the user service. This denormalization improves performance at the cost of consistency complexity.

# Microservices with separate databases
class InventoryService
  def initialize
    @db = Sequel.connect('postgres://inventory-db')
  end
  
  def reserve_stock(product_id, quantity)
    @db.transaction do
      stock = @db[:stock].where(product_id: product_id).first
      raise OutOfStock if stock[:available] < quantity
      
      @db[:stock].where(product_id: product_id)
        .update(available: stock[:available] - quantity)
      
      publish_event('stock.reserved', {
        product_id: product_id,
        quantity: quantity
      })
    end
  end
end

class ProductCatalogService
  def initialize
    @client = Mongo::Client.new(['mongodb://catalog-db:27017'])
  end
  
  def get_product(product_id)
    @client[:products].find(_id: product_id).first
  end
  
  def update_product(product_id, attributes)
    @client[:products].update_one(
      { _id: product_id },
      { '$set' => attributes }
    )
    
    publish_event('product.updated', {
      product_id: product_id,
      changes: attributes
    })
  end
end

Single Application with Multiple Data Stores

Monolithic applications can implement polyglot persistence by using multiple databases within a single codebase. The application code manages connections to different databases and routes operations to appropriate stores. This approach provides polyglot benefits without microservices complexity.

Domain modules or packages encapsulate database-specific logic. A product module interacts with MongoDB. An order module uses PostgreSQL. A cache module wraps Redis. Module boundaries enforce separation even within a single application. This organization prevents database logic from spreading throughout the codebase.

Connection management becomes critical in this pattern. Connection pools must be sized appropriately for each database. Timeout configurations need tuning. Health checks monitor each database's availability. The application must handle partial failures where some databases remain available while others fail.

Transaction management grows complex when operations span databases. Two-phase commit provides strong consistency but limits scalability. Saga patterns implement distributed transactions through coordinated local transactions with compensating actions. Applications must decide which operations require cross-database consistency and implement appropriate mechanisms.

Cache-Aside with Primary Database

The cache-aside pattern augments a primary database with a fast cache layer, often using Redis or Memcached. Applications read from cache first, falling back to the database on cache misses. Writes go to the database with explicit cache invalidation. This hybrid approach provides polyglot persistence benefits with minimal complexity.

Cache invalidation strategies determine consistency characteristics. Write-through caching updates cache and database simultaneously, maintaining consistency at the cost of write performance. Write-behind caching updates cache immediately and asynchronously updates the database, improving write performance but risking data loss. Explicit invalidation removes cache entries on writes, accepting temporary staleness for simplicity.

Cache warming strategies preload frequently accessed data. Applications can proactively populate cache during startup or low-traffic periods. Background jobs refresh cache entries before expiration. Warming reduces cache miss rates and improves performance consistency.

class UserRepository
  def initialize
    @db = ActiveRecord::Base.connection
    @cache = Redis.new
  end
  
  def find_user(user_id)
    cache_key = "user:#{user_id}"
    
    # Try cache first
    cached = @cache.get(cache_key)
    return JSON.parse(cached) if cached
    
    # Fall back to database
    user = User.find(user_id)
    @cache.setex(cache_key, 3600, user.to_json)
    user
  end
  
  def update_user(user_id, attributes)
    user = User.find(user_id)
    user.update!(attributes)
    
    # Invalidate cache
    @cache.del("user:#{user_id}")
    user
  end
end

Event Sourcing with Multiple Projections

Event sourcing stores all state changes as events in an append-only log. Multiple database projections rebuild state from events, each optimized for specific query patterns. An event store might use PostgreSQL or specialized event databases. Projections might use Elasticsearch for search, PostgreSQL for relational queries, and Redis for caching.

Events provide the single source of truth. Projections represent derived state that can be rebuilt from events. This separation allows adding new projections without changing existing systems. A new analytics requirement can consume historical events to build its database without affecting current operations.

Event replay enables projection rebuilding. Corrupted projections can be deleted and reconstructed from events. Schema changes in projections don't require data migration; the projection rebuilds with the new schema. This flexibility simplifies database evolution.

Projection lag introduces eventual consistency. Projections process events asynchronously and lag behind the event store. Applications must handle queries against possibly stale projections. Some systems provide consistency checks or query the event store directly for strong consistency requirements.

Ruby Implementation

Connection Management

Ruby applications manage multiple database connections through configuration and connection pooling. Rails applications define connections in database.yml for multiple databases. Standalone applications configure connections directly using database-specific clients.

# Rails database.yml with multiple databases
# config/database.yml
production:
  primary:
    adapter: postgresql
    database: app_production
    pool: 25
    
  catalog:
    adapter: mongodb
    database: products
    hosts:
      - mongodb1.example.com:27017
      
  cache:
    adapter: redis
    url: redis://cache.example.com:6379/0
    
  search:
    adapter: elasticsearch
    hosts:
      - search1.example.com:9200

Rails 6+ provides built-in support for multiple databases through the connects_to directive. Models declare which database they use. Connection switching happens automatically based on model class.

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
  connects_to database: { writing: :primary, reading: :primary }
end

class Product < ApplicationRecord
  connects_to database: { writing: :catalog, reading: :catalog }
end

# Explicit connection switching
ActiveRecord::Base.connected_to(role: :reading) do
  # Database reads use read replica
  User.find(123)
end

Connection pooling prevents connection exhaustion. Each database needs appropriate pool sizing based on expected concurrency. Rails uses thread-safe connection pools that automatically check out connections per thread.

# Custom connection pool configuration
class CacheConnection
  def initialize
    @redis_pool = ConnectionPool.new(size: 50, timeout: 5) do
      Redis.new(url: ENV['REDIS_URL'])
    end
  end
  
  def with_connection
    @redis_pool.with do |conn|
      yield conn
    end
  end
end

class CacheRepository
  def self.get(key)
    CACHE_POOL.with_connection do |redis|
      redis.get(key)
    end
  end
end

Repository Pattern

The repository pattern encapsulates database-specific logic behind a common interface. Repositories handle connections, queries, and data transformation. This abstraction allows changing database implementations without affecting business logic.

class UserRepository
  def find(id)
    # PostgreSQL for user data
    User.find(id)
  end
  
  def find_with_preferences(id)
    # Join data from multiple sources
    user = User.find(id)
    preferences = PreferenceCache.get("user:#{id}:prefs")
    user.preferences = preferences
    user
  end
  
  def search(query)
    # Elasticsearch for full-text search
    SEARCH_CLIENT.search(
      index: 'users',
      body: { query: { match: { name: query } } }
    )
  end
end

class ProductRepository
  def find(id)
    # MongoDB for product catalog
    Product.find(id)
  end
  
  def find_with_recommendations(id)
    product = Product.find(id)
    
    # Neo4j for graph-based recommendations
    recommendations = NEO4J_SESSION.query(
      "MATCH (p:Product {id: $id})-[:RECOMMENDS]->(r:Product) RETURN r",
      id: id
    )
    
    product.recommendations = recommendations
    product
  end
end

Data Synchronization

Keeping data synchronized across databases requires event publishing and subscription mechanisms. Ruby applications commonly use message queues or event buses for asynchronous communication.

# Event publishing with Kafka
class OrderService
  def complete_order(order)
    ActiveRecord::Base.transaction do
      order.update!(status: 'completed', completed_at: Time.current)
      
      # Publish to Kafka
      KAFKA_PRODUCER.produce(
        JSON.generate({
          event_type: 'order.completed',
          order_id: order.id,
          user_id: order.user_id,
          total: order.total,
          timestamp: Time.current.iso8601
        }),
        topic: 'orders'
      )
    end
    
    KAFKA_PRODUCER.deliver_messages
  end
end

# Event consumption and projection updates
class OrderCompletedHandler
  def process(message)
    event = JSON.parse(message.value)
    
    # Update search index
    update_search_index(event)
    
    # Update cache
    invalidate_user_cache(event['user_id'])
    
    # Update analytics
    record_analytics(event)
  end
  
  private
  
  def update_search_index(event)
    SEARCH_CLIENT.update(
      index: 'orders',
      id: event['order_id'],
      body: { doc: { status: 'completed' } }
    )
  end
  
  def invalidate_user_cache(user_id)
    REDIS.del("user:#{user_id}:orders")
  end
  
  def record_analytics(event)
    CLICKHOUSE_CLIENT.execute(
      "INSERT INTO order_events VALUES (?, ?, ?, ?)",
      event['order_id'],
      event['user_id'],
      event['total'],
      event['timestamp']
    )
  end
end

Transaction Coordination

Operations spanning multiple databases need coordination mechanisms. Ruby applications implement saga patterns for distributed transactions with compensating actions.

class TransferInventorySaga
  def execute(product_id, from_warehouse, to_warehouse, quantity)
    # Step 1: Reserve stock at destination
    destination_reservation = reserve_stock(to_warehouse, product_id, quantity)
    
    # Step 2: Transfer from source
    begin
      transfer_from_source(from_warehouse, product_id, quantity)
    rescue StandardError => e
      # Compensate: release destination reservation
      release_reservation(destination_reservation)
      raise
    end
    
    # Step 3: Confirm at destination
    begin
      confirm_reservation(destination_reservation)
    rescue StandardError => e
      # Compensate: return stock to source
      return_to_source(from_warehouse, product_id, quantity)
      raise
    end
    
    # Step 4: Update search index
    update_warehouse_index(product_id)
  end
  
  private
  
  def reserve_stock(warehouse_id, product_id, quantity)
    POSTGRES_CONN.exec_params(
      "UPDATE warehouse_stock SET reserved = reserved + $1 
       WHERE warehouse_id = $2 AND product_id = $3
       RETURNING id",
      [quantity, warehouse_id, product_id]
    )
  end
  
  def transfer_from_source(warehouse_id, product_id, quantity)
    POSTGRES_CONN.exec_params(
      "UPDATE warehouse_stock SET available = available - $1
       WHERE warehouse_id = $2 AND product_id = $3
       AND available >= $1",
      [quantity, warehouse_id, product_id]
    )
  end
  
  def update_warehouse_index(product_id)
    SEARCH_CLIENT.update(
      index: 'inventory',
      id: product_id,
      body: { doc: { last_updated: Time.current } }
    )
  end
end

Tools & Ecosystem

Database Adapters

Ruby provides adapters for connecting to various database technologies. ActiveRecord adapters integrate relational databases. Specialized gems handle NoSQL databases, caches, and search engines.

Relational database adapters include pg for PostgreSQL, mysql2 for MySQL, and sqlite3 for SQLite. These adapters integrate with ActiveRecord and provide raw connection interfaces. Non-Rails applications use Sequel as an alternative database toolkit supporting multiple relational databases.

Document database support comes through mongoid for MongoDB and cequel for Cassandra. These gems provide ActiveModel-compatible interfaces with database-specific features. Graph database access uses neo4j-ruby-driver for Neo4j connections.

Key-value store access typically uses redis-rb for Redis and dalli for Memcached. These lightweight clients handle caching operations efficiently. Search integration uses elasticsearch-ruby for Elasticsearch and searchkick for high-level search functionality.

Connection Pool Libraries

Connection pooling prevents resource exhaustion when managing multiple databases. The connection_pool gem provides thread-safe connection pools for any resource.

# Connection pooling for Redis
REDIS_POOL = ConnectionPool.new(size: 50, timeout: 5) do
  Redis.new(url: ENV['REDIS_URL'])
end

# Connection pooling for MongoDB
MONGO_POOL = ConnectionPool.new(size: 25, timeout: 5) do
  Mongo::Client.new(ENV['MONGODB_URL'])
end

# Using pooled connections
REDIS_POOL.with do |redis|
  redis.get('key')
end

Event Processing Frameworks

Event-driven architectures require message queue integration. Popular Ruby options include ruby-kafka for Apache Kafka, bunny for RabbitMQ, and aws-sdk-sqs for Amazon SQS.

Wisper provides an in-process event bus for simpler event handling within applications. Event sourcing implementations use rails_event_store or eventide for managing event streams.

# Event processing with ruby-kafka
kafka = Kafka.new(['kafka://localhost:9092'])
producer = kafka.producer

# Publishing events
producer.produce({
  event_type: 'user.created',
  user_id: 123
}.to_json, topic: 'users')
producer.deliver_messages

# Consuming events
consumer = kafka.consumer(group_id: 'user-indexer')
consumer.subscribe('users')

consumer.each_message do |message|
  event = JSON.parse(message.value)
  update_search_index(event)
end

Database Abstraction Libraries

ROM (Ruby Object Mapper) provides database abstraction supporting multiple database types. ROM separates data mapping from persistence, enabling polyglot persistence with consistent patterns.

# ROM configuration for multiple databases
require 'rom'

ROM.container(:sql, 'postgres://localhost/users') do |config|
  config.default.connection.create_table(:users) do
    primary_key :id
    column :name, String
  end
  
  config.relation(:users) do
    schema(infer: true)
  end
end

ROM.container(:mongodb, 'mongodb://localhost/products') do |config|
  config.default.connection[:products]
  
  config.relation(:products) do
    schema(infer: true)
  end
end

Monitoring and Observability

Monitoring multiple databases requires comprehensive instrumentation. ActiveSupport::Notifications provides hooks for database operations. Database-specific monitoring gems include scout_apm, newrelic_rpm, and datadog_tracing.

# Database query instrumentation
ActiveSupport::Notifications.subscribe('sql.active_record') do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  
  StatsD.timing('database.query.duration', event.duration)
  StatsD.increment('database.query.count')
end

# Custom instrumentation for NoSQL
class MongoInstrumented
  def initialize(client)
    @client = client
  end
  
  def find(collection, query)
    start = Time.current
    result = @client[collection].find(query)
    duration = (Time.current - start) * 1000
    
    StatsD.timing('mongodb.query.duration', duration)
    result
  end
end

Real-World Applications

E-commerce Platform

Large e-commerce platforms demonstrate polyglot persistence across multiple data domains. Product catalogs use MongoDB for flexible attribute storage. Order processing uses PostgreSQL for transactional integrity. Session management uses Redis for fast access. Product recommendations use Neo4j for relationship queries. Search functionality uses Elasticsearch for full-text capabilities.

Product catalogs benefit from document storage because products have varying attributes across categories. Electronics have technical specifications while clothing has sizes and colors. MongoDB's flexible schema accommodates this variance without sparse columns or entity-attribute-value patterns.

class EcommercePlatform
  def display_product_page(product_id)
    # MongoDB for product details
    product = MONGO_CLIENT[:products].find(_id: product_id).first
    
    # PostgreSQL for inventory
    inventory = POSTGRES_CONN.exec_params(
      "SELECT warehouse_id, quantity FROM inventory WHERE product_id = $1",
      [product_id]
    )
    
    # Redis for pricing (frequently updated)
    price = REDIS.get("price:#{product_id}")
    
    # Neo4j for recommendations
    recommendations = NEO4J_SESSION.query(
      "MATCH (p:Product {id: $id})-[:SIMILAR_TO]->(r:Product) 
       RETURN r LIMIT 10",
      id: product_id
    )
    
    # Elasticsearch for reviews
    reviews = SEARCH_CLIENT.search(
      index: 'reviews',
      body: { query: { term: { product_id: product_id } } }
    )
    
    {
      product: product,
      inventory: inventory,
      price: price,
      recommendations: recommendations,
      reviews: reviews
    }
  end
end

Order processing requires ACID transactions that PostgreSQL provides. Orders must update inventory atomically, apply discounts correctly, and maintain referential integrity across order lines.

Session data experiences high read/write volume with short lifespans. Redis provides microsecond latency for session lookups without persisting unnecessary historical data. Session expiration happens automatically through Redis TTL.

Social Media Analytics

Social media platforms handle diverse data types across billions of records. User profiles and posts use Cassandra for horizontal scalability. Real-time feeds use Redis for fast access to recent content. Social graphs use Neo4j for relationship traversal. Analytics use ClickHouse for fast aggregation queries. Full-text search uses Elasticsearch.

User profiles require availability over consistency. Cassandra's eventual consistency and partition tolerance enable global distribution. Profile updates propagate across regions asynchronously while reads remain fast locally.

class SocialMediaPlatform
  def generate_user_feed(user_id, limit: 50)
    # Check Redis cache for recent feed
    cached_feed = REDIS.get("feed:#{user_id}")
    return JSON.parse(cached_feed) if cached_feed
    
    # Neo4j for following relationships
    following_ids = NEO4J_SESSION.query(
      "MATCH (u:User {id: $id})-[:FOLLOWS]->(f:User) RETURN f.id",
      id: user_id
    ).map { |r| r['f.id'] }
    
    # Cassandra for posts from followed users
    posts = CASSANDRA_SESSION.execute(
      "SELECT * FROM posts WHERE user_id IN ? ORDER BY created_at DESC LIMIT ?",
      arguments: [following_ids, limit]
    )
    
    # Cache generated feed
    REDIS.setex("feed:#{user_id}", 300, posts.to_json)
    
    posts
  end
  
  def generate_trending_report
    # ClickHouse for analytics queries
    CLICKHOUSE_CLIENT.query(
      "SELECT hashtag, count(*) as mentions
       FROM post_hashtags
       WHERE created_at >= now() - INTERVAL 1 HOUR
       GROUP BY hashtag
       ORDER BY mentions DESC
       LIMIT 100"
    )
  end
end

Social graphs with billions of connections require specialized graph databases. Computing mutual friends, shortest paths, or recommendation networks needs efficient graph traversal that relational databases handle poorly at scale.

Analytics queries aggregate across billions of events. ClickHouse's columnar storage and parallel processing handle these queries in seconds while traditional databases timeout.

Financial Trading System

Financial systems combine strict consistency requirements with high-performance caching. Trade execution uses PostgreSQL with strict ACID transactions. Market data caching uses Redis for microsecond-latency quotes. Historical analysis uses TimescaleDB for time-series data. Regulatory reporting uses PostgreSQL for complex queries.

Trade execution cannot tolerate inconsistency. Atomic balance updates, position tracking, and transaction logs require full ACID compliance. PostgreSQL's serializable isolation level prevents race conditions in concurrent trading.

class TradingSystem
  def execute_trade(user_id, symbol, quantity, price)
    POSTGRES_CONN.transaction do
      # Check and update balance
      balance = POSTGRES_CONN.exec_params(
        "SELECT balance FROM accounts WHERE user_id = $1 FOR UPDATE",
        [user_id]
      ).first
      
      total_cost = quantity * price
      raise InsufficientFunds if balance['balance'] < total_cost
      
      POSTGRES_CONN.exec_params(
        "UPDATE accounts SET balance = balance - $1 WHERE user_id = $2",
        [total_cost, user_id]
      )
      
      # Record trade
      trade = POSTGRES_CONN.exec_params(
        "INSERT INTO trades (user_id, symbol, quantity, price, executed_at)
         VALUES ($1, $2, $3, $4, $5) RETURNING id",
        [user_id, symbol, quantity, price, Time.current]
      )
      
      # Update position
      POSTGRES_CONN.exec_params(
        "INSERT INTO positions (user_id, symbol, quantity)
         VALUES ($1, $2, $3)
         ON CONFLICT (user_id, symbol)
         DO UPDATE SET quantity = positions.quantity + $3",
        [user_id, symbol, quantity]
      )
      
      # Invalidate cache
      REDIS.del("positions:#{user_id}")
      
      trade.first
    end
  end
  
  def get_market_data(symbol)
    # Redis for real-time quotes
    quote = REDIS.get("quote:#{symbol}")
    return JSON.parse(quote) if quote
    
    # Fall back to market data service
    quote = fetch_from_market_data_service(symbol)
    REDIS.setex("quote:#{symbol}", 1, quote.to_json)
    quote
  end
end

Market data requires microsecond latency that only in-memory caches provide. Redis stores quotes with one-second expiration, serving millions of reads per second while staying current.

Historical analysis queries time-series data across years. TimescaleDB extends PostgreSQL with time-series optimizations, enabling fast queries over billions of data points while maintaining SQL compatibility for complex analytics.

Reference

Database Types and Use Cases

Database Type	Primary Use Case	Data Model	Examples
Relational	Transactional data with complex relationships	Rows and columns with schemas	PostgreSQL, MySQL
Document	Semi-structured data with varying schemas	JSON-like documents	MongoDB, CouchDB
Key-Value	Simple lookups and caching	Keys mapped to values	Redis, Memcached
Column-Family	Analytics and large-scale data	Column-oriented storage	Cassandra, HBase
Graph	Highly connected data with relationships	Nodes and edges	Neo4j, ArangoDB
Time-Series	Time-stamped data and metrics	Time-ordered records	TimescaleDB, InfluxDB
Search Engine	Full-text search and analysis	Inverted indexes	Elasticsearch, Solr

Ruby Database Gems

Gem	Database	Purpose	Connection Example
pg	PostgreSQL	Relational database	Sequel.connect('postgres://localhost/db')
mysql2	MySQL	Relational database	Mysql2::Client.new(host: 'localhost')
mongoid	MongoDB	Document store	Mongoid.load!('mongoid.yml')
redis	Redis	Key-value cache	Redis.new(url: 'redis://localhost:6379')
neo4j	Neo4j	Graph database	Neo4j::Driver.new('bolt://localhost:7687')
elasticsearch	Elasticsearch	Search engine	Elasticsearch::Client.new(url: 'http://localhost:9200')
cassandra-driver	Cassandra	Column-family store	Cassandra.cluster(hosts: ['127.0.0.1'])
dalli	Memcached	Key-value cache	Dalli::Client.new('localhost:11211')

Consistency Patterns

Pattern	Consistency Level	Performance	Complexity	Best For
Strong Consistency	Immediate	Lower	High	Financial transactions
Eventual Consistency	Delayed	Higher	Medium	User profiles, catalogs
Read-Your-Writes	Session-level	Medium	Medium	User sessions
Causal Consistency	Causally ordered	Medium	High	Collaborative editing
Cache-Aside	Eventually consistent	High	Low	Frequently read data

Data Synchronization Strategies

Strategy	Latency	Ordering Guarantees	Failure Handling	Implementation
Event Streaming	Low	Ordered within partition	Retry and replay	Kafka, RabbitMQ
Change Data Capture	Low	Sequential	Automatic retry	Debezium, Maxwell
API Replication	Medium	No ordering	Manual retry	REST APIs, GraphQL
Batch Sync	High	Bulk ordering	Transaction rollback	Scheduled jobs
Database Triggers	Immediate	Transactional	Rollback	Stored procedures

Connection Pool Configuration

Setting	Purpose	Typical Value	Consideration
Size	Maximum connections	20-100	Based on concurrency
Timeout	Wait time for connection	5 seconds	Balance speed and patience
Checkout Timeout	Time to acquire connection	5 seconds	Prevent indefinite waiting
Reaping Frequency	Dead connection cleanup	60 seconds	Remove stale connections
Idle Timeout	Close unused connections	300 seconds	Free resources

Common Implementation Patterns

Pattern	Purpose	Trade-off	Ruby Implementation
Repository	Encapsulate data access	Abstraction overhead	Class per domain entity
Unit of Work	Group operations	Transaction complexity	ActiveRecord transactions
Data Mapper	Separate persistence	Mapping overhead	ROM, Sequel
Active Record	Combine model and persistence	Tight coupling	Rails ActiveRecord
CQRS	Separate reads and writes	System complexity	Multiple models per entity
Event Sourcing	Store all changes as events	Storage overhead	rails_event_store

Performance Characteristics

Operation	Relational	Document	Key-Value	Graph	Time-Series
Point Query	1-10ms	1-10ms	<1ms	1-10ms	1-5ms
Range Scan	10-100ms	10-100ms	Not supported	10-100ms	1-10ms
Aggregation	100-1000ms	10-100ms	Not supported	100-1000ms	10-100ms
Join	100-1000ms	Application-level	Not supported	1-10ms	Application-level
Full-Text Search	100-1000ms	Not optimized	Not supported	Not optimized	Not supported

Operational Considerations

Aspect	Single Database	Polyglot Persistence
Backup Complexity	Single backup process	Coordinated backups across systems
Monitoring	Single dashboard	Multiple monitoring tools
Disaster Recovery	Single restore process	Synchronized restore procedures
Security	Single authentication system	Multiple security configurations
Upgrades	Single upgrade window	Coordinated upgrades
Team Skills	One technology expertise	Multiple technology expertise
Operational Cost	Lower	Higher

Polyglot Persistence