Overview
Polyglot persistence describes an architectural pattern where applications use multiple specialized database technologies rather than a single general-purpose database. Each data store handles specific types of data and access patterns according to its strengths. A typical implementation might combine a relational database for transactional data, a document store for flexible schemas, a key-value store for caching, and a graph database for relationship-heavy data.
The term emerged from the polyglot programming movement, which advocates using multiple programming languages within a system. Martin Fowler and Pramod Sadalage popularized polyglot persistence in the early 2010s as NoSQL databases matured and became production-ready. The pattern addresses limitations of the "one size fits all" database approach that dominated previous decades.
Traditional applications typically standardize on a single database technology and force all data into that model. This creates impedance mismatches where the database's capabilities don't align with specific data requirements. A relational database handles hierarchical document structures awkwardly. A document database performs poorly for highly connected graph data. A traditional SQL database lacks the performance characteristics needed for high-volume caching.
Polyglot persistence solves these mismatches by selecting appropriate storage technologies for each data type. An e-commerce application might store product catalogs in a document database for flexible attributes, order transactions in a relational database for ACID guarantees, session data in a memory cache for fast access, and product recommendations in a graph database for relationship queries. Each technology operates at peak efficiency for its designated purpose.
The pattern requires careful boundary definition between data stores. Applications must determine which data belongs in which system and how to maintain consistency across boundaries. This architectural complexity trades off against the performance gains and modeling improvements from specialized databases.
# Example: E-commerce application with multiple data stores
class Product
include Mongoid::Document # Document database for flexible schema
field :name, type: String
field :attributes, type: Hash
end
class Order
# ActiveRecord for relational integrity
belongs_to :user
has_many :line_items
end
class RecommendationGraph
# Neo4j for relationship queries
include Neo4j::ActiveNode
has_many :out, :recommends, model_class: :Product
end
class SessionCache
# Redis for fast key-value access
def self.get(session_id)
REDIS.get("session:#{session_id}")
end
end
Key Principles
Bounded Contexts and Data Ownership
Each database owns specific data domains with clear boundaries. The concept derives from Domain-Driven Design where bounded contexts define areas of responsibility. A product catalog context might own product data in MongoDB while an order processing context owns transactional data in PostgreSQL. These boundaries prevent overlapping ownership and clarify which system serves as the source of truth for each data type.
Data ownership extends beyond storage to include all operations on that data. The owning context provides the canonical interface for accessing and modifying its data. Other contexts access this data through well-defined APIs rather than direct database queries. This encapsulation allows the owning context to change its internal storage technology without affecting consumers.
Data Model Alignment
Each database type optimizes for specific data models and access patterns. Relational databases excel at structured data with strong relationships and complex queries. Document databases handle semi-structured data with varying schemas. Key-value stores provide fast simple lookups. Graph databases traverse complex relationships efficiently. Column-family stores optimize for analytical queries across large datasets.
Selecting storage technology starts with analyzing the data's natural structure and primary access patterns. User profile data with flexible attributes fits document storage. Shopping cart data requiring atomic updates needs transactional storage. Product view counts requiring high-write throughput suit column-family storage. Social network connections demanding relationship traversal need graph storage.
Consistency Models
Polyglot persistence often requires accepting eventual consistency between data stores. Strong consistency across distributed systems requires coordination that limits performance and availability. Many polyglot architectures adopt eventual consistency where updates propagate asynchronously between systems and temporary inconsistencies are acceptable.
Event-driven architectures enable eventual consistency by publishing changes as events that other systems consume. When an order completes in the transactional database, an event triggers updates to the search index, cache invalidation, and analytics store. Each system processes events at its own pace, reaching consistency eventually rather than immediately.
Some data requires strong consistency despite using multiple stores. Critical operations like financial transactions might use distributed transaction protocols or saga patterns to maintain consistency. The decision between strong and eventual consistency depends on business requirements rather than technical convenience.
Technology Selection Criteria
Choosing databases involves evaluating multiple factors: data model fit, query patterns, consistency requirements, scalability needs, operational complexity, and team expertise. The data model should match the database's native structure without significant transformation. Query patterns should align with the database's optimization strengths. Consistency requirements must fit within the database's guarantees.
Operational considerations often outweigh technical features. A technically superior database that the team cannot operate effectively provides less value than a familiar technology with adequate capabilities. Infrastructure compatibility, monitoring tools, backup procedures, and disaster recovery processes affect the total cost of ownership.
Inter-Store Communication
Systems using polyglot persistence need mechanisms for sharing data between stores. Direct database queries create tight coupling and violate bounded context principles. API-based communication through services provides better encapsulation but adds latency and complexity. Event streaming enables loose coupling and temporal decoupling but requires eventual consistency acceptance.
# Event-driven synchronization between stores
class OrderService
def complete_order(order_id)
order = Order.find(order_id)
order.update!(status: 'completed')
# Publish event for other systems
EventBus.publish('order.completed', {
order_id: order.id,
user_id: order.user_id,
total: order.total,
items: order.line_items.map(&:product_id)
})
end
end
class SearchIndexUpdater
def handle_order_completed(event)
# Update Elasticsearch with purchase data
product_ids = event[:items]
products = Product.where(id: product_ids)
products.each do |product|
SearchIndex.update_popularity(product.id)
end
end
end
Design Considerations
Complexity Cost Analysis
Polyglot persistence adds significant operational and developmental complexity. Each additional database technology requires installation, configuration, monitoring, backup, and maintenance procedures. Development teams need expertise across multiple database systems. Troubleshooting issues spans multiple technologies. Schema migrations affect multiple systems. Disaster recovery procedures grow more complex.
This complexity cost must offset with concrete benefits. Applications gain value from polyglot persistence when specific data access patterns perform significantly better with specialized databases, when data models fit poorly into a single database type, or when scalability requirements exceed what a single database can provide. Small applications with simple data requirements rarely benefit from the added complexity.
The break-even point typically occurs in medium to large applications with distinct data domains. An application with clear separation between transactional data, search indexes, caching layers, and analytics stores benefits from specialized databases for each domain. Applications with homogeneous data models and simple access patterns should avoid unnecessary complexity.
Operational Overhead
Running multiple database technologies multiplies operational responsibilities. Each database needs monitoring for performance metrics, capacity planning, backup verification, security patching, and upgrade procedures. Database administrators need expertise across technologies or organizations need specialists for each system. On-call rotation must cover multiple database types.
Cloud-managed database services reduce operational overhead by handling infrastructure management, backups, patches, and basic monitoring. However, managed services introduce dependencies on cloud providers, limit configuration options, and add per-service costs. Organizations must balance operational simplicity against control and cost.
Infrastructure as code and automation reduce operational burden by codifying database configurations, backup procedures, and deployment processes. Treating infrastructure as code enables consistent deployments, simplifies disaster recovery, and reduces manual errors. Investment in automation pays dividends as the number of databases grows.
Data Consistency Trade-offs
Distributed data stores introduce consistency challenges absent in single-database systems. Strong consistency across databases requires distributed transactions or two-phase commit protocols that limit availability and performance. Most polyglot architectures accept eventual consistency for better availability and performance.
Eventual consistency requires applications to handle temporarily inconsistent data. A user might see old cached data before updates propagate. Search results might lag behind database changes. Analytics dashboards might show stale metrics. These inconsistencies are acceptable in many contexts but problematic for critical operations.
Critical data paths need strong consistency mechanisms. Financial transactions cannot accept eventual consistency for balance updates. Inventory management must prevent overselling through strong consistency. User authentication requires consistent credential verification. These requirements constrain where polyglot persistence applies or require additional coordination mechanisms.
Query Capabilities
Distributing data across databases limits query capabilities. Cross-database joins become impossible without extracting data from multiple sources and joining in application code. Transactions cannot span databases without distributed transaction protocols. Foreign key constraints don't enforce referential integrity across stores.
Applications must design data models to minimize cross-database queries. Denormalization duplicates data to avoid joins. Materialized views precompute aggregations. Caching stores frequently accessed combinations. These techniques trade storage space and update complexity for query performance.
Some queries remain impractical across database boundaries. Complex analytical queries spanning multiple data stores might require a separate data warehouse that consolidates data periodically. Real-time dashboards might need specialized systems that consume change streams from multiple sources.
Migration and Evolution
Systems using polyglot persistence face complex migration challenges. Database schema changes must coordinate across technologies. Adding new databases requires data migration from existing stores. Changing database technology involves migrating data while maintaining service availability.
Backward compatibility becomes critical during migrations. Applications must support both old and new database schemas during transition periods. Dual-write strategies maintain data in multiple stores temporarily. Feature flags enable gradual rollout of database changes.
Implementation Approaches
Microservices with Database per Service
The microservices pattern assigns each service its own database, implementing strong service boundaries through data isolation. Each service selects database technology matching its requirements without affecting other services. An inventory service might use PostgreSQL, a product catalog service MongoDB, a recommendations service Neo4j, and a session service Redis.
Service boundaries prevent direct database access from other services. All data access occurs through service APIs. This encapsulation allows services to change internal storage without external impact. Services own their data models and enforce business rules consistently.
Event-driven communication coordinates data changes across services. When a service modifies its data, it publishes events that other services consume. A completed order in the order service triggers inventory updates, shipping notifications, and analytics updates through events. This eventual consistency model scales well but requires careful event design.
Data duplication occurs naturally in this pattern. Services maintain local copies of data they frequently access. A shipping service caches product dimensions from the product catalog. A recommendation service stores user preferences from the user service. This denormalization improves performance at the cost of consistency complexity.
# Microservices with separate databases
class InventoryService
def initialize
@db = Sequel.connect('postgres://inventory-db')
end
def reserve_stock(product_id, quantity)
@db.transaction do
stock = @db[:stock].where(product_id: product_id).first
raise OutOfStock if stock[:available] < quantity
@db[:stock].where(product_id: product_id)
.update(available: stock[:available] - quantity)
publish_event('stock.reserved', {
product_id: product_id,
quantity: quantity
})
end
end
end
class ProductCatalogService
def initialize
@client = Mongo::Client.new(['mongodb://catalog-db:27017'])
end
def get_product(product_id)
@client[:products].find(_id: product_id).first
end
def update_product(product_id, attributes)
@client[:products].update_one(
{ _id: product_id },
{ '$set' => attributes }
)
publish_event('product.updated', {
product_id: product_id,
changes: attributes
})
end
end
Single Application with Multiple Data Stores
Monolithic applications can implement polyglot persistence by using multiple databases within a single codebase. The application code manages connections to different databases and routes operations to appropriate stores. This approach provides polyglot benefits without microservices complexity.
Domain modules or packages encapsulate database-specific logic. A product module interacts with MongoDB. An order module uses PostgreSQL. A cache module wraps Redis. Module boundaries enforce separation even within a single application. This organization prevents database logic from spreading throughout the codebase.
Connection management becomes critical in this pattern. Connection pools must be sized appropriately for each database. Timeout configurations need tuning. Health checks monitor each database's availability. The application must handle partial failures where some databases remain available while others fail.
Transaction management grows complex when operations span databases. Two-phase commit provides strong consistency but limits scalability. Saga patterns implement distributed transactions through coordinated local transactions with compensating actions. Applications must decide which operations require cross-database consistency and implement appropriate mechanisms.
Cache-Aside with Primary Database
The cache-aside pattern augments a primary database with a fast cache layer, often using Redis or Memcached. Applications read from cache first, falling back to the database on cache misses. Writes go to the database with explicit cache invalidation. This hybrid approach provides polyglot persistence benefits with minimal complexity.
Cache invalidation strategies determine consistency characteristics. Write-through caching updates cache and database simultaneously, maintaining consistency at the cost of write performance. Write-behind caching updates cache immediately and asynchronously updates the database, improving write performance but risking data loss. Explicit invalidation removes cache entries on writes, accepting temporary staleness for simplicity.
Cache warming strategies preload frequently accessed data. Applications can proactively populate cache during startup or low-traffic periods. Background jobs refresh cache entries before expiration. Warming reduces cache miss rates and improves performance consistency.
class UserRepository
def initialize
@db = ActiveRecord::Base.connection
@cache = Redis.new
end
def find_user(user_id)
cache_key = "user:#{user_id}"
# Try cache first
cached = @cache.get(cache_key)
return JSON.parse(cached) if cached
# Fall back to database
user = User.find(user_id)
@cache.setex(cache_key, 3600, user.to_json)
user
end
def update_user(user_id, attributes)
user = User.find(user_id)
user.update!(attributes)
# Invalidate cache
@cache.del("user:#{user_id}")
user
end
end
Event Sourcing with Multiple Projections
Event sourcing stores all state changes as events in an append-only log. Multiple database projections rebuild state from events, each optimized for specific query patterns. An event store might use PostgreSQL or specialized event databases. Projections might use Elasticsearch for search, PostgreSQL for relational queries, and Redis for caching.
Events provide the single source of truth. Projections represent derived state that can be rebuilt from events. This separation allows adding new projections without changing existing systems. A new analytics requirement can consume historical events to build its database without affecting current operations.
Event replay enables projection rebuilding. Corrupted projections can be deleted and reconstructed from events. Schema changes in projections don't require data migration; the projection rebuilds with the new schema. This flexibility simplifies database evolution.
Projection lag introduces eventual consistency. Projections process events asynchronously and lag behind the event store. Applications must handle queries against possibly stale projections. Some systems provide consistency checks or query the event store directly for strong consistency requirements.
Ruby Implementation
Connection Management
Ruby applications manage multiple database connections through configuration and connection pooling. Rails applications define connections in database.yml for multiple databases. Standalone applications configure connections directly using database-specific clients.
# Rails database.yml with multiple databases
# config/database.yml
production:
primary:
adapter: postgresql
database: app_production
pool: 25
catalog:
adapter: mongodb
database: products
hosts:
- mongodb1.example.com:27017
cache:
adapter: redis
url: redis://cache.example.com:6379/0
search:
adapter: elasticsearch
hosts:
- search1.example.com:9200
Rails 6+ provides built-in support for multiple databases through the connects_to directive. Models declare which database they use. Connection switching happens automatically based on model class.
class ApplicationRecord < ActiveRecord::Base
self.abstract_class = true
connects_to database: { writing: :primary, reading: :primary }
end
class Product < ApplicationRecord
connects_to database: { writing: :catalog, reading: :catalog }
end
# Explicit connection switching
ActiveRecord::Base.connected_to(role: :reading) do
# Database reads use read replica
User.find(123)
end
Connection pooling prevents connection exhaustion. Each database needs appropriate pool sizing based on expected concurrency. Rails uses thread-safe connection pools that automatically check out connections per thread.
# Custom connection pool configuration
class CacheConnection
def initialize
@redis_pool = ConnectionPool.new(size: 50, timeout: 5) do
Redis.new(url: ENV['REDIS_URL'])
end
end
def with_connection
@redis_pool.with do |conn|
yield conn
end
end
end
class CacheRepository
def self.get(key)
CACHE_POOL.with_connection do |redis|
redis.get(key)
end
end
end
Repository Pattern
The repository pattern encapsulates database-specific logic behind a common interface. Repositories handle connections, queries, and data transformation. This abstraction allows changing database implementations without affecting business logic.
class UserRepository
def find(id)
# PostgreSQL for user data
User.find(id)
end
def find_with_preferences(id)
# Join data from multiple sources
user = User.find(id)
preferences = PreferenceCache.get("user:#{id}:prefs")
user.preferences = preferences
user
end
def search(query)
# Elasticsearch for full-text search
SEARCH_CLIENT.search(
index: 'users',
body: { query: { match: { name: query } } }
)
end
end
class ProductRepository
def find(id)
# MongoDB for product catalog
Product.find(id)
end
def find_with_recommendations(id)
product = Product.find(id)
# Neo4j for graph-based recommendations
recommendations = NEO4J_SESSION.query(
"MATCH (p:Product {id: $id})-[:RECOMMENDS]->(r:Product) RETURN r",
id: id
)
product.recommendations = recommendations
product
end
end
Data Synchronization
Keeping data synchronized across databases requires event publishing and subscription mechanisms. Ruby applications commonly use message queues or event buses for asynchronous communication.
# Event publishing with Kafka
class OrderService
def complete_order(order)
ActiveRecord::Base.transaction do
order.update!(status: 'completed', completed_at: Time.current)
# Publish to Kafka
KAFKA_PRODUCER.produce(
JSON.generate({
event_type: 'order.completed',
order_id: order.id,
user_id: order.user_id,
total: order.total,
timestamp: Time.current.iso8601
}),
topic: 'orders'
)
end
KAFKA_PRODUCER.deliver_messages
end
end
# Event consumption and projection updates
class OrderCompletedHandler
def process(message)
event = JSON.parse(message.value)
# Update search index
update_search_index(event)
# Update cache
invalidate_user_cache(event['user_id'])
# Update analytics
record_analytics(event)
end
private
def update_search_index(event)
SEARCH_CLIENT.update(
index: 'orders',
id: event['order_id'],
body: { doc: { status: 'completed' } }
)
end
def invalidate_user_cache(user_id)
REDIS.del("user:#{user_id}:orders")
end
def record_analytics(event)
CLICKHOUSE_CLIENT.execute(
"INSERT INTO order_events VALUES (?, ?, ?, ?)",
event['order_id'],
event['user_id'],
event['total'],
event['timestamp']
)
end
end
Transaction Coordination
Operations spanning multiple databases need coordination mechanisms. Ruby applications implement saga patterns for distributed transactions with compensating actions.
class TransferInventorySaga
def execute(product_id, from_warehouse, to_warehouse, quantity)
# Step 1: Reserve stock at destination
destination_reservation = reserve_stock(to_warehouse, product_id, quantity)
# Step 2: Transfer from source
begin
transfer_from_source(from_warehouse, product_id, quantity)
rescue StandardError => e
# Compensate: release destination reservation
release_reservation(destination_reservation)
raise
end
# Step 3: Confirm at destination
begin
confirm_reservation(destination_reservation)
rescue StandardError => e
# Compensate: return stock to source
return_to_source(from_warehouse, product_id, quantity)
raise
end
# Step 4: Update search index
update_warehouse_index(product_id)
end
private
def reserve_stock(warehouse_id, product_id, quantity)
POSTGRES_CONN.exec_params(
"UPDATE warehouse_stock SET reserved = reserved + $1
WHERE warehouse_id = $2 AND product_id = $3
RETURNING id",
[quantity, warehouse_id, product_id]
)
end
def transfer_from_source(warehouse_id, product_id, quantity)
POSTGRES_CONN.exec_params(
"UPDATE warehouse_stock SET available = available - $1
WHERE warehouse_id = $2 AND product_id = $3
AND available >= $1",
[quantity, warehouse_id, product_id]
)
end
def update_warehouse_index(product_id)
SEARCH_CLIENT.update(
index: 'inventory',
id: product_id,
body: { doc: { last_updated: Time.current } }
)
end
end
Tools & Ecosystem
Database Adapters
Ruby provides adapters for connecting to various database technologies. ActiveRecord adapters integrate relational databases. Specialized gems handle NoSQL databases, caches, and search engines.
Relational database adapters include pg for PostgreSQL, mysql2 for MySQL, and sqlite3 for SQLite. These adapters integrate with ActiveRecord and provide raw connection interfaces. Non-Rails applications use Sequel as an alternative database toolkit supporting multiple relational databases.
Document database support comes through mongoid for MongoDB and cequel for Cassandra. These gems provide ActiveModel-compatible interfaces with database-specific features. Graph database access uses neo4j-ruby-driver for Neo4j connections.
Key-value store access typically uses redis-rb for Redis and dalli for Memcached. These lightweight clients handle caching operations efficiently. Search integration uses elasticsearch-ruby for Elasticsearch and searchkick for high-level search functionality.
Connection Pool Libraries
Connection pooling prevents resource exhaustion when managing multiple databases. The connection_pool gem provides thread-safe connection pools for any resource.
# Connection pooling for Redis
REDIS_POOL = ConnectionPool.new(size: 50, timeout: 5) do
Redis.new(url: ENV['REDIS_URL'])
end
# Connection pooling for MongoDB
MONGO_POOL = ConnectionPool.new(size: 25, timeout: 5) do
Mongo::Client.new(ENV['MONGODB_URL'])
end
# Using pooled connections
REDIS_POOL.with do |redis|
redis.get('key')
end
Event Processing Frameworks
Event-driven architectures require message queue integration. Popular Ruby options include ruby-kafka for Apache Kafka, bunny for RabbitMQ, and aws-sdk-sqs for Amazon SQS.
Wisper provides an in-process event bus for simpler event handling within applications. Event sourcing implementations use rails_event_store or eventide for managing event streams.
# Event processing with ruby-kafka
kafka = Kafka.new(['kafka://localhost:9092'])
producer = kafka.producer
# Publishing events
producer.produce({
event_type: 'user.created',
user_id: 123
}.to_json, topic: 'users')
producer.deliver_messages
# Consuming events
consumer = kafka.consumer(group_id: 'user-indexer')
consumer.subscribe('users')
consumer.each_message do |message|
event = JSON.parse(message.value)
update_search_index(event)
end
Database Abstraction Libraries
ROM (Ruby Object Mapper) provides database abstraction supporting multiple database types. ROM separates data mapping from persistence, enabling polyglot persistence with consistent patterns.
# ROM configuration for multiple databases
require 'rom'
ROM.container(:sql, 'postgres://localhost/users') do |config|
config.default.connection.create_table(:users) do
primary_key :id
column :name, String
end
config.relation(:users) do
schema(infer: true)
end
end
ROM.container(:mongodb, 'mongodb://localhost/products') do |config|
config.default.connection[:products]
config.relation(:products) do
schema(infer: true)
end
end
Monitoring and Observability
Monitoring multiple databases requires comprehensive instrumentation. ActiveSupport::Notifications provides hooks for database operations. Database-specific monitoring gems include scout_apm, newrelic_rpm, and datadog_tracing.
# Database query instrumentation
ActiveSupport::Notifications.subscribe('sql.active_record') do |*args|
event = ActiveSupport::Notifications::Event.new(*args)
StatsD.timing('database.query.duration', event.duration)
StatsD.increment('database.query.count')
end
# Custom instrumentation for NoSQL
class MongoInstrumented
def initialize(client)
@client = client
end
def find(collection, query)
start = Time.current
result = @client[collection].find(query)
duration = (Time.current - start) * 1000
StatsD.timing('mongodb.query.duration', duration)
result
end
end
Real-World Applications
E-commerce Platform
Large e-commerce platforms demonstrate polyglot persistence across multiple data domains. Product catalogs use MongoDB for flexible attribute storage. Order processing uses PostgreSQL for transactional integrity. Session management uses Redis for fast access. Product recommendations use Neo4j for relationship queries. Search functionality uses Elasticsearch for full-text capabilities.
Product catalogs benefit from document storage because products have varying attributes across categories. Electronics have technical specifications while clothing has sizes and colors. MongoDB's flexible schema accommodates this variance without sparse columns or entity-attribute-value patterns.
class EcommercePlatform
def display_product_page(product_id)
# MongoDB for product details
product = MONGO_CLIENT[:products].find(_id: product_id).first
# PostgreSQL for inventory
inventory = POSTGRES_CONN.exec_params(
"SELECT warehouse_id, quantity FROM inventory WHERE product_id = $1",
[product_id]
)
# Redis for pricing (frequently updated)
price = REDIS.get("price:#{product_id}")
# Neo4j for recommendations
recommendations = NEO4J_SESSION.query(
"MATCH (p:Product {id: $id})-[:SIMILAR_TO]->(r:Product)
RETURN r LIMIT 10",
id: product_id
)
# Elasticsearch for reviews
reviews = SEARCH_CLIENT.search(
index: 'reviews',
body: { query: { term: { product_id: product_id } } }
)
{
product: product,
inventory: inventory,
price: price,
recommendations: recommendations,
reviews: reviews
}
end
end
Order processing requires ACID transactions that PostgreSQL provides. Orders must update inventory atomically, apply discounts correctly, and maintain referential integrity across order lines.
Session data experiences high read/write volume with short lifespans. Redis provides microsecond latency for session lookups without persisting unnecessary historical data. Session expiration happens automatically through Redis TTL.
Social Media Analytics
Social media platforms handle diverse data types across billions of records. User profiles and posts use Cassandra for horizontal scalability. Real-time feeds use Redis for fast access to recent content. Social graphs use Neo4j for relationship traversal. Analytics use ClickHouse for fast aggregation queries. Full-text search uses Elasticsearch.
User profiles require availability over consistency. Cassandra's eventual consistency and partition tolerance enable global distribution. Profile updates propagate across regions asynchronously while reads remain fast locally.
class SocialMediaPlatform
def generate_user_feed(user_id, limit: 50)
# Check Redis cache for recent feed
cached_feed = REDIS.get("feed:#{user_id}")
return JSON.parse(cached_feed) if cached_feed
# Neo4j for following relationships
following_ids = NEO4J_SESSION.query(
"MATCH (u:User {id: $id})-[:FOLLOWS]->(f:User) RETURN f.id",
id: user_id
).map { |r| r['f.id'] }
# Cassandra for posts from followed users
posts = CASSANDRA_SESSION.execute(
"SELECT * FROM posts WHERE user_id IN ? ORDER BY created_at DESC LIMIT ?",
arguments: [following_ids, limit]
)
# Cache generated feed
REDIS.setex("feed:#{user_id}", 300, posts.to_json)
posts
end
def generate_trending_report
# ClickHouse for analytics queries
CLICKHOUSE_CLIENT.query(
"SELECT hashtag, count(*) as mentions
FROM post_hashtags
WHERE created_at >= now() - INTERVAL 1 HOUR
GROUP BY hashtag
ORDER BY mentions DESC
LIMIT 100"
)
end
end
Social graphs with billions of connections require specialized graph databases. Computing mutual friends, shortest paths, or recommendation networks needs efficient graph traversal that relational databases handle poorly at scale.
Analytics queries aggregate across billions of events. ClickHouse's columnar storage and parallel processing handle these queries in seconds while traditional databases timeout.
Financial Trading System
Financial systems combine strict consistency requirements with high-performance caching. Trade execution uses PostgreSQL with strict ACID transactions. Market data caching uses Redis for microsecond-latency quotes. Historical analysis uses TimescaleDB for time-series data. Regulatory reporting uses PostgreSQL for complex queries.
Trade execution cannot tolerate inconsistency. Atomic balance updates, position tracking, and transaction logs require full ACID compliance. PostgreSQL's serializable isolation level prevents race conditions in concurrent trading.
class TradingSystem
def execute_trade(user_id, symbol, quantity, price)
POSTGRES_CONN.transaction do
# Check and update balance
balance = POSTGRES_CONN.exec_params(
"SELECT balance FROM accounts WHERE user_id = $1 FOR UPDATE",
[user_id]
).first
total_cost = quantity * price
raise InsufficientFunds if balance['balance'] < total_cost
POSTGRES_CONN.exec_params(
"UPDATE accounts SET balance = balance - $1 WHERE user_id = $2",
[total_cost, user_id]
)
# Record trade
trade = POSTGRES_CONN.exec_params(
"INSERT INTO trades (user_id, symbol, quantity, price, executed_at)
VALUES ($1, $2, $3, $4, $5) RETURNING id",
[user_id, symbol, quantity, price, Time.current]
)
# Update position
POSTGRES_CONN.exec_params(
"INSERT INTO positions (user_id, symbol, quantity)
VALUES ($1, $2, $3)
ON CONFLICT (user_id, symbol)
DO UPDATE SET quantity = positions.quantity + $3",
[user_id, symbol, quantity]
)
# Invalidate cache
REDIS.del("positions:#{user_id}")
trade.first
end
end
def get_market_data(symbol)
# Redis for real-time quotes
quote = REDIS.get("quote:#{symbol}")
return JSON.parse(quote) if quote
# Fall back to market data service
quote = fetch_from_market_data_service(symbol)
REDIS.setex("quote:#{symbol}", 1, quote.to_json)
quote
end
end
Market data requires microsecond latency that only in-memory caches provide. Redis stores quotes with one-second expiration, serving millions of reads per second while staying current.
Historical analysis queries time-series data across years. TimescaleDB extends PostgreSQL with time-series optimizations, enabling fast queries over billions of data points while maintaining SQL compatibility for complex analytics.
Reference
Database Types and Use Cases
| Database Type | Primary Use Case | Data Model | Examples |
|---|---|---|---|
| Relational | Transactional data with complex relationships | Rows and columns with schemas | PostgreSQL, MySQL |
| Document | Semi-structured data with varying schemas | JSON-like documents | MongoDB, CouchDB |
| Key-Value | Simple lookups and caching | Keys mapped to values | Redis, Memcached |
| Column-Family | Analytics and large-scale data | Column-oriented storage | Cassandra, HBase |
| Graph | Highly connected data with relationships | Nodes and edges | Neo4j, ArangoDB |
| Time-Series | Time-stamped data and metrics | Time-ordered records | TimescaleDB, InfluxDB |
| Search Engine | Full-text search and analysis | Inverted indexes | Elasticsearch, Solr |
Ruby Database Gems
| Gem | Database | Purpose | Connection Example |
|---|---|---|---|
| pg | PostgreSQL | Relational database | Sequel.connect('postgres://localhost/db') |
| mysql2 | MySQL | Relational database | Mysql2::Client.new(host: 'localhost') |
| mongoid | MongoDB | Document store | Mongoid.load!('mongoid.yml') |
| redis | Redis | Key-value cache | Redis.new(url: 'redis://localhost:6379') |
| neo4j | Neo4j | Graph database | Neo4j::Driver.new('bolt://localhost:7687') |
| elasticsearch | Elasticsearch | Search engine | Elasticsearch::Client.new(url: 'http://localhost:9200') |
| cassandra-driver | Cassandra | Column-family store | Cassandra.cluster(hosts: ['127.0.0.1']) |
| dalli | Memcached | Key-value cache | Dalli::Client.new('localhost:11211') |
Consistency Patterns
| Pattern | Consistency Level | Performance | Complexity | Best For |
|---|---|---|---|---|
| Strong Consistency | Immediate | Lower | High | Financial transactions |
| Eventual Consistency | Delayed | Higher | Medium | User profiles, catalogs |
| Read-Your-Writes | Session-level | Medium | Medium | User sessions |
| Causal Consistency | Causally ordered | Medium | High | Collaborative editing |
| Cache-Aside | Eventually consistent | High | Low | Frequently read data |
Data Synchronization Strategies
| Strategy | Latency | Ordering Guarantees | Failure Handling | Implementation |
|---|---|---|---|---|
| Event Streaming | Low | Ordered within partition | Retry and replay | Kafka, RabbitMQ |
| Change Data Capture | Low | Sequential | Automatic retry | Debezium, Maxwell |
| API Replication | Medium | No ordering | Manual retry | REST APIs, GraphQL |
| Batch Sync | High | Bulk ordering | Transaction rollback | Scheduled jobs |
| Database Triggers | Immediate | Transactional | Rollback | Stored procedures |
Connection Pool Configuration
| Setting | Purpose | Typical Value | Consideration |
|---|---|---|---|
| Size | Maximum connections | 20-100 | Based on concurrency |
| Timeout | Wait time for connection | 5 seconds | Balance speed and patience |
| Checkout Timeout | Time to acquire connection | 5 seconds | Prevent indefinite waiting |
| Reaping Frequency | Dead connection cleanup | 60 seconds | Remove stale connections |
| Idle Timeout | Close unused connections | 300 seconds | Free resources |
Common Implementation Patterns
| Pattern | Purpose | Trade-off | Ruby Implementation |
|---|---|---|---|
| Repository | Encapsulate data access | Abstraction overhead | Class per domain entity |
| Unit of Work | Group operations | Transaction complexity | ActiveRecord transactions |
| Data Mapper | Separate persistence | Mapping overhead | ROM, Sequel |
| Active Record | Combine model and persistence | Tight coupling | Rails ActiveRecord |
| CQRS | Separate reads and writes | System complexity | Multiple models per entity |
| Event Sourcing | Store all changes as events | Storage overhead | rails_event_store |
Performance Characteristics
| Operation | Relational | Document | Key-Value | Graph | Time-Series |
|---|---|---|---|---|---|
| Point Query | 1-10ms | 1-10ms | <1ms | 1-10ms | 1-5ms |
| Range Scan | 10-100ms | 10-100ms | Not supported | 10-100ms | 1-10ms |
| Aggregation | 100-1000ms | 10-100ms | Not supported | 100-1000ms | 10-100ms |
| Join | 100-1000ms | Application-level | Not supported | 1-10ms | Application-level |
| Full-Text Search | 100-1000ms | Not optimized | Not supported | Not optimized | Not supported |
Operational Considerations
| Aspect | Single Database | Polyglot Persistence |
|---|---|---|
| Backup Complexity | Single backup process | Coordinated backups across systems |
| Monitoring | Single dashboard | Multiple monitoring tools |
| Disaster Recovery | Single restore process | Synchronized restore procedures |
| Security | Single authentication system | Multiple security configurations |
| Upgrades | Single upgrade window | Coordinated upgrades |
| Team Skills | One technology expertise | Multiple technology expertise |
| Operational Cost | Lower | Higher |