Overview
Key-value stores represent the simplest form of NoSQL database, storing data as a collection of key-value pairs where each unique key maps to exactly one value. The data model consists of an opaque key serving as the unique identifier and a value containing the actual data. Unlike relational databases with schemas and structured queries, key-value stores operate with direct key-based lookup operations.
The architecture provides O(1) average-case complexity for basic operations: get, put, and delete. This performance characteristic makes key-value stores the foundation for caching layers, session management, and high-throughput data access patterns. The storage model trades query flexibility for speed and horizontal scalability.
Key-value stores appear in two primary forms: in-memory stores where data resides in RAM for maximum speed but loses persistence, and persistent stores that write data to disk while maintaining fast access through memory caching. Distributed key-value stores add replication and partitioning across multiple nodes for fault tolerance and increased capacity.
# In-memory key-value store using Ruby Hash
cache = {}
cache["user:1000"] = { name: "Alice", email: "alice@example.com" }
cache["user:1000"]
# => {:name=>"Alice", :email=>"alice@example.com"}
# External key-value store (Redis example)
require 'redis'
redis = Redis.new
redis.set("session:abc123", "user_id=1000")
redis.get("session:abc123")
# => "user_id=1000"
Key Principles
Key Space and Uniqueness: Every key in a key-value store must be unique within the namespace. The key serves as the primary mechanism for data retrieval. Keys typically consist of strings, though some implementations support binary keys. Applications often encode structure into keys using conventions like namespacing (prefix:identifier) to organize related data and prevent collisions.
Value Opacity: The store treats values as opaque blobs without understanding their internal structure. The database performs no parsing, indexing, or querying of value contents. This design decision enables the store to handle any data type—strings, binary data, serialized objects, JSON documents—without schema constraints. The application bears responsibility for serialization and deserialization.
Atomic Operations: Individual key operations execute atomically. Setting a value, retrieving a value, or deleting a key completes entirely or not at all. This atomicity guarantee prevents partial updates and eliminates common concurrency issues at the single-key level. Multi-key operations typically lack transactional guarantees unless explicitly provided by the implementation.
Time-to-Live (TTL): Many key-value stores support automatic expiration through TTL values. Setting a TTL causes the store to delete the key-value pair after the specified duration. This mechanism handles temporary data like sessions, cache entries, and rate limiting counters without manual cleanup code.
Memory-First Architecture: In-memory key-value stores prioritize speed by keeping all data in RAM. Operations bypass disk I/O entirely, achieving microsecond latencies. Persistent stores use memory as a cache layer, serving frequently accessed data from RAM while maintaining durability through background disk writes or write-ahead logs.
Distribution and Partitioning: Distributed implementations partition data across multiple nodes using consistent hashing or similar algorithms. The partitioning strategy determines which node stores each key-value pair. Replication copies data to multiple nodes for fault tolerance. The replication factor controls the number of copies maintained across the cluster.
Eventual Consistency: Distributed key-value stores often favor availability and partition tolerance over strong consistency, following the CAP theorem. Updates propagate asynchronously to replica nodes. During this propagation window, different nodes may return different values for the same key. Applications requiring strong consistency must explicitly configure synchronous replication at the cost of increased latency.
Ruby Implementation
Ruby provides the Hash class as a native key-value store implementation. Hash operates in memory with O(1) average-case performance for insertions, lookups, and deletions. The implementation uses open addressing with Robin Hood hashing for collision resolution.
# Basic Hash operations
store = {}
# Setting values
store[:user_count] = 1000
store["cache_key"] = "cached_value"
store[123] = "numeric_key"
# Retrieving values
store[:user_count] # => 1000
store.fetch("cache_key") # => "cached_value"
store.fetch(:missing_key, "default") # => "default"
# Deleting keys
store.delete(:user_count) # => 1000
store[:user_count] # => nil
Hash supports any object as a key, provided the object implements the hash and eql? methods correctly. Symbols and strings serve as the most common key types, though frozen strings offer better memory efficiency for string keys in Ruby 2.3+.
# Key types and behavior
store = {}
# Symbol keys (most efficient)
store[:session_id] = "abc123"
# String keys
store["user:1000"] = {name: "Alice"}
# Custom objects as keys
class UserId
attr_reader :id
def initialize(id)
@id = id
end
def hash
@id.hash
end
def eql?(other)
other.is_a?(UserId) && @id == other.id
end
end
user_id = UserId.new(1000)
store[user_id] = "user_data"
store[UserId.new(1000)] # => "user_data"
The Hash class provides methods that align with key-value store operations. The store method atomically sets a key if it doesn't exist, returning the existing value if present. The fetch method retrieves values with default handling or block-based computation for missing keys.
# Advanced Hash operations
store = {}
# Atomic set-if-absent
store.store(:counter, 0) unless store.key?(:counter)
# Fetch with default computation
result = store.fetch(:expensive_calculation) do
# Only executed if key is missing
perform_expensive_operation
end
# Bulk operations
store.merge!(page_1: "data1", page_2: "data2")
# Atomic increment pattern
store[:counter] = 0
store[:counter] += 1
# Conditional update
current = store[:version]
store[:version] = current + 1 if current
External key-value stores require client libraries. The redis gem provides a client for Redis, while the dalli gem connects to Memcached. These clients serialize Ruby objects automatically, though explicit serialization gives more control.
# Redis client usage
require 'redis'
redis = Redis.new(host: 'localhost', port: 6379)
# String values
redis.set("user:1000:name", "Alice")
redis.get("user:1000:name") # => "Alice"
# TTL expiration
redis.setex("session:xyz", 3600, "session_data")
redis.ttl("session:xyz") # => remaining seconds
# Atomic increment
redis.incr("page_views")
redis.incrby("score", 10)
# Multiple keys
redis.mset("key1", "val1", "key2", "val2")
redis.mget("key1", "key2") # => ["val1", "val2"]
# Serialized objects
require 'json'
user = {id: 1000, name: "Alice"}
redis.set("user:1000", JSON.generate(user))
JSON.parse(redis.get("user:1000")) # => {"id"=>1000, "name"=>"Alice"}
The connection_pool gem manages connection pools for external stores, reducing connection overhead in multi-threaded applications. Connection pooling prevents exhausting server connections when handling concurrent requests.
require 'redis'
require 'connection_pool'
# Create connection pool
redis_pool = ConnectionPool.new(size: 5, timeout: 3) do
Redis.new(host: 'localhost', port: 6379)
end
# Use connection from pool
redis_pool.with do |redis|
redis.set("key", "value")
redis.get("key")
end
# Pool automatically returns connection after block
Ruby on Rails integrates key-value stores through ActiveSupport::Cache. The cache interface abstracts the underlying store, supporting memory, file system, Memcached, and Redis backends with a consistent API.
# Rails cache configuration (config/environments/production.rb)
config.cache_store = :redis_cache_store, {
url: 'redis://localhost:6379/0',
expires_in: 1.hour,
namespace: 'myapp'
}
# Using the cache
Rails.cache.write("user:1000", user_object, expires_in: 30.minutes)
Rails.cache.read("user:1000")
# Fetch with automatic population
Rails.cache.fetch("expensive_query", expires_in: 5.minutes) do
User.where(active: true).count
end
Implementation Approaches
In-Memory Stores: The simplest implementation keeps all data in process memory using native data structures. This approach delivers the fastest possible performance with sub-microsecond latencies. Data volatility remains the primary constraint—process restarts erase all stored data. Applications use in-memory stores for truly temporary data like request-scoped caches or computation intermediates.
Ruby's Hash class serves this purpose without external dependencies. For distributed in-memory caching, Memcached provides a shared cache across multiple application servers. Memcached uses a fixed memory allocation, evicting least recently used entries when capacity fills.
# In-memory store with LRU eviction
class LRUCache
def initialize(capacity)
@capacity = capacity
@cache = {}
@order = []
end
def get(key)
return nil unless @cache.key?(key)
@order.delete(key)
@order.push(key)
@cache[key]
end
def set(key, value)
if @cache.key?(key)
@order.delete(key)
elsif @cache.size >= @capacity
oldest = @order.shift
@cache.delete(oldest)
end
@cache[key] = value
@order.push(key)
end
end
Persistent Stores: Disk-backed stores maintain durability by writing data to stable storage. The implementation typically uses memory as a cache layer while persisting data asynchronously or through write-ahead logs. Redis exemplifies this approach with configurable persistence strategies: RDB snapshots at intervals or AOF logging of every write operation.
Persistent stores survive process restarts, making them appropriate for session data, user preferences, and any state that outlives individual requests. The durability guarantee comes with write amplification costs as data moves from memory to disk.
Embedded Databases: Embedded key-value databases like LevelDB, RocksDB, or LMDB link directly into the application process without separate server processes. These stores provide persistent storage with optimized read and write patterns. The embedded approach eliminates network overhead but limits data sharing between processes.
# LevelDB usage through leveldb-ruby gem
require 'leveldb'
db = LevelDB::DB.new('/path/to/database')
# Basic operations
db.put('key', 'value')
db.get('key') # => 'value'
db.delete('key')
# Batch writes
db.batch do |batch|
batch.put('key1', 'value1')
batch.put('key2', 'value2')
batch.delete('old_key')
end
db.close
Distributed Stores: Multi-node key-value stores partition data across servers for horizontal scalability and fault tolerance. Consistent hashing distributes keys evenly across nodes while minimizing data movement when nodes join or leave. Replication copies each key-value pair to multiple nodes, typically three replicas in production configurations.
Redis Cluster, Riak, and Amazon DynamoDB represent distributed implementations with different consistency and availability trade-offs. Applications must account for network partitions, node failures, and replication lag when designing against distributed stores.
Write Strategies: Stores implement different write durability levels. Write-through immediately persists data to disk before acknowledging the write, guaranteeing durability at the cost of latency. Write-back queues writes in memory, acknowledging immediately for speed but risking data loss on crashes. Write-behind asynchronously persists writes in the background, balancing performance and durability.
Partitioning Schemes: Range partitioning divides keys by value ranges, storing consecutive keys on the same node. This approach enables efficient range queries but risks uneven load distribution. Hash partitioning applies a hash function to each key, distributing load evenly but losing key ordering. Consistent hashing maps both keys and nodes to a hash ring, adding or removing nodes with minimal data redistribution.
Performance Considerations
Key-value stores optimize for single-key operations at the expense of complex queries. The O(1) lookup complexity derives from hash table implementations that compute key positions directly rather than scanning. This characteristic makes key-value stores the fastest database category for point queries by primary key.
Memory Efficiency: In-memory stores consume RAM proportional to data size. The memory footprint includes both keys and values plus overhead for hash table structures. Long keys waste memory—applications should use compact key formats like IDs rather than descriptive strings. Value compression reduces memory usage when storing text or JSON, though compression adds CPU overhead.
# Memory-efficient key design
# Inefficient: descriptive keys
store["user_profile_information_1000"] = data
# Efficient: compact keys
store["u:1000"] = data
# Very large value compression
require 'zlib'
def compress_set(redis, key, value)
compressed = Zlib::Deflate.deflate(value)
redis.set(key, compressed)
end
def compress_get(redis, key)
compressed = redis.get(key)
Zlib::Inflate.inflate(compressed) if compressed
end
Network Latency: External key-value stores incur network round trips. A single get or set operation typically completes in 1-5 milliseconds over a local network. This latency dominates operation cost for remote stores. Pipelining batches multiple operations into a single round trip, dramatically improving throughput when processing multiple keys.
# Without pipelining (N network round trips)
keys = (1..1000).map { |i| "key#{i}" }
values = keys.map { |key| redis.get(key) }
# With pipelining (1 network round trip)
values = redis.pipelined do |pipe|
keys.each { |key| pipe.get(key) }
end
Write Amplification: Persistent stores write each update to disk, creating I/O overhead. Write-ahead logging can generate multiple disk writes per logical operation—one for the log entry and another for the data file. Batch writes amortize this cost by grouping multiple operations into a single disk flush.
Read Amplification: Stores using LSM trees (Log-Structured Merge trees) like RocksDB may read multiple disk locations to serve a single key lookup. The structure maintains data across multiple levels, requiring searches through each level until finding the key. Bloom filters reduce read amplification by quickly determining whether a key exists in each level.
Hot Keys: Uneven access patterns concentrate load on specific keys. A cache storing user sessions with one extremely active user creates a hot key. Distributed stores cannot partition hot keys across nodes since each key maps to a single partition. Solutions include replicating hot key values to multiple nodes or sharding the value across multiple keys.
# Hot key mitigation through replication
def get_with_replication(key, replicas: 3)
replica = rand(replicas)
redis.get("#{key}:replica#{replica}")
end
def set_with_replication(key, value, replicas: 3)
replicas.times do |i|
redis.set("#{key}:replica#{i}", value)
end
end
Eviction Policies: Memory-constrained stores evict entries when capacity fills. LRU (Least Recently Used) removes the least recently accessed key. LFU (Least Frequently Used) tracks access frequency, removing infrequently used keys. TTL-based eviction removes expired keys before considering access patterns. The eviction policy significantly impacts cache hit rates.
Serialization Overhead: Storing complex objects requires serialization to byte sequences. JSON serialization adds 5-20% overhead for structured data. MessagePack reduces this to 2-10% with binary encoding. Marshal provides the fastest serialization for Ruby objects but lacks cross-language compatibility. Choosing the appropriate serializer balances speed, size, and interoperability requirements.
require 'json'
require 'msgpack'
require 'benchmark'
data = { id: 1000, name: "Alice", tags: ["ruby", "developer"] }
Benchmark.bmbm do |x|
x.report("JSON") { 10000.times { JSON.parse(JSON.generate(data)) } }
x.report("MessagePack") { 10000.times { MessagePack.unpack(MessagePack.pack(data)) } }
x.report("Marshal") { 10000.times { Marshal.load(Marshal.dump(data)) } }
end
Tools & Ecosystem
Redis: The most widely deployed key-value store balances in-memory performance with optional persistence. Redis supports multiple data structures beyond simple key-value: lists, sets, sorted sets, hashes, and streams. The redis-rb gem provides the standard Ruby client. Redis Cluster extends Redis to distributed deployments with automatic partitioning and failover.
require 'redis'
redis = Redis.new(url: 'redis://localhost:6379/0')
# Atomic counter
redis.incr('page_views')
# Set operations
redis.sadd('tags', 'ruby', 'rails', 'redis')
redis.smembers('tags') # => ["ruby", "rails", "redis"]
# Sorted set for leaderboards
redis.zadd('scores', 100, 'player1')
redis.zadd('scores', 200, 'player2')
redis.zrevrange('scores', 0, 9) # Top 10 players
Memcached: A pure in-memory cache without persistence focuses exclusively on caching use cases. The dalli gem provides thread-safe Memcached access. Memcached excels at distributing cached data across multiple servers with consistent hashing built into the client.
require 'dalli'
cache = Dalli::Client.new('localhost:11211')
# Basic operations
cache.set('key', 'value', 3600) # 1 hour TTL
cache.get('key')
# Multi-get reduces round trips
cache.get_multi('key1', 'key2', 'key3')
# => {"key1"=>"value1", "key2"=>"value2"}
# CAS (Compare-And-Swap) for race-free updates
cache.cas('counter') do |value|
value.to_i + 1
end
LevelDB/RocksDB: Embedded databases optimized for write-heavy workloads use LSM tree structures. RocksDB, a fork of LevelDB, adds features like column families and better compaction. These stores suit applications needing persistent key-value storage without operating separate servers.
Amazon DynamoDB: A fully managed distributed key-value and document store provides predictable performance at scale. The aws-sdk-dynamodb gem offers Ruby integration. DynamoDB bills by request volume and storage, eliminating server management.
etcd: A distributed key-value store built for configuration management and service discovery stores small amounts of critical data with strong consistency. etcd uses the Raft consensus algorithm to maintain agreement across nodes. The etcdv3 gem connects Ruby applications to etcd clusters.
Valkey: A fork of Redis maintaining compatibility while adding enhancements represents the community-driven evolution. Valkey preserves the Redis API while introducing improvements to clustering, persistence, and memory management.
Common Patterns
Cache-Aside: The application checks the cache before querying the database. Cache misses trigger database queries with results stored in the cache for subsequent requests. This pattern gives applications full control over what data to cache and when to invalidate entries.
def get_user(user_id)
cache_key = "user:#{user_id}"
# Check cache first
cached = Rails.cache.read(cache_key)
return cached if cached
# Cache miss: query database
user = User.find(user_id)
# Store in cache
Rails.cache.write(cache_key, user, expires_in: 1.hour)
user
end
def update_user(user_id, attributes)
user = User.find(user_id)
user.update(attributes)
# Invalidate cache after update
Rails.cache.delete("user:#{user_id}")
user
end
Read-Through/Write-Through: The cache sits between the application and database, intercepting all data access. Read-through automatically loads missing data from the database into the cache. Write-through updates the cache and database synchronously, maintaining consistency. This pattern centralizes caching logic but requires cache infrastructure that implements database integration.
Write-Behind: Updates go to the cache immediately, acknowledging writes quickly. The cache asynchronously flushes changes to the database in the background. This pattern maximizes write performance but risks data loss if the cache fails before flushing. Applications use write-behind for high-throughput scenarios where eventual consistency suffices.
Session Storage: Web applications store user session data in key-value stores for scalability across multiple servers. The session ID from the cookie serves as the key, mapping to serialized session data. TTL expiration automatically removes abandoned sessions.
# Rails session store configuration
Rails.application.config.session_store :redis_store,
servers: 'redis://localhost:6379/0/session',
expire_after: 2.hours,
key: '_app_session',
threadsafe: true
# Accessing session data
session[:user_id] = current_user.id
session[:cart_items] = [@item1.id, @item2.id]
user_id = session[:user_id]
Distributed Locking: Applications use key-value stores to coordinate exclusive access to shared resources across multiple processes. The SETNX (SET if Not eXists) operation atomically claims a lock if no lock exists. Adding TTL prevents locks from persisting after crashes.
class DistributedLock
def initialize(redis, key, ttl: 10)
@redis = redis
@key = "lock:#{key}"
@ttl = ttl
@token = SecureRandom.uuid
end
def acquire
# Atomically set lock with expiration
@redis.set(@key, @token, nx: true, ex: @ttl)
end
def release
# Only release if we still own the lock
lua = <<~LUA
if redis.call("get", KEYS[1]) == ARGV[1] then
return redis.call("del", KEYS[1])
else
return 0
end
LUA
@redis.eval(lua, keys: [@key], argv: [@token])
end
end
lock = DistributedLock.new(redis, 'resource_id')
if lock.acquire
begin
# Critical section
perform_exclusive_operation
ensure
lock.release
end
end
Rate Limiting: Counting requests per key with TTL implements rate limiting. Incrementing a counter for each request and checking against thresholds controls access rates. The atomic increment operation prevents race conditions in concurrent environments.
class RateLimiter
def initialize(redis, max_requests:, window:)
@redis = redis
@max_requests = max_requests
@window = window
end
def allow?(identifier)
key = "rate:#{identifier}"
count = @redis.get(key).to_i
if count >= @max_requests
false
else
@redis.multi do |multi|
multi.incr(key)
multi.expire(key, @window) if count == 0
end
true
end
end
end
limiter = RateLimiter.new(redis, max_requests: 100, window: 3600)
if limiter.allow?(user_id)
process_request
else
render_rate_limit_error
end
Bloom Filter Caching: Storing a bloom filter of existing keys in the cache prevents unnecessary cache queries for non-existent keys. The filter quickly determines if a key definitely does not exist, reducing load on the cache and database.
Computed Results: Expensive computations store results in the cache with the input parameters encoded in the key. Subsequent requests with identical parameters retrieve cached results instead of recomputing. This pattern trades memory for CPU time.
def fibonacci(n)
cache_key = "fib:#{n}"
Rails.cache.fetch(cache_key, expires_in: 1.hour) do
n <= 1 ? n : fibonacci(n-1) + fibonacci(n-2)
end
end
Reference
Core Operations
| Operation | Description | Complexity |
|---|---|---|
| get | Retrieve value by key | O(1) average |
| set | Store key-value pair | O(1) average |
| delete | Remove key-value pair | O(1) average |
| exists | Check key existence | O(1) average |
| keys | List all keys | O(n) |
| scan | Iterate keys incrementally | O(1) per call |
Ruby Hash Methods
| Method | Purpose | Example |
|---|---|---|
| [] | Get value by key | hash[:key] |
| []= | Set value for key | hash[:key] = value |
| fetch | Get with default or block | hash.fetch(:key, default) |
| key? | Check if key exists | hash.key?(:name) |
| delete | Remove key | hash.delete(:key) |
| merge | Combine hashes | hash.merge(other) |
| clear | Remove all entries | hash.clear |
Redis Common Commands
| Command | Purpose | Ruby API |
|---|---|---|
| SET | Store string value | redis.set(key, value) |
| GET | Retrieve string value | redis.get(key) |
| SETEX | Set with expiration | redis.setex(key, ttl, value) |
| INCR | Increment counter | redis.incr(key) |
| DEL | Delete key | redis.del(key) |
| EXISTS | Check existence | redis.exists(key) |
| EXPIRE | Set TTL | redis.expire(key, seconds) |
| TTL | Get remaining TTL | redis.ttl(key) |
| MGET | Get multiple keys | redis.mget(key1, key2) |
| MSET | Set multiple keys | redis.mset(key1, val1, key2, val2) |
Performance Characteristics
| Store Type | Read Latency | Write Latency | Persistence | Distribution |
|---|---|---|---|---|
| Ruby Hash | Sub-microsecond | Sub-microsecond | None | Single process |
| Redis | 1-5ms | 1-5ms | Optional | Cluster mode |
| Memcached | 1-3ms | 1-3ms | None | Client-side sharding |
| LevelDB | 50-200μs | 10-50μs | Yes | Embedded |
| DynamoDB | 5-10ms | 5-10ms | Yes | Fully managed |
Serialization Format Comparison
| Format | Speed | Size | Cross-Language | Human Readable |
|---|---|---|---|---|
| Marshal | Fastest | Medium | No | No |
| JSON | Medium | Large | Yes | Yes |
| MessagePack | Fast | Small | Yes | No |
| Protocol Buffers | Fast | Smallest | Yes | No |
Cache Eviction Policies
| Policy | Eviction Criteria | Use Case |
|---|---|---|
| LRU | Least recently accessed | General purpose caching |
| LFU | Least frequently accessed | Long-term popularity |
| FIFO | First in, first out | Simple queue-like cache |
| Random | Random selection | Minimal overhead |
| TTL | Time-based expiration | Time-sensitive data |
Key Design Patterns
| Pattern | Format | Example |
|---|---|---|
| Namespace prefix | type:id | user:1000 |
| Hierarchical | entity🆔property | post:500:author |
| Composite key | entity:id1:id2 | friendship💯200 |
| Versioned key | key:version | config:v2 |
| Timestamped key | key:timestamp | log:1634567890 |