CrackedRuby - Key-Value Stores

Overview

Key-value stores represent the simplest form of NoSQL database, storing data as a collection of key-value pairs where each unique key maps to exactly one value. The data model consists of an opaque key serving as the unique identifier and a value containing the actual data. Unlike relational databases with schemas and structured queries, key-value stores operate with direct key-based lookup operations.

The architecture provides O(1) average-case complexity for basic operations: get, put, and delete. This performance characteristic makes key-value stores the foundation for caching layers, session management, and high-throughput data access patterns. The storage model trades query flexibility for speed and horizontal scalability.

Key-value stores appear in two primary forms: in-memory stores where data resides in RAM for maximum speed but loses persistence, and persistent stores that write data to disk while maintaining fast access through memory caching. Distributed key-value stores add replication and partitioning across multiple nodes for fault tolerance and increased capacity.

# In-memory key-value store using Ruby Hash
cache = {}
cache["user:1000"] = { name: "Alice", email: "alice@example.com" }
cache["user:1000"]
# => {:name=>"Alice", :email=>"alice@example.com"}

# External key-value store (Redis example)
require 'redis'
redis = Redis.new
redis.set("session:abc123", "user_id=1000")
redis.get("session:abc123")
# => "user_id=1000"

Key Principles

Key Space and Uniqueness: Every key in a key-value store must be unique within the namespace. The key serves as the primary mechanism for data retrieval. Keys typically consist of strings, though some implementations support binary keys. Applications often encode structure into keys using conventions like namespacing (prefix:identifier) to organize related data and prevent collisions.

Value Opacity: The store treats values as opaque blobs without understanding their internal structure. The database performs no parsing, indexing, or querying of value contents. This design decision enables the store to handle any data type—strings, binary data, serialized objects, JSON documents—without schema constraints. The application bears responsibility for serialization and deserialization.

Atomic Operations: Individual key operations execute atomically. Setting a value, retrieving a value, or deleting a key completes entirely or not at all. This atomicity guarantee prevents partial updates and eliminates common concurrency issues at the single-key level. Multi-key operations typically lack transactional guarantees unless explicitly provided by the implementation.

Time-to-Live (TTL): Many key-value stores support automatic expiration through TTL values. Setting a TTL causes the store to delete the key-value pair after the specified duration. This mechanism handles temporary data like sessions, cache entries, and rate limiting counters without manual cleanup code.

Memory-First Architecture: In-memory key-value stores prioritize speed by keeping all data in RAM. Operations bypass disk I/O entirely, achieving microsecond latencies. Persistent stores use memory as a cache layer, serving frequently accessed data from RAM while maintaining durability through background disk writes or write-ahead logs.

Distribution and Partitioning: Distributed implementations partition data across multiple nodes using consistent hashing or similar algorithms. The partitioning strategy determines which node stores each key-value pair. Replication copies data to multiple nodes for fault tolerance. The replication factor controls the number of copies maintained across the cluster.

Eventual Consistency: Distributed key-value stores often favor availability and partition tolerance over strong consistency, following the CAP theorem. Updates propagate asynchronously to replica nodes. During this propagation window, different nodes may return different values for the same key. Applications requiring strong consistency must explicitly configure synchronous replication at the cost of increased latency.

Ruby Implementation

Ruby provides the Hash class as a native key-value store implementation. Hash operates in memory with O(1) average-case performance for insertions, lookups, and deletions. The implementation uses open addressing with Robin Hood hashing for collision resolution.

# Basic Hash operations
store = {}

# Setting values
store[:user_count] = 1000
store["cache_key"] = "cached_value"
store[123] = "numeric_key"

# Retrieving values
store[:user_count]  # => 1000
store.fetch("cache_key")  # => "cached_value"
store.fetch(:missing_key, "default")  # => "default"

# Deleting keys
store.delete(:user_count)  # => 1000
store[:user_count]  # => nil

Hash supports any object as a key, provided the object implements the hash and eql? methods correctly. Symbols and strings serve as the most common key types, though frozen strings offer better memory efficiency for string keys in Ruby 2.3+.

# Key types and behavior
store = {}

# Symbol keys (most efficient)
store[:session_id] = "abc123"

# String keys
store["user:1000"] = {name: "Alice"}

# Custom objects as keys
class UserId
  attr_reader :id
  
  def initialize(id)
    @id = id
  end
  
  def hash
    @id.hash
  end
  
  def eql?(other)
    other.is_a?(UserId) && @id == other.id
  end
end

user_id = UserId.new(1000)
store[user_id] = "user_data"
store[UserId.new(1000)]  # => "user_data"

The Hash class provides methods that align with key-value store operations. The store method atomically sets a key if it doesn't exist, returning the existing value if present. The fetch method retrieves values with default handling or block-based computation for missing keys.

# Advanced Hash operations
store = {}

# Atomic set-if-absent
store.store(:counter, 0) unless store.key?(:counter)

# Fetch with default computation
result = store.fetch(:expensive_calculation) do
  # Only executed if key is missing
  perform_expensive_operation
end

# Bulk operations
store.merge!(page_1: "data1", page_2: "data2")

# Atomic increment pattern
store[:counter] = 0
store[:counter] += 1

# Conditional update
current = store[:version]
store[:version] = current + 1 if current

External key-value stores require client libraries. The redis gem provides a client for Redis, while the dalli gem connects to Memcached. These clients serialize Ruby objects automatically, though explicit serialization gives more control.

# Redis client usage
require 'redis'

redis = Redis.new(host: 'localhost', port: 6379)

# String values
redis.set("user:1000:name", "Alice")
redis.get("user:1000:name")  # => "Alice"

# TTL expiration
redis.setex("session:xyz", 3600, "session_data")
redis.ttl("session:xyz")  # => remaining seconds

# Atomic increment
redis.incr("page_views")
redis.incrby("score", 10)

# Multiple keys
redis.mset("key1", "val1", "key2", "val2")
redis.mget("key1", "key2")  # => ["val1", "val2"]

# Serialized objects
require 'json'
user = {id: 1000, name: "Alice"}
redis.set("user:1000", JSON.generate(user))
JSON.parse(redis.get("user:1000"))  # => {"id"=>1000, "name"=>"Alice"}

The connection_pool gem manages connection pools for external stores, reducing connection overhead in multi-threaded applications. Connection pooling prevents exhausting server connections when handling concurrent requests.

require 'redis'
require 'connection_pool'

# Create connection pool
redis_pool = ConnectionPool.new(size: 5, timeout: 3) do
  Redis.new(host: 'localhost', port: 6379)
end

# Use connection from pool
redis_pool.with do |redis|
  redis.set("key", "value")
  redis.get("key")
end

# Pool automatically returns connection after block

Ruby on Rails integrates key-value stores through ActiveSupport::Cache. The cache interface abstracts the underlying store, supporting memory, file system, Memcached, and Redis backends with a consistent API.

# Rails cache configuration (config/environments/production.rb)
config.cache_store = :redis_cache_store, {
  url: 'redis://localhost:6379/0',
  expires_in: 1.hour,
  namespace: 'myapp'
}

# Using the cache
Rails.cache.write("user:1000", user_object, expires_in: 30.minutes)
Rails.cache.read("user:1000")

# Fetch with automatic population
Rails.cache.fetch("expensive_query", expires_in: 5.minutes) do
  User.where(active: true).count
end

Implementation Approaches

In-Memory Stores: The simplest implementation keeps all data in process memory using native data structures. This approach delivers the fastest possible performance with sub-microsecond latencies. Data volatility remains the primary constraint—process restarts erase all stored data. Applications use in-memory stores for truly temporary data like request-scoped caches or computation intermediates.

Ruby's Hash class serves this purpose without external dependencies. For distributed in-memory caching, Memcached provides a shared cache across multiple application servers. Memcached uses a fixed memory allocation, evicting least recently used entries when capacity fills.

# In-memory store with LRU eviction
class LRUCache
  def initialize(capacity)
    @capacity = capacity
    @cache = {}
    @order = []
  end
  
  def get(key)
    return nil unless @cache.key?(key)
    @order.delete(key)
    @order.push(key)
    @cache[key]
  end
  
  def set(key, value)
    if @cache.key?(key)
      @order.delete(key)
    elsif @cache.size >= @capacity
      oldest = @order.shift
      @cache.delete(oldest)
    end
    @cache[key] = value
    @order.push(key)
  end
end

Persistent Stores: Disk-backed stores maintain durability by writing data to stable storage. The implementation typically uses memory as a cache layer while persisting data asynchronously or through write-ahead logs. Redis exemplifies this approach with configurable persistence strategies: RDB snapshots at intervals or AOF logging of every write operation.

Persistent stores survive process restarts, making them appropriate for session data, user preferences, and any state that outlives individual requests. The durability guarantee comes with write amplification costs as data moves from memory to disk.

Embedded Databases: Embedded key-value databases like LevelDB, RocksDB, or LMDB link directly into the application process without separate server processes. These stores provide persistent storage with optimized read and write patterns. The embedded approach eliminates network overhead but limits data sharing between processes.

# LevelDB usage through leveldb-ruby gem
require 'leveldb'

db = LevelDB::DB.new('/path/to/database')

# Basic operations
db.put('key', 'value')
db.get('key')  # => 'value'
db.delete('key')

# Batch writes
db.batch do |batch|
  batch.put('key1', 'value1')
  batch.put('key2', 'value2')
  batch.delete('old_key')
end

db.close

Distributed Stores: Multi-node key-value stores partition data across servers for horizontal scalability and fault tolerance. Consistent hashing distributes keys evenly across nodes while minimizing data movement when nodes join or leave. Replication copies each key-value pair to multiple nodes, typically three replicas in production configurations.

Redis Cluster, Riak, and Amazon DynamoDB represent distributed implementations with different consistency and availability trade-offs. Applications must account for network partitions, node failures, and replication lag when designing against distributed stores.

Write Strategies: Stores implement different write durability levels. Write-through immediately persists data to disk before acknowledging the write, guaranteeing durability at the cost of latency. Write-back queues writes in memory, acknowledging immediately for speed but risking data loss on crashes. Write-behind asynchronously persists writes in the background, balancing performance and durability.

Partitioning Schemes: Range partitioning divides keys by value ranges, storing consecutive keys on the same node. This approach enables efficient range queries but risks uneven load distribution. Hash partitioning applies a hash function to each key, distributing load evenly but losing key ordering. Consistent hashing maps both keys and nodes to a hash ring, adding or removing nodes with minimal data redistribution.

Performance Considerations

Key-value stores optimize for single-key operations at the expense of complex queries. The O(1) lookup complexity derives from hash table implementations that compute key positions directly rather than scanning. This characteristic makes key-value stores the fastest database category for point queries by primary key.

Memory Efficiency: In-memory stores consume RAM proportional to data size. The memory footprint includes both keys and values plus overhead for hash table structures. Long keys waste memory—applications should use compact key formats like IDs rather than descriptive strings. Value compression reduces memory usage when storing text or JSON, though compression adds CPU overhead.

# Memory-efficient key design
# Inefficient: descriptive keys
store["user_profile_information_1000"] = data

# Efficient: compact keys
store["u:1000"] = data

# Very large value compression
require 'zlib'

def compress_set(redis, key, value)
  compressed = Zlib::Deflate.deflate(value)
  redis.set(key, compressed)
end

def compress_get(redis, key)
  compressed = redis.get(key)
  Zlib::Inflate.inflate(compressed) if compressed
end

Network Latency: External key-value stores incur network round trips. A single get or set operation typically completes in 1-5 milliseconds over a local network. This latency dominates operation cost for remote stores. Pipelining batches multiple operations into a single round trip, dramatically improving throughput when processing multiple keys.

# Without pipelining (N network round trips)
keys = (1..1000).map { |i| "key#{i}" }
values = keys.map { |key| redis.get(key) }

# With pipelining (1 network round trip)
values = redis.pipelined do |pipe|
  keys.each { |key| pipe.get(key) }
end

Write Amplification: Persistent stores write each update to disk, creating I/O overhead. Write-ahead logging can generate multiple disk writes per logical operation—one for the log entry and another for the data file. Batch writes amortize this cost by grouping multiple operations into a single disk flush.

Read Amplification: Stores using LSM trees (Log-Structured Merge trees) like RocksDB may read multiple disk locations to serve a single key lookup. The structure maintains data across multiple levels, requiring searches through each level until finding the key. Bloom filters reduce read amplification by quickly determining whether a key exists in each level.

Hot Keys: Uneven access patterns concentrate load on specific keys. A cache storing user sessions with one extremely active user creates a hot key. Distributed stores cannot partition hot keys across nodes since each key maps to a single partition. Solutions include replicating hot key values to multiple nodes or sharding the value across multiple keys.

# Hot key mitigation through replication
def get_with_replication(key, replicas: 3)
  replica = rand(replicas)
  redis.get("#{key}:replica#{replica}")
end

def set_with_replication(key, value, replicas: 3)
  replicas.times do |i|
    redis.set("#{key}:replica#{i}", value)
  end
end

Eviction Policies: Memory-constrained stores evict entries when capacity fills. LRU (Least Recently Used) removes the least recently accessed key. LFU (Least Frequently Used) tracks access frequency, removing infrequently used keys. TTL-based eviction removes expired keys before considering access patterns. The eviction policy significantly impacts cache hit rates.

Serialization Overhead: Storing complex objects requires serialization to byte sequences. JSON serialization adds 5-20% overhead for structured data. MessagePack reduces this to 2-10% with binary encoding. Marshal provides the fastest serialization for Ruby objects but lacks cross-language compatibility. Choosing the appropriate serializer balances speed, size, and interoperability requirements.

require 'json'
require 'msgpack'
require 'benchmark'

data = { id: 1000, name: "Alice", tags: ["ruby", "developer"] }

Benchmark.bmbm do |x|
  x.report("JSON") { 10000.times { JSON.parse(JSON.generate(data)) } }
  x.report("MessagePack") { 10000.times { MessagePack.unpack(MessagePack.pack(data)) } }
  x.report("Marshal") { 10000.times { Marshal.load(Marshal.dump(data)) } }
end

Tools & Ecosystem

Redis: The most widely deployed key-value store balances in-memory performance with optional persistence. Redis supports multiple data structures beyond simple key-value: lists, sets, sorted sets, hashes, and streams. The redis-rb gem provides the standard Ruby client. Redis Cluster extends Redis to distributed deployments with automatic partitioning and failover.

require 'redis'

redis = Redis.new(url: 'redis://localhost:6379/0')

# Atomic counter
redis.incr('page_views')

# Set operations
redis.sadd('tags', 'ruby', 'rails', 'redis')
redis.smembers('tags')  # => ["ruby", "rails", "redis"]

# Sorted set for leaderboards
redis.zadd('scores', 100, 'player1')
redis.zadd('scores', 200, 'player2')
redis.zrevrange('scores', 0, 9)  # Top 10 players

Memcached: A pure in-memory cache without persistence focuses exclusively on caching use cases. The dalli gem provides thread-safe Memcached access. Memcached excels at distributing cached data across multiple servers with consistent hashing built into the client.

require 'dalli'

cache = Dalli::Client.new('localhost:11211')

# Basic operations
cache.set('key', 'value', 3600)  # 1 hour TTL
cache.get('key')

# Multi-get reduces round trips
cache.get_multi('key1', 'key2', 'key3')
# => {"key1"=>"value1", "key2"=>"value2"}

# CAS (Compare-And-Swap) for race-free updates
cache.cas('counter') do |value|
  value.to_i + 1
end

LevelDB/RocksDB: Embedded databases optimized for write-heavy workloads use LSM tree structures. RocksDB, a fork of LevelDB, adds features like column families and better compaction. These stores suit applications needing persistent key-value storage without operating separate servers.

Amazon DynamoDB: A fully managed distributed key-value and document store provides predictable performance at scale. The aws-sdk-dynamodb gem offers Ruby integration. DynamoDB bills by request volume and storage, eliminating server management.

etcd: A distributed key-value store built for configuration management and service discovery stores small amounts of critical data with strong consistency. etcd uses the Raft consensus algorithm to maintain agreement across nodes. The etcdv3 gem connects Ruby applications to etcd clusters.

Valkey: A fork of Redis maintaining compatibility while adding enhancements represents the community-driven evolution. Valkey preserves the Redis API while introducing improvements to clustering, persistence, and memory management.

Common Patterns

Cache-Aside: The application checks the cache before querying the database. Cache misses trigger database queries with results stored in the cache for subsequent requests. This pattern gives applications full control over what data to cache and when to invalidate entries.

def get_user(user_id)
  cache_key = "user:#{user_id}"
  
  # Check cache first
  cached = Rails.cache.read(cache_key)
  return cached if cached
  
  # Cache miss: query database
  user = User.find(user_id)
  
  # Store in cache
  Rails.cache.write(cache_key, user, expires_in: 1.hour)
  
  user
end

def update_user(user_id, attributes)
  user = User.find(user_id)
  user.update(attributes)
  
  # Invalidate cache after update
  Rails.cache.delete("user:#{user_id}")
  
  user
end

Read-Through/Write-Through: The cache sits between the application and database, intercepting all data access. Read-through automatically loads missing data from the database into the cache. Write-through updates the cache and database synchronously, maintaining consistency. This pattern centralizes caching logic but requires cache infrastructure that implements database integration.

Write-Behind: Updates go to the cache immediately, acknowledging writes quickly. The cache asynchronously flushes changes to the database in the background. This pattern maximizes write performance but risks data loss if the cache fails before flushing. Applications use write-behind for high-throughput scenarios where eventual consistency suffices.

Session Storage: Web applications store user session data in key-value stores for scalability across multiple servers. The session ID from the cookie serves as the key, mapping to serialized session data. TTL expiration automatically removes abandoned sessions.

# Rails session store configuration
Rails.application.config.session_store :redis_store,
  servers: 'redis://localhost:6379/0/session',
  expire_after: 2.hours,
  key: '_app_session',
  threadsafe: true

# Accessing session data
session[:user_id] = current_user.id
session[:cart_items] = [@item1.id, @item2.id]

user_id = session[:user_id]

Distributed Locking: Applications use key-value stores to coordinate exclusive access to shared resources across multiple processes. The SETNX (SET if Not eXists) operation atomically claims a lock if no lock exists. Adding TTL prevents locks from persisting after crashes.

class DistributedLock
  def initialize(redis, key, ttl: 10)
    @redis = redis
    @key = "lock:#{key}"
    @ttl = ttl
    @token = SecureRandom.uuid
  end
  
  def acquire
    # Atomically set lock with expiration
    @redis.set(@key, @token, nx: true, ex: @ttl)
  end
  
  def release
    # Only release if we still own the lock
    lua = <<~LUA
      if redis.call("get", KEYS[1]) == ARGV[1] then
        return redis.call("del", KEYS[1])
      else
        return 0
      end
    LUA
    
    @redis.eval(lua, keys: [@key], argv: [@token])
  end
end

lock = DistributedLock.new(redis, 'resource_id')
if lock.acquire
  begin
    # Critical section
    perform_exclusive_operation
  ensure
    lock.release
  end
end

Rate Limiting: Counting requests per key with TTL implements rate limiting. Incrementing a counter for each request and checking against thresholds controls access rates. The atomic increment operation prevents race conditions in concurrent environments.

class RateLimiter
  def initialize(redis, max_requests:, window:)
    @redis = redis
    @max_requests = max_requests
    @window = window
  end
  
  def allow?(identifier)
    key = "rate:#{identifier}"
    
    count = @redis.get(key).to_i
    
    if count >= @max_requests
      false
    else
      @redis.multi do |multi|
        multi.incr(key)
        multi.expire(key, @window) if count == 0
      end
      true
    end
  end
end

limiter = RateLimiter.new(redis, max_requests: 100, window: 3600)

if limiter.allow?(user_id)
  process_request
else
  render_rate_limit_error
end

Bloom Filter Caching: Storing a bloom filter of existing keys in the cache prevents unnecessary cache queries for non-existent keys. The filter quickly determines if a key definitely does not exist, reducing load on the cache and database.

Computed Results: Expensive computations store results in the cache with the input parameters encoded in the key. Subsequent requests with identical parameters retrieve cached results instead of recomputing. This pattern trades memory for CPU time.

def fibonacci(n)
  cache_key = "fib:#{n}"
  
  Rails.cache.fetch(cache_key, expires_in: 1.hour) do
    n <= 1 ? n : fibonacci(n-1) + fibonacci(n-2)
  end
end

Reference

Core Operations

Operation	Description	Complexity
get	Retrieve value by key	O(1) average
set	Store key-value pair	O(1) average
delete	Remove key-value pair	O(1) average
exists	Check key existence	O(1) average
keys	List all keys	O(n)
scan	Iterate keys incrementally	O(1) per call

Ruby Hash Methods

Method	Purpose	Example
[]	Get value by key	hash[:key]
[]=	Set value for key	hash[:key] = value
fetch	Get with default or block	hash.fetch(:key, default)
key?	Check if key exists	hash.key?(:name)
delete	Remove key	hash.delete(:key)
merge	Combine hashes	hash.merge(other)
clear	Remove all entries	hash.clear

Redis Common Commands

Command	Purpose	Ruby API
SET	Store string value	redis.set(key, value)
GET	Retrieve string value	redis.get(key)
SETEX	Set with expiration	redis.setex(key, ttl, value)
INCR	Increment counter	redis.incr(key)
DEL	Delete key	redis.del(key)
EXISTS	Check existence	redis.exists(key)
EXPIRE	Set TTL	redis.expire(key, seconds)
TTL	Get remaining TTL	redis.ttl(key)
MGET	Get multiple keys	redis.mget(key1, key2)
MSET	Set multiple keys	redis.mset(key1, val1, key2, val2)

Performance Characteristics

Store Type	Read Latency	Write Latency	Persistence	Distribution
Ruby Hash	Sub-microsecond	Sub-microsecond	None	Single process
Redis	1-5ms	1-5ms	Optional	Cluster mode
Memcached	1-3ms	1-3ms	None	Client-side sharding
LevelDB	50-200μs	10-50μs	Yes	Embedded
DynamoDB	5-10ms	5-10ms	Yes	Fully managed

Serialization Format Comparison

Format	Speed	Size	Cross-Language	Human Readable
Marshal	Fastest	Medium	No	No
JSON	Medium	Large	Yes	Yes
MessagePack	Fast	Small	Yes	No
Protocol Buffers	Fast	Smallest	Yes	No

Cache Eviction Policies

Policy	Eviction Criteria	Use Case
LRU	Least recently accessed	General purpose caching
LFU	Least frequently accessed	Long-term popularity
FIFO	First in, first out	Simple queue-like cache
Random	Random selection	Minimal overhead
TTL	Time-based expiration	Time-sensitive data

Key Design Patterns

Pattern	Format	Example
Namespace prefix	type:id	user:1000
Hierarchical	entity🆔property	post:500:author
Composite key	entity:id1:id2	friendship💯200
Versioned key	key:version	config:v2
Timestamped key	key:timestamp	log:1634567890

Key-Value Stores