Overview
Column-family stores represent a NoSQL database model that organizes data into column families rather than traditional row-based tables. Unlike relational databases that store complete rows together, column-family stores group related columns into families and distribute data across nodes based on row keys. This architecture originated from Google's Bigtable paper and powers systems handling massive datasets across distributed clusters.
The fundamental distinction lies in how data is physically stored and accessed. A row-oriented relational database typically stores an entire row contiguously on disk, so a read fetches all columns even when only specific attributes are needed. Column-family stores organize data so that columns within a family are stored together, enabling efficient retrieval of specific attributes across millions of rows without scanning unnecessary data.
Consider a user profile system. A relational database stores each user as a complete row:
UserID | Name | Email | LastLogin | PreferenceA | PreferenceB | ...
1 | Alice| a@... | 2025-01-15| true | false | ...
A column-family store structures the same data differently:
RowKey: user:1
Identity:Name = "Alice"
Identity:Email = "a@example.com"
Activity:LastLogin = "2025-01-15"
Preferences:A = "true"
Preferences:B = "false"
Each column family (Identity, Activity, Preferences) can be stored and retrieved independently. Reading login activity does not require loading preference data, reducing I/O operations dramatically when dealing with wide rows containing hundreds of attributes.
Column-family stores excel in scenarios requiring high write throughput, flexible schemas, and the ability to scale horizontally across commodity hardware. Time-series data, user activity tracking, content management systems, and recommendation engines commonly use this model. The architecture gives up multi-row ACID transactions and server-side joins in exchange for scalability and availability, a tradeoff usually framed in terms of the CAP theorem.
Key Principles
The column-family data model organizes information into a hierarchical structure of keyspaces, column families, rows, and columns. A keyspace functions as the top-level namespace, similar to a database in relational systems, containing configuration for replication and data distribution strategies. Within a keyspace, column families define logical groupings of related data with shared physical storage characteristics.
Each row is identified by a unique row key, functioning as the primary access path to data. Row keys determine data distribution across cluster nodes through partitioning functions. The system hashes row keys to assign data to specific nodes, ensuring balanced distribution and parallel processing capabilities. Rows within a column family need not contain the same columns, providing schema flexibility absent in relational models.
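The hashing step described above can be sketched in a few lines. This is a simplified illustration, not how any particular store implements partitioning: the node names are hypothetical, and taking the hash modulo the node count stands in for the token ranges that real clusters assign on a ring.

```ruby
require 'digest'

# Hypothetical four-node cluster; real systems assign token ranges on a ring.
NODES = %w[node-a node-b node-c node-d].freeze

# Map a row key to a node by hashing the key and taking the remainder.
def node_for(row_key, nodes = NODES)
  token = Digest::MD5.hexdigest(row_key).to_i(16)
  nodes[token % nodes.size]
end

# Count how many of 1,000 sequential row keys land on each node.
keys = (1..1000).map { |i| "user:#{i}" }
distribution = keys.group_by { |key| node_for(key) }.transform_values(&:size)
puts distribution
```

The same key always maps to the same node, and sequential keys spread across all nodes because the hash output is effectively uniform.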
Columns consist of three components: a name, a value, and a timestamp. The timestamp enables versioning, allowing the database to maintain multiple versions of the same column and resolve conflicts in distributed environments. Column names themselves carry semantic meaning and often include dynamic elements:
# Column structure representation
column = {
  name: "temperature:2025-01-15:12:00",
  value: "72.5",
  timestamp: 1736946000000
}
This structure allows for sparse data storage. If a row lacks a particular column, nothing is stored for it, whereas a fixed relational schema must still account for every column of every row, often through NULL markers. This property makes column-family stores efficient for datasets where different rows contain varying attributes.
Column families group columns with similar access patterns. Data within a column family is stored contiguously on disk, compressed together, and cached as a unit. Proper column family design directly impacts performance. Frequently accessed columns should reside in the same family, while rarely accessed large objects should occupy separate families to avoid loading unnecessary data.
Cassandra-style column-family stores favor eventual consistency over strict ACID transactions. A write is sent to every replica, and the coordinator acknowledges it once the number of replicas required by the configured consistency level has responded; the remaining replicas catch up asynchronously. Conflicting versions are resolved later using timestamps or application-defined resolution strategies. This approach prioritizes availability and partition tolerance over immediate consistency.
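Timestamp-based conflict resolution can be illustrated with a minimal last-write-wins sketch. The `Column` struct and `resolve` method are hypothetical names for this example, and the tie-break on value is an assumption chosen so that every replica would pick the same winner.

```ruby
# A column version: name, value, and write timestamp in milliseconds.
Column = Struct.new(:name, :value, :timestamp)

# Last-write-wins: keep the version with the highest timestamp.
# Equal timestamps fall back to comparing values so the result is deterministic.
def resolve(versions)
  versions.max_by { |column| [column.timestamp, column.value] }
end

replica_a = Column.new('email', 'old@example.com', 1_736_946_000_000)
replica_b = Column.new('email', 'new@example.com', 1_736_946_005_000)

winner = resolve([replica_a, replica_b])
puts winner.value # new@example.com
```

Because the outcome depends only on the versions themselves, replicas converge regardless of the order in which they receive the writes.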
Data is written to an append-only commit log for durability, then stored in memory structures called memtables. When memtables reach size thresholds, the system flushes them to disk as immutable SSTable files. Background compaction processes merge SSTables, removing obsolete versions and deleted data. This log-structured merge tree (LSM tree) architecture optimizes write performance:
# Write path conceptual flow (in-memory stand-ins for the real structures)
commit_log = []                                     # append-only log
memtable = Hash.new { |rows, key| rows[key] = {} }  # row key => columns
write_request = { row_key: "user:1000", column: "email", value: "user@example.com" }
# 1. Append to commit log (sequential write)
commit_log << write_request
# 2. Write to memtable (memory structure)
memtable[write_request[:row_key]][write_request[:column]] = write_request[:value]
# 3. Return success to client
# 4. Asynchronously flush memtable to SSTable when full
# 5. Background compaction merges SSTables
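Reads consult the memtable and then the SSTables, conceptually from newest to oldest. Because the columns of a row may be spread over several SSTables, the store merges the fragments it finds and keeps the newest version of each column. Bloom filters accelerate this process by quickly determining whether an SSTable can contain a particular row key, avoiding unnecessary disk reads.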
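A toy model of this read path may help make it concrete. The sketch below is an in-memory simplification with hypothetical class names: it stores one value per key, returns the first hit when scanning newest to oldest, and omits the commit log, compaction, and per-column merging.

```ruby
require 'digest'

# Tiny Bloom filter: several hash positions per key in a fixed-size bit array.
class BloomFilter
  def initialize(size = 1024, hashes = 3)
    @bits = Array.new(size, false)
    @hashes = hashes
  end

  def add(key)
    positions(key).each { |i| @bits[i] = true }
  end

  # false means "definitely absent"; true means "possibly present".
  def might_contain?(key)
    positions(key).all? { |i| @bits[i] }
  end

  private

  def positions(key)
    (0...@hashes).map { |seed| Digest::MD5.hexdigest("#{seed}:#{key}").to_i(16) % @bits.size }
  end
end

# Immutable table modeled as a frozen hash guarded by its Bloom filter.
class SSTable
  attr_reader :disk_reads

  def initialize(data)
    @data = data.dup.freeze
    @filter = BloomFilter.new
    @data.each_key { |key| @filter.add(key) }
    @disk_reads = 0
  end

  def get(key)
    return nil unless @filter.might_contain?(key) # skip the simulated disk read

    @disk_reads += 1
    @data[key]
  end
end

# Memtable plus a newest-first list of flushed SSTables.
class Store
  attr_reader :sstables

  def initialize(memtable_limit = 2)
    @memtable = {}
    @sstables = []
    @memtable_limit = memtable_limit
  end

  def put(key, value)
    @memtable[key] = value
    flush if @memtable.size >= @memtable_limit
  end

  def get(key)
    return @memtable[key] if @memtable.key?(key)

    @sstables.each do |table|
      value = table.get(key)
      return value unless value.nil?
    end
    nil
  end

  private

  def flush
    @sstables.unshift(SSTable.new(@memtable))
    @memtable = {}
  end
end

store = Store.new
store.put('user:1', 'alice')
store.put('user:2', 'bob')      # second entry triggers a flush
store.put('user:1', 'alice-v2')
store.put('user:3', 'carol')    # flush again; this table is now the newest
puts store.get('user:1')        # alice-v2
```

The newer SSTable shadows the older value for `user:1`, which is the behavior that compaction later makes permanent by discarding the obsolete version.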
Replication strategies determine how data copies distribute across the cluster. The replication factor specifies the number of replicas, typically three for production systems. Simple strategy assigns replicas to consecutive nodes in the cluster ring, suitable for single-datacenter deployments. Network topology strategy places replicas across different racks and datacenters, providing resilience against hardware and facility failures.
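The simple strategy's placement rule can be sketched as follows. The six-node ring and the `replicas_for` helper are hypothetical; a real cluster derives the owner from the row key's token rather than taking an index directly.

```ruby
# Hypothetical six-node cluster, listed in ring order.
RING = %w[n1 n2 n3 n4 n5 n6].freeze

# Simple-strategy placement: the owning node plus the next
# (replication_factor - 1) nodes clockwise, wrapping around the ring.
def replicas_for(owner_index, replication_factor, ring = RING)
  count = [replication_factor, ring.size].min
  (0...count).map { |offset| ring[(owner_index + offset) % ring.size] }
end

p replicas_for(4, 3) # ["n5", "n6", "n1"]
```

Network topology strategy follows the same walk around the ring but skips nodes until it finds one in a rack or datacenter that does not yet hold a replica.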
Design Considerations
Column-family stores suit specific access patterns and scaling requirements. The decision to adopt this model depends on workload characteristics, consistency requirements, and operational complexity tolerance.
Write-heavy workloads with append-dominated patterns benefit significantly from column-family stores. The LSM tree architecture converts random writes into sequential log appends, achieving high throughput on rotational disks and SSDs alike. Systems ingesting sensor data, event logs, or user activity streams can sustain write rates well above those of B-tree based storage engines (figures of 10-100x are sometimes quoted, though the gain depends on hardware and workload). However, this advantage comes with read amplification, as queries may examine multiple SSTables to construct the current state.
Schema flexibility addresses evolving data requirements without expensive migrations. Adding new columns to existing rows requires no schema alterations or table locks. Different rows can contain completely different column sets, accommodating heterogeneous data naturally. This property benefits content management systems, product catalogs, and multi-tenant applications where data attributes vary significantly between entities:
# Different products with varying attributes
electronics = {
row_key: "product:1000",
columns: {
"basic:name" => "Laptop",
"basic:price" => "999.99",
"specs:cpu" => "Intel i7",
"specs:ram" => "16GB",
"specs:warranty" => "3 years"
}
}
clothing = {
row_key: "product:2000",
columns: {
"basic:name" => "Shirt",
"basic:price" => "29.99",
"specs:size" => "M",
"specs:color" => "Blue",
"specs:material" => "Cotton"
# No CPU, RAM, or warranty columns needed
}
}
Time-series data aligns naturally with the column-family model. Timestamps embedded in column names create self-organizing time-ordered data, enabling efficient range queries without secondary indexes. IoT systems, financial tick data, and monitoring applications commonly use this pattern. The wide-row approach stores all measurements for a device or entity in a single row with timestamp-qualified columns, reducing the number of row keys and improving cache locality.
Conversely, column-family stores present challenges for certain workload types. Complex queries requiring joins across multiple entity types perform poorly. The absence of foreign key relationships and query optimizers means applications must orchestrate multi-step reads and implement join logic. OLAP workloads with ad-hoc analytical queries suffer from the lack of flexible indexing and aggregation capabilities present in relational systems.
Transactions spanning multiple rows or column families require application-level coordination. While most column-family stores support lightweight transactions for single-row updates, distributed transactions across partitions are either unavailable or carry significant performance penalties. Applications requiring strong consistency guarantees across related entities should carefully evaluate whether denormalization can eliminate cross-partition transaction needs or whether a different database model better fits requirements.
Data modeling in column-family stores inverts relational design principles. Denormalization becomes standard practice, duplicating data across multiple rows to support different access patterns. Query patterns drive schema design rather than entity relationships. The same logical entity may appear in multiple column families optimized for specific queries:
# Denormalized design for different access patterns
# User profile access by user ID
user_by_id = {
row_key: "user:#{user_id}",
column_family: "profiles",
columns: {
"name" => "Alice",
"email" => "alice@example.com",
"created" => "2024-01-15"
}
}
# User lookup by email
user_by_email = {
row_key: "email:alice@example.com",
column_family: "email_index",
columns: {
"user_id" => "user:12345"
}
}
This duplication increases storage costs but eliminates the need for secondary indexes and scatter-gather queries. Write operations must update all denormalized copies atomically or accept eventual consistency between related rows.
Operational complexity increases compared to managed relational databases. Column-family stores require cluster management, compaction tuning, replication monitoring, and capacity planning across distributed nodes. Teams must develop expertise in partition key design, consistency level selection, and cluster topology configuration. Small datasets that fit comfortably on a single server rarely justify this operational overhead.
Implementation Approaches
Data modeling strategies in column-family stores differ fundamentally from relational normalization. The approach begins with access pattern identification rather than entity relationship modeling. Each distinct query pattern may require a separate physical table optimized for that specific read or write path.
The wide-row pattern stores related data in a single row with many columns, typically using compound column names to encode multiple dimensions. Time-series applications use this extensively:
RowKey: sensor:device123
Columns:
temperature:2025-01-15:00:00 = 72.5
temperature:2025-01-15:01:00 = 72.8
temperature:2025-01-15:02:00 = 73.1
humidity:2025-01-15:00:00 = 45
humidity:2025-01-15:01:00 = 46
...
This pattern supports efficient range queries over time intervals without scanning multiple rows. The tradeoff involves row size management, as infinitely growing rows eventually degrade performance. Bucketing strategies partition time-series data across multiple rows:
# Time-bucketed approach
require 'time' # provides Time.parse
def generate_row_key(device_id, timestamp)
bucket = timestamp.strftime("%Y-%m-%d") # Daily buckets
"sensor:#{device_id}:#{bucket}"
end
# Creates separate rows for each day
row_2025_01_15 = generate_row_key("device123", Time.parse("2025-01-15"))
# => "sensor:device123:2025-01-15"
row_2025_01_16 = generate_row_key("device123", Time.parse("2025-01-16"))
# => "sensor:device123:2025-01-16"
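Reading a time range under daily bucketing means computing every row key that the range touches and querying each one. A minimal sketch, assuming the same key layout as above and a hypothetical `bucket_keys` helper:

```ruby
require 'date'

# Row keys for every daily bucket between two dates (inclusive),
# using the "sensor:<device>:<YYYY-MM-DD>" layout.
def bucket_keys(device_id, from_date, to_date)
  (from_date..to_date).map do |day|
    "sensor:#{device_id}:#{day.strftime('%Y-%m-%d')}"
  end
end

keys = bucket_keys('device123', Date.new(2025, 1, 14), Date.new(2025, 1, 16))
puts keys
```

Smaller buckets keep rows bounded but increase the number of rows a long range query has to read, so bucket size is usually chosen from the typical query window.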
The composite key pattern incorporates multiple attributes into the row key to support different query dimensions. A messaging application might use sender-receiver-timestamp combinations:
RowKey: message:user123:user456:2025-01-15:14:30
Columns:
subject = "Meeting reminder"
body = "Don't forget about tomorrow's meeting"
status = "read"
This approach enables efficient queries for messages between specific users within time ranges. However, querying all messages for a user across all conversations requires maintaining a separate index structure.
Inverted index patterns implement secondary access paths by maintaining lookup tables:
# Primary table: content by document ID
document_table = {
row_key: "doc:12345",
columns: {
"title" => "Ruby Performance",
"content" => "...",
"tags" => ["ruby", "performance", "optimization"]
}
}
# Inverted index: documents by tag
tag_index = {
row_key: "tag:ruby",
columns: {
"doc:12345" => "", # Column name is document ID, value is empty or metadata
"doc:12346" => "",
"doc:12350" => ""
}
}
Applications must update both tables during writes to maintain consistency. The empty column values or lightweight metadata minimize storage overhead while the column names serve as the actual index entries.
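The dual-write requirement can be sketched with in-memory hashes standing in for the two tables. `TaggedDocuments` is a hypothetical class for illustration; in a real store each hash operation would be a separate write, so a failure between them can leave the index temporarily stale.

```ruby
# In-memory stand-in for the primary table and its inverted index.
class TaggedDocuments
  attr_reader :documents, :tag_index

  def initialize
    @documents = {}
    @tag_index = Hash.new { |index, row_key| index[row_key] = {} }
  end

  # Write the document row, then add and remove index columns so that
  # both tables agree on the document's current tags.
  def save(doc_id, title, tags)
    old_tags = @documents.dig(doc_id, 'tags') || []
    (old_tags - tags).each { |tag| @tag_index["tag:#{tag}"].delete(doc_id) }
    tags.each { |tag| @tag_index["tag:#{tag}"][doc_id] = '' }
    @documents[doc_id] = { 'title' => title, 'tags' => tags }
  end

  # Index lookup: the column names of the tag row are the document IDs.
  def doc_ids_for(tag)
    @tag_index["tag:#{tag}"].keys
  end
end

store = TaggedDocuments.new
store.save('doc:12345', 'Ruby Performance', %w[ruby performance])
store.save('doc:12346', 'Ruby Basics', %w[ruby])
store.save('doc:12345', 'Ruby Performance', %w[ruby optimization]) # retag

p store.doc_ids_for('ruby')
p store.doc_ids_for('performance')
```

Removing stale index columns on update matters as much as adding new ones; otherwise the index keeps returning documents that no longer carry the tag.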
Materialized view patterns precompute aggregations and transformations:
# Raw events table
events_table = {
row_key: "event:2025-01-15:12:30:123",
columns: {
"user_id" => "user:789",
"action" => "purchase",
"amount" => "99.99"
}
}
# Materialized daily summary
daily_summary = {
row_key: "summary:user:789:2025-01-15",
columns: {
"total_purchases" => "3",
"total_amount" => "284.97",
"last_activity" => "2025-01-15:18:45"
}
}
Background processes or streaming pipelines continuously update materialized views from the raw event stream, trading storage space and write amplification for fast read access to aggregated data.
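As a rough illustration of such a pipeline, the sketch below folds a list of raw events into per-user daily summary rows. The event hashes and the `summarize` method are hypothetical, and a production job would apply the updates incrementally rather than recomputing from the full list.

```ruby
require 'bigdecimal'

# Fold raw events into "summary:<user>:<day>" rows.
def summarize(events)
  summaries = Hash.new do |rows, key|
    rows[key] = { 'total_purchases' => 0, 'total_amount' => BigDecimal('0'), 'last_activity' => nil }
  end

  events.each do |event|
    day = event[:at][0, 10] # "YYYY-MM-DD" prefix of the timestamp string
    row = summaries["summary:#{event[:user_id]}:#{day}"]
    row['last_activity'] = [row['last_activity'], event[:at]].compact.max
    next unless event[:action] == 'purchase'

    row['total_purchases'] += 1
    row['total_amount'] += BigDecimal(event[:amount])
  end
  summaries
end

events = [
  { user_id: 'user:789', action: 'view',     amount: '0',     at: '2025-01-15:09:00' },
  { user_id: 'user:789', action: 'purchase', amount: '99.99', at: '2025-01-15:12:30' },
  { user_id: 'user:789', action: 'purchase', amount: '85.00', at: '2025-01-15:14:10' },
  { user_id: 'user:789', action: 'purchase', amount: '99.98', at: '2025-01-15:18:45' }
]

summary = summarize(events)['summary:user:789:2025-01-15']
puts summary['total_purchases']
puts summary['total_amount'].to_s('F')
```

BigDecimal is used for the running total so that repeated additions of currency amounts do not accumulate floating-point error.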
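Partition key design critically impacts cluster performance and data distribution. Poorly chosen partition keys create hotspots where a small number of nodes handle disproportionate traffic. The effect is strongest in stores that keep rows in key order (range partitioning, as in HBase), where monotonically increasing keys all land on the same node:
# Bad: Sequential keys create hotspots under range partitioning
bad_row_key = "user:#{Time.now.to_i}:#{user_id}"
# All recent writes go to the same key range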
# Good: Hashed prefix distributes load
require 'digest'
hash_prefix = Digest::MD5.hexdigest(user_id.to_s)[0..3]
good_row_key = "#{hash_prefix}:user:#{user_id}"
# Distributes users across partitions evenly
The hashed prefix ensures random distribution across the cluster while still maintaining efficient lookups when the full user ID is known. Time-based bucketing with hash prefixes combines temporal locality with load distribution:
def bucketed_key(entity_id, timestamp)
bucket = timestamp.strftime("%Y-%m-%d")
hash = Digest::MD5.hexdigest(entity_id.to_s)[0..1]
"#{hash}:#{entity_id}:#{bucket}"
end
Ruby Implementation
Ruby applications interact with column-family stores through driver libraries that handle connection pooling, request routing, and data serialization. The cassandra-driver gem, maintained by DataStax, is the standard client for Apache Cassandra:
require 'cassandra'
cluster = Cassandra.cluster(
hosts: ['10.0.1.1', '10.0.1.2', '10.0.1.3'],
port: 9042,
consistency: :quorum,
timeout: 10
)
session = cluster.connect('ecommerce')
The cluster object manages connections to multiple nodes, routing queries to appropriate coordinators based on partition keys. The consistency level determines how many replicas must respond before the operation returns, trading latency for data accuracy guarantees.
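The arithmetic behind quorum consistency is small enough to show directly. The helper names below are hypothetical; the rule they encode is the standard one that a read is guaranteed to see the latest acknowledged write when the read and write replica counts together exceed the replication factor.

```ruby
# Replicas that must acknowledge at QUORUM for a given replication factor.
def quorum(replication_factor)
  replication_factor / 2 + 1
end

# True when every read set must intersect every write set.
def overlapping?(read_replicas, write_replicas, replication_factor)
  read_replicas + write_replicas > replication_factor
end

rf = 3
puts quorum(rf)                               # 2
puts overlapping?(quorum(rf), quorum(rf), rf) # true
puts overlapping?(1, 1, rf)                   # false
```

With a replication factor of 3, quorum reads and writes tolerate one unavailable replica while still overlapping, which is why that combination is a common default.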
Schema definition uses CQL (Cassandra Query Language), a SQL-like language adapted for column-family semantics:
# Create keyspace with replication strategy
session.execute(<<-CQL)
CREATE KEYSPACE IF NOT EXISTS ecommerce
WITH REPLICATION = {
'class': 'NetworkTopologyStrategy',
'datacenter1': 3
}
CQL
# Create column family (table)
session.execute(<<-CQL)
CREATE TABLE IF NOT EXISTS products (
category text,
product_id uuid,
name text,
price decimal,
attributes map<text, text>,
created_at timestamp,
PRIMARY KEY (category, product_id)
)
WITH CLUSTERING ORDER BY (product_id ASC)
CQL
The PRIMARY KEY declaration defines both the partition key (category) and clustering columns (product_id). All products within the same category reside on the same partition, enabling efficient category-scoped queries. Clustering columns determine sort order within the partition.
Write operations use prepared statements to improve performance and prevent injection attacks:
# Prepare statement once
insert_product = session.prepare(<<-CQL)
INSERT INTO products (category, product_id, name, price, attributes, created_at)
VALUES (?, ?, ?, ?, ?, ?)
USING TTL ?
CQL
# Execute multiple times with different values
require 'securerandom'
require 'bigdecimal'
session.execute(insert_product, arguments: [
  'electronics',
  Cassandra::Uuid.new(SecureRandom.uuid), # uuid columns expect Cassandra::Uuid, not String
  'Wireless Mouse',
  BigDecimal('29.99'),
  { 'color' => 'black', 'connectivity' => 'bluetooth' },
  Time.now,
  86400 # TTL in seconds, data expires after 24 hours
])
The driver prepares the statement once and sends only the statement ID and parameter values on subsequent executions. TTL (time-to-live) expires data automatically after the specified duration, which is useful for session data or temporary caches.
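Queries can also filter on regular columns such as price, which is neither a partition key nor a clustering column, but only with an explicit opt-in:
# Query electronics products, filtering on the non-key price column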
result = session.execute(
"SELECT * FROM products WHERE category = ? AND price > ? ALLOW FILTERING",
arguments: ['electronics', BigDecimal('50.00')]
)
result.each do |row|
puts "#{row['name']}: $#{row['price']}"
row['attributes'].each { |k, v| puts " #{k}: #{v}" }
end
The ALLOW FILTERING clause permits non-indexed column filters but may scan large partitions. Production systems should add secondary indexes or design tables specifically for common query patterns rather than relying on filtering.
Batch operations group multiple writes so that they are applied together:
batch = session.batch do |b|
# A logged batch is atomic; statements that hit the same partition of the same table are also applied in isolation
b.add(
"UPDATE products SET price = ? WHERE category = ? AND product_id = ?",
arguments: [BigDecimal('24.99'), 'electronics', product_id]
)
b.add(
"INSERT INTO price_history (category, product_id, changed_at, old_price, new_price)
VALUES (?, ?, ?, ?, ?)",
arguments: ['electronics', product_id, Time.now, BigDecimal('29.99'), BigDecimal('24.99')]
)
end
session.execute(batch)
Logged batches that span multiple partitions remain atomic but lose isolation, and the coordinator must write a batchlog and contact more replicas, which degrades performance. Use batches for related writes, ideally to the same partition key.
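One way to follow this guidance is to group pending writes by partition key before building batches. The sketch below works on plain hashes and uses a hypothetical `group_by_partition` helper; each resulting group would become one batch.

```ruby
# Group pending writes so that each batch touches a single partition.
# `partition_key` names the field that acts as the partition key.
def group_by_partition(writes, partition_key)
  writes.group_by { |write| write[partition_key] }
end

writes = [
  { category: 'electronics', product_id: 'p1', price: '24.99' },
  { category: 'clothing',    product_id: 'p7', price: '19.99' },
  { category: 'electronics', product_id: 'p2', price: '49.99' }
]

group_by_partition(writes, :category).each do |key, group|
  puts "#{key}: #{group.size} statement(s)"
end
```

Writes to unrelated partitions are usually better sent as independent asynchronous statements than forced into one large batch.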
Asynchronous operations improve throughput when issuing multiple independent queries:
# Fire multiple queries concurrently
futures = categories.map do |category|
session.execute_async(
"SELECT * FROM products WHERE category = ? LIMIT 10",
arguments: [category]
)
end
# Wait for all results
results = futures.map(&:get)
# Process combined results (each result is Enumerable but not an Array, so Array#flatten would not expand it)
results.flat_map(&:to_a).each do |row|
process_product(row)
end
The driver manages connection pooling and request pipelining transparently, maximizing parallelism across cluster nodes.
Counter columns provide distributed counters without read-modify-write cycles:
session.execute(<<-CQL)
CREATE TABLE IF NOT EXISTS view_counts (
content_id uuid PRIMARY KEY,
views counter
)
CQL
# Increment counter (no read required)
session.execute(
"UPDATE view_counts SET views = views + ? WHERE content_id = ?",
arguments: [1, content_id]
)
# Read counter value
result = session.execute(
"SELECT views FROM view_counts WHERE content_id = ?",
arguments: [content_id]
)
puts "Views: #{result.first['views']}"
Counter increments are applied without a client-side read and eventually converge across replicas. They are not idempotent, however: an increment retried after a timeout may be counted twice, so counters trade strict accuracy for high-throughput concurrent increments.
Collections (sets, lists, maps) store structured data within columns:
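# Add items to a set (assumes products also has a `tags set<text>` column, which the table above does not define)
require 'set'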
session.execute(
"UPDATE products SET tags = tags + ? WHERE category = ? AND product_id = ?",
arguments: [Set.new(['wireless', 'ergonomic']), 'electronics', product_id]
)
# Update map element
session.execute(
"UPDATE products SET attributes['color'] = ? WHERE category = ? AND product_id = ?",
arguments: ['silver', 'electronics', product_id]
)
# Append to list
session.execute(
"UPDATE product_reviews SET review_ids = review_ids + ? WHERE product_id = ?",
arguments: [[review_id], product_id]
)
Each collection element is limited to about 64KB under older native protocol versions, and a collection is always read in its entirety, so it cannot be paged. Large or unbounded collections should use separate rows instead.
Performance Considerations
Write performance in column-family stores stems from the sequential append-only architecture. Writes hit the commit log and memtable immediately, avoiding random disk seeks. SSD deployments are often quoted at 100,000-500,000 writes per second per node, but the actual figure depends on hardware, row size, and replication settings. Throughput scales roughly linearly with cluster size when partition keys spread load evenly, since each node handles its own subset of partitions.
However, the LSM tree structure introduces read amplification. A query may examine the memtable plus multiple SSTables before locating requested data. Compaction strategies balance read performance, write amplification, and disk space overhead:
Size-tiered compaction (STCS) groups SSTables of similar size, creating larger files progressively. This approach optimizes write throughput but can temporarily require 2x disk space during major compactions. STCS suits write-once, read-rarely workloads like time-series data:
# Configure STCS in table definition
session.execute(<<-CQL)
CREATE TABLE sensor_data (
device_id text,
reading_time timestamp,
temperature decimal,
PRIMARY KEY (device_id, reading_time)
)
WITH compaction = {
'class': 'SizeTieredCompactionStrategy',
'min_threshold': 4,
'max_threshold': 32
}
CQL
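The grouping rule behind size-tiered compaction can be sketched in plain Ruby. This is a simplification under stated assumptions: the 0.5 and 1.5 bounds are assumed values for what counts as "similar size", the helper name is hypothetical, and a real implementation applies further rules when choosing what to compact.

```ruby
# Group SSTable sizes (in MB) into buckets of similar size. A table joins a
# bucket when its size is within low..high times the bucket's current average.
def size_tiered_buckets(sizes, low: 0.5, high: 1.5)
  buckets = []
  sizes.sort.each do |size|
    bucket = buckets.find do |candidate|
      average = candidate.sum.to_f / candidate.size
      size >= average * low && size <= average * high
    end
    if bucket
      bucket << size
    else
      buckets << [size]
    end
  end
  buckets
end

sizes = [10, 11, 9, 12, 100, 95, 400]
buckets = size_tiered_buckets(sizes)
p buckets # [[9, 10, 11, 12], [95, 100], [400]]

# Buckets that reach min_threshold (4 in the table above) are compaction candidates.
p buckets.select { |bucket| bucket.size >= 4 } # [[9, 10, 11, 12]]
```

Merging the four small tables yields one table of roughly 42MB, which then waits until enough tables of that tier accumulate.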
Leveled compaction (LCS) organizes SSTables into levels, with each level containing non-overlapping data ranges. Reads examine fewer files at the cost of higher write amplification from more frequent compaction. LCS benefits read-heavy workloads:
# Configure LCS for read-optimized table
session.execute(<<-CQL)
CREATE TABLE user_profiles (
user_id uuid PRIMARY KEY,
username text,
email text,
preferences map<text, text>
)
WITH compaction = {
'class': 'LeveledCompactionStrategy',
'sstable_size_in_mb': 160
}
CQL
Time-window compaction (TWCS) groups data by time windows, expiring entire SSTables when all contained data exceeds TTL. This strategy largely avoids recompacting old data for time-series workloads with TTLs:
session.execute(<<-CQL)
CREATE TABLE application_logs (
app_id text,
log_time timestamp,
message text,
PRIMARY KEY (app_id, log_time)
)
WITH compaction = {
'class': 'TimeWindowCompactionStrategy',
'compaction_window_size': 1,
'compaction_window_unit': 'DAYS'
}
AND default_time_to_live = 604800
CQL
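The windowing idea can be illustrated by assigning write timestamps to one-day windows and checking when a whole window has outlived its TTL. The helper names are hypothetical, and the expiry check is a simplified reading of the rule described above.

```ruby
WINDOW_SECONDS = 86_400 # one-day windows, matching the table definition above

# Start (UTC) of the time window that a write timestamp falls into.
def window_start(time, window_seconds = WINDOW_SECONDS)
  Time.at((time.to_i / window_seconds) * window_seconds).utc
end

# A window can be dropped as a unit once even its latest possible write
# is older than the TTL.
def window_expired?(start, ttl_seconds, now, window_seconds = WINDOW_SECONDS)
  start + window_seconds + ttl_seconds <= now
end

readings = [
  Time.utc(2025, 1, 15, 1),
  Time.utc(2025, 1, 15, 23),
  Time.utc(2025, 1, 16, 0, 30)
]

readings.group_by { |time| window_start(time) }.each do |start, times|
  puts "#{start.strftime('%Y-%m-%d')}: #{times.size} reading(s)"
end
```

Because every row in a window expires at about the same time, the store can delete the window's files outright instead of rewriting them to purge individual rows.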
Bloom filters significantly improve read performance by avoiding disk I/O for absent keys. Each SSTable maintains a probabilistic data structure indicating key presence. Queries skip SSTables with negative bloom filter results:
# Larger bloom filters reduce false positives at memory cost
session.execute(<<-CQL)
ALTER TABLE products
WITH bloom_filter_fp_chance = 0.01
CQL
The false positive probability trades memory overhead for I/O reduction. Values of 0.01-0.1 balance resource usage effectively.
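The memory side of this tradeoff follows from the standard Bloom filter sizing formula, bits per key = -ln(p) / (ln 2)^2. The sketch below applies that textbook formula; it is an estimate of the general relationship, not a statement of how any particular store sizes its filters.

```ruby
# Textbook Bloom filter sizing: bits needed per key for a target
# false positive probability.
def bloom_bits_per_key(fp_chance)
  -Math.log(fp_chance) / (Math.log(2)**2)
end

[0.1, 0.01, 0.001].each do |fp_chance|
  puts format('fp_chance=%-5s -> %.1f bits per key', fp_chance, bloom_bits_per_key(fp_chance))
end
```

Each tenfold reduction in the false positive rate costs roughly 4.8 additional bits per key, so tightening the setting on a table with billions of keys has a noticeable memory cost.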
Partition size directly impacts query latency. Partitions exceeding 100MB create hotspots and slow queries that must scan large column sets. Wide-row patterns must implement bucketing to bound partition growth:
# Approximate partition sizes by row count (full table scan; run sparingly)
result = session.execute(<<-CQL)
SELECT
token(category) AS partition_token,
category,
COUNT(*) AS product_count
FROM products
GROUP BY category
CQL
result.each do |row|
if row['product_count'] > 100_000
puts "Warning: Large partition #{row['category']} with #{row['product_count']} products"
end
end
Refactoring large partitions requires splitting data across multiple partition keys, typically adding time buckets or hash prefixes.
Consistency levels trade latency for data accuracy. LOCAL_QUORUM requires majority acknowledgment within the local datacenter; when both reads and writes use it, clients in that datacenter read their latest acknowledged writes without paying cross-datacenter latency:
# Query with specific consistency level
result = session.execute(
"SELECT * FROM products WHERE category = ?",
arguments: ['electronics'],
consistency: :local_quorum
)
Read repair and anti-entropy processes ensure eventual consistency across replicas. Applications must handle stale reads when using lower consistency levels like ONE or LOCAL_ONE.
Connection pooling parameters affect throughput under concurrent load:
cluster = Cassandra.cluster(
hosts: nodes,
connections_per_local_node: 2, # Connections to local DC nodes
connections_per_remote_node: 1, # Connections to remote DC nodes
requests_per_connection: 128, # Concurrent requests per connection
heartbeat_interval: 30,
idle_timeout: 120
)
Insufficient connections create queuing delays, while excessive connections waste resources. Tune based on request rate and latency requirements through load testing.
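A starting point for load testing can be estimated with Little's law, which relates requests in flight to request rate and latency. The helper names and the example figures below are hypothetical, and the result is only a rough lower bound for the traffic one client sends to one node.

```ruby
# Little's law: requests in flight = arrival rate x average latency.
def in_flight_requests(requests_per_second, avg_latency_ms)
  (requests_per_second * avg_latency_ms / 1000.0).ceil
end

# Connections needed so that in-flight requests fit within the
# per-connection request limit.
def connections_needed(requests_per_second, avg_latency_ms, requests_per_connection)
  (in_flight_requests(requests_per_second, avg_latency_ms) / requests_per_connection.to_f).ceil
end

puts in_flight_requests(200_000, 5)      # 1000
puts connections_needed(200_000, 5, 128) # 8
```

Latency rises under load, so the estimate should be recomputed with measured tail latency rather than the average from an idle cluster.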
Tools & Ecosystem
Apache Cassandra represents the most widely deployed column-family store, with production clusters at Netflix, Apple, and Discord handling millions of operations per second. Cassandra provides masterless architecture where every node can handle reads and writes, eliminating single points of failure. The system automatically rebalances data when adding or removing nodes.
The cassandra-driver gem provides the official Ruby client:
# Gemfile
gem 'cassandra-driver', '~> 3.2'
# Connection with advanced configuration
cluster = Cassandra.cluster(
hosts: ENV['CASSANDRA_HOSTS'].split(','),
port: 9042,
compression: :lz4,
protocol_version: 4,
page_size: 1000,
load_balancing_policy: Cassandra::LoadBalancing::Policies::TokenAware.new(
Cassandra::LoadBalancing::Policies::RoundRobin.new
),
retry_policy: Cassandra::Retry::Policies::DowngradingConsistency.new
)
Token-aware load balancing routes queries directly to nodes owning the partition, eliminating coordinator hops and reducing latency. Downgrading retry policies automatically reduce consistency levels when insufficient replicas respond, maintaining availability during partial outages.
ScyllaDB reimplements Cassandra in C++ using a thread-per-core architecture and optimized data structures; its vendor reports throughput up to 10x higher on equivalent hardware, though results vary by workload. The cassandra-driver gem works with ScyllaDB clusters without modification due to CQL protocol compatibility.
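HBase builds on Hadoop HDFS for storage, integrating with the Hadoop ecosystem for batch processing and analytics. Ruby clients typically connect through a gateway such as the HBase Thrift or REST server. The hbase-ruby snippet below is illustrative; class and method names differ between HBase client gems, so check the documentation of the gem in use: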
require 'hbase'
client = HBase::Client.new(
host: 'hbase-master.example.com',
port: 9090
)
# HBase uses different terminology: tables contain column families
table = client.table('products')
# Put operation
table.put('electronics:12345', {
'info:name' => 'Laptop',
'info:price' => '999.99',
'specs:cpu' => 'Intel i7'
})
# Get operation
result = table.get('electronics:12345', columns: ['info:name', 'info:price'])
puts result['info:name']
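HBase provides strong consistency because each region is served by a single RegionServer at a time, with ZooKeeper coordinating cluster membership. The cost is reduced availability during network partitions or server failover, which places HBase on the CP side rather than following Cassandra's AP approach.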
DataStax Enterprise extends Cassandra with integrated search, analytics, and graph capabilities. The dse-driver gem adds DSE-specific features:
require 'dse'
cluster = Dse.cluster(
  hosts: nodes,
  graph_name: 'product_graph'
)
session = cluster.connect
# Execute graph query using Gremlin
result = session.execute_graph(
  "g.V().hasLabel('product').has('category', 'electronics').valueMap()"
)
The integrated search functionality uses Solr indexes for full-text and geospatial queries without external systems.
Cequel provides an ActiveRecord-like ORM for Cassandra:
# Gemfile
gem 'cequel'
# Model definition
class Product
include Cequel::Record
key :category, :text
key :product_id, :uuid
column :name, :text
column :price, :decimal
map :properties, :text, :text # renamed: a column called `attributes` would clash with Cequel::Record#attributes
column :created_at, :timestamp
validates :name, presence: true
end
# Usage
product = Product.new(
category: 'electronics',
product_id: SecureRandom.uuid,
name: 'Wireless Keyboard',
price: 49.99
)
product.save
# Query interface
products = Product.where(category: 'electronics').limit(10)
Cequel handles connection management, query generation, and result mapping, reducing boilerplate for applications primarily performing CRUD operations.
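Monitoring tools track cluster health and performance. Cassandra exposes metrics through JMX, accessible via tools like DataDog, Prometheus, or New Relic. Key metrics such as read latency and pending compactions can also be read directly; the script below assumes JRuby, which the jmx4r gem requires: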
# Example monitoring script using JMX
require 'jmx4r'
JMX::MBean.establish_connection(
host: 'cassandra-node-1',
port: 7199
)
# Read operation metrics
read_latency = JMX::MBean.find_by_name(
'org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency'
)
puts "Read latency 99th percentile: #{read_latency['99thPercentile']} microseconds" # Cassandra reports request latency in microseconds
# Compaction metrics
pending_tasks = JMX::MBean.find_by_name(
'org.apache.cassandra.metrics:type=Compaction,name=PendingTasks'
)
puts "Pending compactions: #{pending_tasks['Value']}"
Grafana dashboards visualize metrics over time, alerting on anomalies like increasing read latency, growing compaction queues, or unbalanced partition distribution.
Reference
CQL Data Types
| Type | Ruby Type | Description | Example |
|---|---|---|---|
| text | String | UTF-8 encoded string | username |
| uuid | Cassandra::Uuid | Type 4 UUID | Primary keys |
| timeuuid | Cassandra::TimeUuid | Type 1 UUID with timestamp | Event ordering |
| int | Integer | 32-bit signed integer | Age, count |
| bigint | Integer | 64-bit signed integer | Large counters |
| decimal | BigDecimal | Arbitrary precision decimal | Currency |
| timestamp | Time | Date and time with millisecond precision | created_at |
| boolean | TrueClass, FalseClass | True or false | is_active |
| blob | String with ASCII-8BIT encoding | Binary data | Image data |
| counter | Integer | Distributed counter | Page views |
| map | Hash | Key-value pairs | User preferences |
| set | Set | Unique unordered values | Tags |
| list | Array | Ordered values allowing duplicates | Comments |
Consistency Levels
| Level | Replicas | Use Case | Latency |
|---|---|---|---|
| ANY | 1 node; a stored hint counts (writes only) | Maximum write availability | Lowest |
| ONE | 1 replica | Low latency, eventual consistency | Low |
| TWO | 2 replicas | Balanced consistency and latency | Medium |
| QUORUM | Majority of replicas | Strong consistency | Medium |
| LOCAL_QUORUM | Majority in local datacenter | Multi-DC strong consistency | Medium |
| EACH_QUORUM | Majority in each datacenter | Cross-DC consistency | High |
| ALL | All replicas | Maximum consistency | Highest |
| LOCAL_ONE | 1 replica in local datacenter | Geo-local low latency | Lowest |
Compaction Strategies
| Strategy | Optimized For | Write Amplification | Read Performance | Space Overhead |
|---|---|---|---|---|
| SizeTieredCompactionStrategy | Write-heavy, time-series | Low | Moderate | High during compaction |
| LeveledCompactionStrategy | Read-heavy, updates | High | High | Low |
| TimeWindowCompactionStrategy | Time-series with TTL | Low | High for recent data | Low |
Primary Key Components
| Component | Purpose | Determines | Example |
|---|---|---|---|
| Partition Key | Data distribution | Which nodes store the row | category |
| Clustering Columns | Data ordering | Sort order within partition | product_id, created_at |
| Composite Partition Key | Multi-attribute distribution | Partition by multiple columns | category, subcategory |
Common CQL Operations
| Operation | Syntax | Notes |
|---|---|---|
| Insert | INSERT INTO table (cols) VALUES (vals) | Creates or overwrites row |
| Update | UPDATE table SET col = val WHERE key = ? | Creates row if absent |
| Delete | DELETE FROM table WHERE key = ? | Writes tombstone |
| Select | SELECT cols FROM table WHERE key = ? | Partition key required |
| Batch | BEGIN BATCH ... APPLY BATCH | Logged batches are atomic; isolated only within a single partition |
Connection Pool Settings
| Parameter | Default | Purpose | Tuning Guidance |
|---|---|---|---|
| connections_per_local_node | 1 | Connections to local DC | Increase for high concurrency |
| connections_per_remote_node | 1 | Connections to remote DC | Keep low, prefer local |
| requests_per_connection | 128 (protocol v2 and earlier), 1024 (v3 and later) | Concurrent requests per connection | Match application concurrency |
| heartbeat_interval | 30 | Seconds between keepalives | Reduce for fast failure detection |
| idle_timeout | 60 | Seconds before closing idle connections | Increase for bursty traffic |
Performance Tuning Checklist
| Aspect | Action | Impact |
|---|---|---|
| Partition Size | Keep under 100MB | Prevents hotspots and slow scans |
| Compaction Strategy | Match workload pattern | Optimizes read or write performance |
| Bloom Filter | Set false positive rate 0.01-0.1 | Reduces unnecessary disk reads |
| Consistency Level | Use LOCAL_QUORUM for most workloads | Balances latency and consistency |
| Connection Pooling | Tune based on load testing | Prevents connection exhaustion |
| Query Pagination | Use driver paging (page_size and paging state) | CQL has no OFFSET; avoids loading large result sets at once |
| Batch Operations | Keep within same partition | Preserves isolation and avoids batchlog overhead |
| TTL | Set on temporary data | Reduces compaction and storage |
| Monitoring | Track read/write latency percentiles | Identifies performance degradation |
| Replication Factor | Use 3 for production | Provides fault tolerance |