Overview
Time-series databases store and retrieve data points indexed by time. Unlike traditional relational databases optimized for random access and complex joins, time-series databases optimize for write-heavy workloads where data arrives chronologically and queries typically scan temporal ranges. These databases handle metrics, events, and measurements that accumulate over time, such as server metrics, sensor readings, financial tick data, and application logs.
The primary distinction between time-series databases and general-purpose databases lies in data access patterns. Time-series workloads exhibit sequential writes, range-based queries, and frequent aggregations over time windows. A monitoring system might write thousands of metric points per second but query data only for specific time ranges with aggregations like averages or percentiles. Traditional databases struggle with this pattern due to index overhead and lack of time-aware optimizations.
Time-series databases emerged from the need to handle increasing volumes of temporal data in monitoring, IoT, and analytics applications. Early solutions repurposed relational databases with specialized schemas, but dedicated time-series databases provide native support for temporal operations, automatic data retention policies, and compression algorithms designed for sequential numeric data.
# Traditional approach: storing metrics in relational database
class Metric < ActiveRecord::Base
# Writes require index updates, slow at scale
# Queries lack time-aware optimizations
end
# Time-series approach: optimized for temporal patterns
influx_client = InfluxDB2::Client.new('http://localhost:8086', ENV['INFLUXDB_TOKEN'], org: 'my_org')
write_api = influx_client.create_write_api
point = InfluxDB2::Point.new(name: 'cpu_usage').add_field('value', 45.2)
# Append-only write; the server assigns a timestamp when the point carries none
write_api.write(data: point, bucket: 'metrics')
The core architectural difference manifests in storage layout. Time-series databases organize data by time, often using specialized storage engines that compress sequential values and eliminate redundant timestamps through delta encoding. This approach reduces storage requirements by 10-20x compared to traditional databases while accelerating temporal queries through time-based partitioning.
Key Principles
Time-series databases operate on several fundamental principles that distinguish them from other database types. Understanding these principles clarifies design decisions and usage patterns.
Time as Primary Index
Every data point in a time-series database associates with a timestamp. The database treats time as the primary organizational axis, not as just another column. Storage engines partition data temporally, creating segments for specific time ranges. This organization enables efficient writes, as new data appends to the most recent segment without affecting older data. Queries specifying time ranges directly access relevant segments without scanning the entire dataset.
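The segment idea can be illustrated with a small in-memory sketch. `SegmentedStore` and its methods are hypothetical names invented for this example, not part of any database; timestamps are plain integer epoch seconds and each segment covers one hour.

```ruby
# Hypothetical in-memory store that partitions points into fixed-width time segments.
class SegmentedStore
  attr_reader :segments

  def initialize(segment_seconds: 3600)
    @segment_seconds = segment_seconds
    @segments = Hash.new { |hash, key| hash[key] = [] }
  end

  # Append a point to the segment that covers its timestamp.
  def write(timestamp, value)
    @segments[segment_start(timestamp)] << [timestamp, value]
  end

  # Read only the segments that overlap [from, to]; other segments are never touched.
  def range(from, to)
    keys = @segments.keys.select do |start|
      start <= to && start + @segment_seconds > from
    end
    keys.sort
        .flat_map { |key| @segments[key] }
        .select { |ts, _| ts >= from && ts <= to }
        .sort_by(&:first)
  end

  private

  def segment_start(timestamp)
    timestamp - (timestamp % @segment_seconds)
  end
end

store = SegmentedStore.new(segment_seconds: 3600)
store.write(0, 1.0)
store.write(3599, 2.0)
store.write(3600, 3.0)
store.write(7300, 4.0)
recent = store.range(3600, 7200) # skips the segment starting at 0 entirely
```

A real engine would keep segments in files and prune them using metadata, but the lookup follows the same shape: map the query range to segment boundaries first, then read.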
Immutable Data Model
Time-series databases assume data represents historical facts that do not change. A temperature reading at 10:00 AM remains constant regardless of subsequent events. This immutability enables aggressive caching, simplified replication, and efficient compression. Most engines treat in-place updates as a rare, expensive path: corrections are usually made by rewriting the affected points (same series and timestamp) or by deleting and reloading an entire time range.
Tags and Fields
Time-series databases separate metadata from measurements. Tags identify the data series through key-value pairs (server name, region, sensor ID), while fields contain actual measurements. Tags serve as dimensions for grouping and filtering, indexed for query performance. Fields store numeric values or strings representing the measurements themselves.
# Tags identify what generated the data
tags = {
host: 'web-server-01',
region: 'us-east',
environment: 'production'
}
# Fields contain the measurements
fields = {
cpu_usage: 45.2,
memory_mb: 2048,
request_count: 1523
}
# Timestamp indicates when
timestamp = Time.now.to_i
Tags create the series cardinality—the number of unique combinations of tag values. High cardinality (millions of unique series) challenges time-series databases differently than high data volume. A database might handle billions of points efficiently but struggle with millions of distinct series due to index overhead.
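As a rough illustration of how cardinality is counted, the sketch below treats a series as a measurement name plus its tag set. The helper names `series_cardinality` and `max_cardinality` are made up for this example.

```ruby
require 'set'

# Count distinct series: one series per measurement plus unique tag set.
def series_cardinality(points)
  points.map { |point| [point[:measurement], point[:tags].sort] }.to_set.size
end

# Upper bound when every tag value can combine with every other tag value.
def max_cardinality(tag_values)
  tag_values.values.map(&:size).reduce(1, :*)
end

points = [
  { measurement: 'cpu', tags: { host: 'web-01', region: 'us-east' } },
  { measurement: 'cpu', tags: { region: 'us-east', host: 'web-01' } }, # same series, different key order
  { measurement: 'cpu', tags: { host: 'web-02', region: 'us-east' } }
]
observed = series_cardinality(points)

possible = max_cardinality(
  host: %w[web-01 web-02 web-03],
  region: %w[us-east eu-west],
  environment: %w[production staging]
)
```

The upper bound grows multiplicatively, which is why adding a single tag with many values can inflate series count far more than adding data points does.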
Write Optimization
Time-series databases prioritize write performance since data typically arrives in real-time streams. They achieve high write throughput through several mechanisms: in-memory buffers that batch writes before persisting to disk, append-only storage files that avoid random I/O, and relaxed consistency models that defer replication or indexing.
The write path typically buffers incoming points in memory, sorted by time, until reaching a threshold for flushing to disk. This batching amortizes disk I/O costs across many points. Some databases accept data out of temporal order within a window, buffering and sorting before writing to maintain temporal locality on disk.
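The buffering step described above might look like the following sketch. `WriteBuffer` is a hypothetical class, and the `flushed_batches` array stands in for append-only files on disk.

```ruby
# Hypothetical write buffer: collects points and flushes time-sorted batches at a threshold.
class WriteBuffer
  attr_reader :flushed_batches

  def initialize(flush_threshold: 4)
    @flush_threshold = flush_threshold
    @pending = []
    @flushed_batches = [] # stands in for append-only files on disk
  end

  def write(timestamp, value)
    @pending << [timestamp, value]
    flush if @pending.size >= @flush_threshold
  end

  # Sort by time so each flushed batch is temporally ordered, then persist it in one step.
  def flush
    return if @pending.empty?

    @flushed_batches << @pending.sort_by(&:first)
    @pending = []
  end
end

buffer = WriteBuffer.new(flush_threshold: 4)
# Points 12 and 11 arrive out of order but land sorted in the flushed batch.
[[10, 1.0], [12, 1.2], [11, 1.1], [13, 1.3], [14, 1.4]].each do |timestamp, value|
  buffer.write(timestamp, value)
end
```

One disk write now covers four points, and out-of-order arrivals inside the buffer window cost nothing extra at read time.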
Downsampling and Retention
Raw data accumulates rapidly—a single metric collected every second generates 86,400 points daily. Time-series databases provide automatic downsampling to reduce storage requirements for historical data. Downsampling aggregates high-resolution data into lower-resolution summaries, storing 5-minute averages instead of individual second-level points for older data.
Retention policies automatically delete data exceeding specified ages. A database might retain raw data for 7 days, hourly aggregates for 90 days, and daily summaries indefinitely. These policies typically execute during background compaction or by dropping whole expired shards, reclaiming storage with little impact on write or query performance.
# Illustrative retention settings with downsampling (a plain hash, not a specific client API)
retention_config = {
duration: '7d', # Keep raw data for 7 days
shard_duration: '1h', # Partition by hour
replication: 1,
downsampling: [
{ duration: '90d', aggregation: 'mean', interval: '1h' },
{ duration: 'INF', aggregation: 'mean', interval: '1d' }
]
}
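To make the downsampling and retention steps concrete, here is a minimal sketch over `[timestamp, value]` pairs with integer epoch seconds. The function names are illustrative, and real databases run these steps inside the storage engine rather than in application code.

```ruby
# Downsample [timestamp, value] pairs into fixed windows, averaging each window.
def downsample_mean(points, window_seconds)
  points
    .group_by { |timestamp, _| timestamp - (timestamp % window_seconds) }
    .sort
    .map { |window_start, bucket| [window_start, bucket.sum { |_, value| value } / bucket.size.to_f] }
end

# Keep only points newer than the retention period relative to `now`.
def apply_retention(points, now, retention_seconds)
  points.select { |timestamp, _| timestamp > now - retention_seconds }
end

raw = [[0, 10.0], [60, 20.0], [299, 30.0], [300, 40.0], [450, 60.0]]
five_minute = downsample_mean(raw, 300) # one averaged point per 5-minute window
kept = apply_retention(raw, 600, 300)   # drops everything at or before t = 300
```

Five raw points collapse into two summaries here; at one point per second the same 5-minute window would replace 300 points with one.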
Compression
Sequential numeric data exhibits patterns that compression algorithms exploit. Delta encoding stores differences between consecutive values rather than absolute values, reducing storage when values change gradually. Run-length encoding compresses repeated values. Specialized codecs such as Gorilla (Facebook) pair delta-of-delta timestamps with XOR-encoded floating-point values and report an average of about 1.37 bytes (roughly 11 bits) per point on production metrics, with minimal CPU overhead.
Compression operates transparently during writes. The database compresses data blocks before writing to disk, decompressing during reads. Query engines operate on compressed data where possible, reducing I/O.
Implementation Approaches
Implementing time-series storage requires choosing between several architectural approaches, each with distinct trade-offs for specific workloads.
Specialized Time-Series Databases
Dedicated time-series databases like InfluxDB, TimescaleDB, and Prometheus optimize all components for temporal workloads. These systems provide native time-series data types, query languages with temporal functions, and storage engines designed for sequential access patterns.
InfluxDB uses a custom storage engine (TSM) that organizes data into time-sharded files with columnar compression. Each shard covers a specific time range, and the query engine knows to access only relevant shards for temporal queries. The line protocol for data ingestion provides high write throughput with minimal parsing overhead.
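For a sense of why line protocol is cheap to parse, the sketch below serializes one point by hand. It is a simplified illustration covering the common escaping cases only; the helper names are invented here, and real applications should rely on the client library's own serializer.

```ruby
# Escape commas, equals signs, and spaces in tag keys, tag values, and field keys.
def escape_key(text)
  text.to_s.gsub(/([,= ])/) { "\\#{$1}" }
end

# Integers carry an `i` suffix, strings are double-quoted, floats and booleans are bare.
def format_field(value)
  case value
  when Integer then "#{value}i"
  when Float, true, false then value.to_s
  else
    escaped = value.to_s.gsub(/(["\\])/) { "\\#{$1}" }
    "\"#{escaped}\""
  end
end

# Build: measurement,tag=value,... field=value,... timestamp
def to_line_protocol(measurement, tags, fields, timestamp_ns)
  measurement_part = measurement.to_s.gsub(/([, ])/) { "\\#{$1}" }
  tag_part = tags.sort_by { |key, _| key.to_s }
                 .map { |key, value| ",#{escape_key(key)}=#{escape_key(value)}" }
                 .join
  field_part = fields.map { |key, value| "#{escape_key(key)}=#{format_field(value)}" }.join(',')
  "#{measurement_part}#{tag_part} #{field_part} #{timestamp_ns}"
end

line = to_line_protocol(
  'cpu',
  { host: 'web-server-01', region: 'us-east' },
  { cpu_usage: 45.2, request_count: 1523 },
  1_700_000_000_000_000_000
)
```

Each point is a single text line with three space-separated sections, so a server can split and route it without building a document tree.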
TimescaleDB extends PostgreSQL with time-series optimizations while retaining SQL compatibility and PostgreSQL features. It automatically partitions tables into chunks based on time ranges, creating a hypertable that appears as a single table but distributes data across many partitions. This approach combines time-series performance with relational database capabilities.
Prometheus targets monitoring and alerting, pulling metrics from instrumented applications. Its local storage uses TSDB blocks with aggressive compression, while the query language (PromQL) provides temporal aggregation and alerting rules. Prometheus assumes eventual consistency and prioritizes availability over durability.
Relational Database with Time-Series Schema
Traditional relational databases can store time-series data through careful schema design. Wide tables with timestamp columns, proper indexing, and partitioning provide acceptable performance for moderate workloads. This approach leverages existing database infrastructure and operational knowledge.
Partitioning divides tables by time ranges, typically daily or weekly partitions. Queries restricted to recent data access only relevant partitions, avoiding full table scans. Older partitions can be archived or dropped without affecting active data. Index strategies focus on time-range queries, often using BRIN indexes in PostgreSQL that summarize ranges rather than indexing every value.
class CreateMetricsTable < ActiveRecord::Migration[7.0]
def change
create_table :metrics, id: false do |t|
t.bigint :id, null: false
t.datetime :timestamp, null: false, precision: 6
t.string :metric_name, null: false
t.jsonb :tags, default: {}
t.float :value, null: false
end
# Indexes for time-range and tag queries; this migration does not partition the table
execute <<-SQL
CREATE INDEX idx_metrics_time ON metrics USING BRIN (timestamp);
CREATE INDEX idx_metrics_tags ON metrics USING GIN (tags);
ALTER TABLE metrics ADD PRIMARY KEY (id, timestamp);
SQL
end
end
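The time-based partitioning described earlier could be scripted along these lines. This is a sketch that assumes the parent table was created with PostgreSQL declarative partitioning (`PARTITION BY RANGE (timestamp)`), which the migration above does not do; `monthly_partition_sql` is a hypothetical helper.

```ruby
require 'date'

# Build the DDL for one monthly partition of a range-partitioned parent table.
def monthly_partition_sql(table, date)
  first_day = Date.new(date.year, date.month, 1)
  next_month = first_day >> 1 # Date#>> advances by whole months
  partition = "#{table}_#{first_day.strftime('%Y_%m')}"
  "CREATE TABLE IF NOT EXISTS #{partition} PARTITION OF #{table} " \
    "FOR VALUES FROM ('#{first_day.iso8601}') TO ('#{next_month.iso8601}');"
end

december = monthly_partition_sql('metrics', Date.new(2024, 12, 15))
```

A scheduled job could run such a statement ahead of each month so that inserts always find a target partition; the upper bound is exclusive, so adjacent partitions do not overlap.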
The relational approach works when write volumes remain under 10,000 points per second and query patterns align with SQL capabilities. It struggles with high cardinality tag sets and lacks native compression for temporal data.
Distributed Time-Series Architectures
Large-scale deployments distribute time-series data across clusters. Systems like M3DB or distributed InfluxDB provide horizontal scalability through sharding and replication. Each node stores a subset of series, determined by consistent hashing of series identifiers.
Distributed architectures introduce complexity in coordinating queries across nodes, maintaining data consistency during replication, and rebalancing data when nodes join or leave. Query coordinators fan out requests to relevant nodes and aggregate results. The CAP theorem applies—systems typically prioritize availability and partition tolerance over strong consistency for time-series workloads.
Sharding strategies affect both write and query performance. Range-based sharding by time sends all current writes to the node that owns the newest time range, creating a write hot spot, and concentrates queries on the nodes holding the requested ranges. Series-based sharding spreads both reads and writes across nodes, but a query spanning many series must fan out to every node that holds one of them.
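Consistent hashing of series identifiers can be sketched as follows. `HashRing` is a hypothetical class, MD5 is used only as a convenient deterministic hash, and the node names and replica count are arbitrary choices for the example.

```ruby
require 'digest'

# Hypothetical consistent-hash ring mapping series identifiers to node names.
class HashRing
  def initialize(nodes, replicas: 100)
    @replicas = replicas
    @ring = {}
    nodes.each { |node| add_node(node) }
  end

  # Place several virtual points per node to even out the distribution.
  def add_node(node)
    @replicas.times { |i| @ring[hash_of("#{node}:#{i}")] = node }
    @sorted_keys = @ring.keys.sort
  end

  def remove_node(node)
    @ring.delete_if { |_, owner| owner == node }
    @sorted_keys = @ring.keys.sort
  end

  # Walk clockwise to the first virtual point at or after the key's hash.
  def node_for(series_id)
    target = hash_of(series_id)
    key = @sorted_keys.bsearch { |candidate| candidate >= target } || @sorted_keys.first
    @ring[key]
  end

  private

  def hash_of(text)
    Digest::MD5.hexdigest(text)[0, 8].to_i(16)
  end
end

ring = HashRing.new(%w[node-a node-b node-c])
series = (1..500).map { |i| "cpu,host=web-#{i}" }
before = series.to_h { |id| [id, ring.node_for(id)] }
ring.remove_node('node-b')
after = series.to_h { |id| [id, ring.node_for(id)] }
moved = series.count { |id| before[id] != after[id] }
```

Only series that lived on the removed node change owner, which is the property that keeps rebalancing traffic proportional to the departed node's share of the data.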
Hybrid Approaches
Some systems combine approaches. OpenTSDB layers on HBase, providing time-series semantics over a distributed key-value store. VictoriaMetrics offers InfluxDB-compatible APIs with custom storage optimized for high cardinality and long-term retention.
Cloud-managed services like Amazon Timestream or Azure Time Series Insights abstract infrastructure complexity, providing time-series capabilities without managing databases. These services charge based on writes, storage, and queries, suitable for applications preferring operational simplicity over cost optimization.
Ruby Implementation
Ruby applications interact with time-series databases through client libraries and Ruby-specific patterns. Several gems provide idiomatic interfaces to popular time-series databases.
InfluxDB Integration
The influxdb-client gem provides Ruby access to InfluxDB 2.x. It supports synchronous and batched writes, Flux queries, and background flushing of buffered points.
require 'influxdb-client'
# Initialize client with connection parameters
client = InfluxDB2::Client.new(
'http://localhost:8086',
ENV['INFLUXDB_TOKEN'],
bucket: 'system_metrics',
org: 'my_org',
precision: InfluxDB2::WritePrecision::NANOSECOND
)
# Create write API with batching (the default write type is synchronous)
write_api = client.create_write_api(
write_options: InfluxDB2::WriteOptions.new(
write_type: InfluxDB2::WriteType::BATCHING,
batch_size: 1000,
flush_interval: 10_000, # milliseconds
retry_interval: 5_000
)
)
# Write points with tags and fields
def record_metric(write_api, host, metric_name, value)
point = InfluxDB2::Point.new(name: metric_name)
.add_tag('host', host)
.add_tag('environment', Rails.env)
.add_field('value', value)
.time(Time.now, InfluxDB2::WritePrecision::MILLISECOND)
write_api.write(data: point)
end
# Query with Flux language
query_api = client.create_query_api
flux_query = <<-FLUX
from(bucket: "system_metrics")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "cpu_usage")
|> filter(fn: (r) => r.host == "web-01")
|> aggregateWindow(every: 5m, fn: mean)
FLUX
result = query_api.query(query: flux_query)
# query returns a Hash of table index => FluxTable
result.each_value do |table|
table.records.each do |record|
puts "#{record.time}: #{record.value}"
end
end
client.close!
The gem handles connection management, automatic retries on failures, and background flushing of buffered writes. Applications should reuse client instances rather than creating new connections for each operation.
TimescaleDB with ActiveRecord
TimescaleDB extends PostgreSQL, making it accessible through standard Ruby database libraries. The timescaledb gem provides ActiveRecord integration with hypertable management.
# Migration creating hypertable
class CreateSensorReadings < ActiveRecord::Migration[7.0]
def up
create_table :sensor_readings, id: false do |t|
t.uuid :id, null: false # UUID column to match the SecureRandom.uuid ids used below
t.datetime :time, null: false, precision: 6
t.string :sensor_id, null: false
t.float :temperature
t.float :humidity
t.integer :battery_level
end
execute "SELECT create_hypertable('sensor_readings', 'time');"
execute "ALTER TABLE sensor_readings ADD PRIMARY KEY (id, time);"
add_index :sensor_readings, [:sensor_id, :time]
end
def down
drop_table :sensor_readings
end
end
# Model with time-series queries
class SensorReading < ApplicationRecord
self.primary_key = [:id, :time]
scope :recent, -> { where('time > ?', 1.hour.ago) }
scope :for_sensor, ->(sensor_id) { where(sensor_id: sensor_id) }
def self.hourly_average(sensor_id, start_time, end_time)
select(
"time_bucket('1 hour', time) AS hour",
"AVG(temperature) as avg_temp",
"AVG(humidity) as avg_humidity"
)
.where(sensor_id: sensor_id)
.where(time: start_time..end_time)
.group("hour")
.order("hour")
end
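def self.continuous_aggregate(name, query)
# query is interpolated into SQL as-is: pass trusted values only
connection.execute <<-SQL
CREATE MATERIALIZED VIEW #{connection.quote_table_name(name)}
WITH (timescaledb.continuous) AS
#{query}
SQL
end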
end
# Usage
SensorReading.create!(
id: SecureRandom.uuid,
time: Time.current,
sensor_id: 'temp-sensor-01',
temperature: 22.5,
humidity: 65.0,
battery_level: 87
)
averages = SensorReading.hourly_average(
'temp-sensor-01',
24.hours.ago,
Time.current
)
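TimescaleDB-specific functions like time_bucket integrate with ActiveRecord through raw SQL or Arel. The composite primary_key setting relies on Active Record's composite primary key support, which arrived in Rails 7.1. A primary key or unique index on a hypertable must include the partitioning column (time here), which is why the key combines id and time.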
Prometheus Client
Applications expose metrics for Prometheus scraping using the prometheus-client gem. This approach inverts the typical client-server relationship—applications provide HTTP endpoints that Prometheus polls.
require 'prometheus/client'
require 'prometheus/middleware/exporter'
# Initialize registry
prometheus = Prometheus::Client.registry
# Define metrics as constants so controller code can reach them
HTTP_REQUESTS = prometheus.counter(
:http_requests_total,
docstring: 'Total HTTP requests',
labels: [:method, :path, :status]
)
REQUEST_DURATION = prometheus.histogram(
:http_request_duration_seconds,
docstring: 'HTTP request duration',
labels: [:method, :path],
buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5]
)
# Instrument application
class ApplicationController < ActionController::Base
around_action :track_metrics
private
def track_metrics
start_time = Time.now
begin
yield
status = response.status
rescue StandardError
status = 500
raise
ensure
duration = Time.now - start_time
# request.path is unbounded if it embeds IDs; prefer a route template in production
labels = {
method: request.method,
path: request.path,
status: status
}
HTTP_REQUESTS.increment(labels: labels)
REQUEST_DURATION.observe(duration, labels: labels.except(:status))
end
end
end
# Serve /metrics through the exporter Rack middleware (for example in config/application.rb)
Rails.application.config.middleware.use Prometheus::Middleware::Exporter
The prometheus-client gem provides thread-safe metric collection suitable for multi-threaded Ruby servers. Metrics accumulate in-process, and the exporter serializes them in Prometheus text format when scraped.
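To show what the exporter serializes, the sketch below hand-rolls a histogram with cumulative buckets and renders it in the text exposition format. `SketchHistogram` is a hypothetical stand-in written for this example; it is not the gem's implementation and omits labels and thread safety.

```ruby
# Hypothetical histogram mirroring Prometheus semantics: bucket counts are cumulative.
class SketchHistogram
  def initialize(name, buckets)
    @name = name
    @buckets = buckets.sort
    @counts = Hash.new(0)
    @sum = 0.0
    @total = 0
  end

  # An observation increments every bucket whose upper bound it does not exceed.
  def observe(value)
    @buckets.each { |bound| @counts[bound] += 1 if value <= bound }
    @sum += value
    @total += 1
  end

  # Render lines in the Prometheus text exposition format.
  def to_text
    lines = ["# TYPE #{@name} histogram"]
    @buckets.each { |bound| lines << "#{@name}_bucket{le=\"#{bound}\"} #{@counts[bound]}" }
    lines << "#{@name}_bucket{le=\"+Inf\"} #{@total}"
    lines << "#{@name}_sum #{@sum}"
    lines << "#{@name}_count #{@total}"
    lines.join("\n")
  end
end

histogram = SketchHistogram.new('http_request_duration_seconds', [0.1, 0.5, 1])
[0.05, 0.3, 2.0].each { |seconds| histogram.observe(seconds) }
text = histogram.to_text
```

Because buckets are cumulative, the server can estimate percentiles at query time from counters alone, without the application keeping individual observations.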
Background Processing for Writes
High-frequency metric collection should avoid blocking application threads. Background processing with Sidekiq or similar frameworks batches metric writes efficiently.
class MetricsWriter
include Sidekiq::Worker
sidekiq_options queue: :metrics, retry: 3
def perform(batch)
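client = InfluxDB2::Client.new(
ENV['INFLUXDB_URL'],
ENV['INFLUXDB_TOKEN'],
# Writes need a bucket and org; these variable names are placeholders
bucket: ENV['INFLUXDB_BUCKET'],
org: ENV['INFLUXDB_ORG']
)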
write_api = client.create_write_api
points = batch.map do |metric|
InfluxDB2::Point.new(name: metric['name'])
.add_tag('source', metric['source'])
.add_field('value', metric['value'])
.time(Time.parse(metric['timestamp']),
InfluxDB2::WritePrecision::MILLISECOND)
end
write_api.write(data: points)
client.close!
end
end
# Application code
class MetricsCollector
def self.record(name, value, tags = {})
# String keys: Sidekiq job arguments must be plain JSON types
metric = {
'name' => name,
'value' => value,
'source' => tags[:source] || 'app',
'timestamp' => Time.now.iso8601(3)
}
METRICS_BUFFER << metric
if METRICS_BUFFER.size >= 100
batch = METRICS_BUFFER.shift(100)
MetricsWriter.perform_async(batch)
end
end
end
METRICS_BUFFER = Concurrent::Array.new
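Thread-safe data structures like Concurrent::Array (from the concurrent-ruby gem) make individual operations safe across threads. The size check and the shift in the collector are still two separate operations, so two threads can race and one may enqueue a short batch; wrap both in a lock if exact batch sizes matter. Batching reduces network overhead and database load.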
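One way to make the check-and-drain step atomic is to guard both with a single Mutex. The sketch below uses only the standard library; `BatchBuffer` is a hypothetical class, and in the setup above the block passed to it would call `MetricsWriter.perform_async(batch)`.

```ruby
# Hypothetical buffer that hands out full batches atomically under a Mutex.
class BatchBuffer
  def initialize(batch_size:, &on_batch)
    @batch_size = batch_size
    @on_batch = on_batch
    @items = []
    @lock = Mutex.new
  end

  def push(item)
    batch = @lock.synchronize do
      @items << item
      @items.shift(@batch_size) if @items.size >= @batch_size
    end
    # Deliver outside the lock so a slow consumer does not block producers.
    @on_batch.call(batch) if batch
  end

  # Drain whatever is left, for example at shutdown.
  def flush
    batch = @lock.synchronize { @items.shift(@items.size) }
    @on_batch.call(batch) unless batch.empty?
  end
end

delivered = Queue.new # thread-safe collector for the example
buffer = BatchBuffer.new(batch_size: 100) { |batch| delivered << batch }
threads = 8.times.map do |t|
  Thread.new { 125.times { |i| buffer.push("metric-#{t}-#{i}") } }
end
threads.each(&:join)
buffer.flush
```

Appending and draining inside one critical section guarantees every delivered batch is exactly `batch_size` items, apart from the final flush.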
Performance Considerations
Time-series database performance depends on write patterns, query characteristics, cardinality, and retention policies. Understanding these factors guides optimization.
Write Performance
Time-series databases achieve high write throughput through batching and sequential disk writes. Individual point writes incur significant overhead from network round-trips and transaction processing. Batching amortizes these costs across multiple points.
# Inefficient: individual writes
1000.times do |i|
point = InfluxDB2::Point.new(name: 'temperature')
.add_field('value', 20 + rand(10))
write_api.write(data: point)
end
# Result: ~100 writes/second
# Efficient: batched writes
points = 1000.times.map do |i|
InfluxDB2::Point.new(name: 'temperature')
.add_field('value', 20 + rand(10))
end
write_api.write(data: points)
# Result: ~10,000 writes/second
Batch sizes between 1,000 and 10,000 points optimize throughput without excessive memory usage. Larger batches improve write throughput but increase latency and memory consumption. Applications should tune batch sizes based on data arrival rates and acceptable latency.
Write amplification occurs when storage engines reorganize data during compaction. Frequently writing small batches creates many small storage files that require merging. Configure appropriate flush intervals to balance write latency against compaction overhead.
Query Optimization
Time-series queries perform best when restricted to specific time ranges and series. Query patterns that scan all series or long time ranges require careful optimization.
Time-range selection dramatically affects query performance. A query scanning one hour of data might execute in milliseconds, while scanning one year could take minutes. Applications should limit query ranges and page through results for large temporal spans.
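# Slow: effectively unbounded time range
# (Flux rejects from() without range(), so start: 0 scans everything since the epoch)
flux_query = <<-FLUX
from(bucket: "metrics")
|> range(start: 0)
|> filter(fn: (r) => r._measurement == "cpu")
|> mean()
FLUX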
# Fast: bounded time range
flux_query = <<-FLUX
from(bucket: "metrics")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "cpu")
|> mean()
FLUX
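Paging through a long span can be done by splitting it into bounded windows and issuing one query per window. The helpers below are a sketch with invented names; `flux_range` only formats the `range` clause and does not contact a database.

```ruby
# Split [from, to) into consecutive windows of at most `step` seconds.
def time_windows(from, to, step)
  return enum_for(:time_windows, from, to, step) unless block_given?

  cursor = from
  while cursor < to
    window_end = [cursor + step, to].min
    yield cursor, window_end
    cursor = window_end
  end
end

# Render a bounded Flux range clause for one window using RFC 3339 timestamps.
def flux_range(window_start, window_end)
  format = '%Y-%m-%dT%H:%M:%SZ'
  "|> range(start: #{window_start.getutc.strftime(format)}, " \
    "stop: #{window_end.getutc.strftime(format)})"
end

windows = time_windows(Time.utc(2024, 1, 1), Time.utc(2024, 1, 1, 5), 2 * 3600).to_a
clauses = windows.map { |window_start, window_end| flux_range(window_start, window_end) }
```

Each clause bounds a query to two hours at most, so memory use per request stays flat even when the overall span covers months.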
Tag filtering reduces the series scanned during queries. Queries should filter on indexed tags before applying computations. The order of filters affects performance in some databases—place highly selective filters first.
Downsampled data provides faster queries for historical analysis. Rather than querying raw one-second data points over a month (about 2.6 million points per series), query pre-aggregated hourly data (about 720 points per series). Scanning 3,600 times fewer points typically cuts query time by orders of magnitude, with minimal accuracy loss for most analyses.
Cardinality Management
High cardinality—many unique combinations of tag values—challenges time-series databases. Each unique series requires index entries and metadata storage. Databases perform well with millions of unique series but degrade with tens of millions.
Applications should avoid unbounded cardinality sources. User IDs, session tokens, or UUIDs as tags create new series indefinitely. Instead, use bounded sets like server names, service types, or geographical regions.
# High cardinality: unbounded tag values
# Creates new series for every user and request
point = InfluxDB2::Point.new(name: 'request')
.add_tag('user_id', user.id) # Bad: millions of values
.add_tag('request_id', request.uuid) # Bad: infinite values
.add_field('duration_ms', 45)
# Low cardinality: bounded tag values
# Reuses existing series
point = InfluxDB2::Point.new(name: 'request')
.add_tag('endpoint', '/api/users') # Good: limited values
.add_tag('method', 'GET') # Good: finite set
.add_tag('status_code', '200') # Good: small range
.add_field('duration_ms', 45)
.add_field('user_id', user.id) # OK: in field, not tag
Fields handle high-cardinality values without index overhead. Store identifiers and unique values in fields rather than tags. Applications can filter on field values, though less efficiently than tag filtering.
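When a useful tag such as the request path embeds identifiers, one option is to normalize it into a bounded template before tagging. The sketch below is one possible approach; `normalize_path` and its placeholder names are assumptions made for this example.

```ruby
UUID_PATTERN = /\A[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\z/i

# Replace unbounded path segments with placeholders so tag values stay bounded.
def normalize_path(path)
  without_query = path.to_s.split('?', 2).first.to_s
  segments = without_query.split('/').map do |segment|
    case segment
    when /\A\d+\z/ then ':id'
    when UUID_PATTERN then ':uuid'
    else segment
    end
  end
  normalized = segments.join('/')
  normalized.empty? ? '/' : normalized
end

endpoint = normalize_path('/api/users/42/orders/7?page=2')
session = normalize_path('/sessions/123e4567-e89b-12d3-a456-426614174000')
```

Every user's order page now maps to the same tag value, so the series count depends on the number of routes rather than the number of users.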
Memory Management
Time-series databases cache recent data in memory for fast access. Memory consumption grows with active series count and retention of recent data. Write buffers, query caches, and series indexes all consume memory.
Monitor memory metrics to detect issues. Excessive memory usage often indicates high cardinality or retention of too much hot data. Reducing retention windows or implementing more aggressive downsampling alleviates memory pressure.
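# Monitor database memory via InfluxDB's Prometheus-format /metrics endpoint
# memory_limit_bytes is supplied by the caller; the server does not report a limit
require 'net/http'
def check_influxdb_memory(memory_limit_bytes)
body = Net::HTTP.get(URI("#{ENV['INFLUXDB_URL']}/metrics"))
# go_memstats_alloc_bytes: heap bytes currently allocated by the Go runtime
line = body.each_line.find { |l| l.start_with?('go_memstats_alloc_bytes ') }
return unless line
memory_usage = line.split.last.to_f
usage_pct = (memory_usage / memory_limit_bytes * 100).round(2)
if usage_pct > 80
Rails.logger.warn "InfluxDB memory high: #{usage_pct}%"
# Consider reducing retention or increasing resources
end
end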
Storage Optimization
Compression reduces storage requirements significantly. Time-series databases typically achieve 10-20x compression through specialized codecs. Compression ratios improve with longer retention—more data provides better compression patterns.
Storage grows linearly with write rate and retention period. A metric collected every second with one-year retention generates about 31.5 million points (365 × 86,400 = 31,536,000). At a conservative 12 bytes per stored point, this requires about 378 MB per metric; codecs that reach one to two bytes per point bring this down to roughly 32 to 63 MB. Multiply by metric count to estimate storage needs.
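The estimate above reduces to a short calculation. The function below is an illustrative helper; the 12 bytes per point comes from the text, while the 500-metric fleet is an assumed example value.

```ruby
SECONDS_PER_DAY = 86_400

# Estimate stored bytes for one metric from collection interval, retention, and point size.
def estimated_storage_bytes(interval_seconds:, retention_days:, bytes_per_point:)
  points = (retention_days * SECONDS_PER_DAY) / interval_seconds
  points * bytes_per_point
end

per_metric = estimated_storage_bytes(interval_seconds: 1, retention_days: 365, bytes_per_point: 12)
fleet_total = per_metric * 500 # assumed fleet of 500 metrics

puts "per metric: #{(per_metric / 1_000_000.0).round(1)} MB"
puts "500 metrics: #{(fleet_total / 1_000_000_000.0).round(1)} GB"
```

Changing `interval_seconds` to 10 cuts the figure tenfold, which is often a cheaper lever than shortening retention.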
Retention policies automatically delete old data, preventing unbounded storage growth. Applications should configure retention matching their analysis needs—keeping raw data for operational time frames and aggregated data for historical analysis.
Tools & Ecosystem
The time-series database ecosystem includes specialized databases, monitoring platforms, visualization tools, and supporting libraries. Selecting appropriate tools depends on use case requirements.
Database Options
InfluxDB provides a complete time-series platform with visualization, alerting, and (in commercial editions) clustering. Version 2.x introduced the Flux query language and unified time-series storage and tasks in a single platform. InfluxDB excels at moderate-cardinality workloads with strong consistency requirements. The OSS version is limited to single-node deployments; clustering requires the commercial version.
TimescaleDB extends PostgreSQL, combining time-series performance with relational capabilities. Applications can join time-series data with relational tables, use SQL for queries, and leverage existing PostgreSQL tools. TimescaleDB handles high-cardinality workloads better than InfluxDB but requires PostgreSQL administration knowledge.
Prometheus targets service monitoring and alerting. Its pull-based model and local storage suit monitoring architectures where Prometheus scrapes metrics from application endpoints. Prometheus excels at short-term metrics retention (weeks) but struggles with long-term storage. Many deployments pair Prometheus with long-term storage backends like Cortex or Thanos.
VictoriaMetrics provides a Prometheus-compatible database optimized for high cardinality and long retention. It supports both push and pull models, handles millions of active series, and achieves better compression than Prometheus. VictoriaMetrics suits large-scale monitoring deployments requiring long-term metric retention.
Ruby Libraries
Several gems simplify time-series database integration:
# influxdb-client: InfluxDB 2.x support
gem 'influxdb-client'
# timescaledb: Rails/ActiveRecord integration
gem 'timescaledb'
# prometheus-client: Application metrics
gem 'prometheus-client'
# graphite-api: Graphite protocol support
gem 'graphite-api'
The influxdb-client gem provides comprehensive InfluxDB 2.x support including Flux queries, write batching, and asynchronous operations. For InfluxDB 1.x, the older influxdb gem remains available but lacks features from the 2.x API.
The timescaledb gem adds TimescaleDB-specific ActiveRecord methods and migrations. It handles hypertable creation, continuous aggregates, and compression policies through Rails migrations.
Visualization Platforms
Grafana dominates time-series visualization, supporting dozens of data sources including InfluxDB, Prometheus, TimescaleDB, and Graphite. It provides dashboarding, alerting, and data exploration. Ruby applications typically write metrics to time-series databases that Grafana queries directly.
# Configure Grafana dashboard via API
require 'net/http'
require 'json'
def create_grafana_dashboard(title, panels)
uri = URI("#{ENV['GRAFANA_URL']}/api/dashboards/db")
request = Net::HTTP::Post.new(uri)
request['Authorization'] = "Bearer #{ENV['GRAFANA_API_KEY']}"
request['Content-Type'] = 'application/json'
dashboard = {
dashboard: {
title: title,
panels: panels,
schemaVersion: 16,
version: 0
},
overwrite: false
}
request.body = dashboard.to_json
response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == 'https') do |http|
http.request(request)
end
JSON.parse(response.body)
end
Kibana and Elastic stack provide alternative visualization for time-series data stored in Elasticsearch. This approach suits applications already using Elasticsearch for logging and search.
Data Collection Agents
Telegraf collects system and application metrics and writes them to InfluxDB, Prometheus, or other outputs. It provides input plugins for monitoring CPU, memory, disk, network, and various services. Ruby applications either push metrics to Telegraf's StatsD listener or expose HTTP endpoints that Telegraf polls.
# Output metrics in StatsD format for Telegraf
require 'statsd-instrument'
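# statsd-instrument 2.x API; tags and histograms need the :datadog dialect,
# which Telegraf's statsd input can parse when datadog_extensions is enabled
StatsD.backend = StatsD::Instrument::Backends::UDPBackend.new(
'localhost:8125', :datadog
)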
class ApplicationController < ActionController::Base
around_action :track_request
private
def track_request
start = Time.now
begin
yield
ensure
duration = (Time.now - start) * 1000
StatsD.histogram('request.duration', duration,
tags: ["endpoint:#{params[:controller]}.#{params[:action]}"])
StatsD.increment('request.count',
tags: ["status:#{response.status}"])
end
end
end
Prometheus exporters expose metrics from systems not directly instrumented. Exporters exist for databases, message queues, cloud services, and hardware monitoring. Ruby applications can create custom exporters using the prometheus-client gem.
Design Considerations
Selecting and implementing time-series databases requires evaluating trade-offs around consistency, availability, query capabilities, and operational complexity.
Database Selection Criteria
Write and query patterns determine database suitability. Applications with high write volumes (>10,000 points/second) benefit from databases optimized for ingest like InfluxDB or VictoriaMetrics. Applications prioritizing query flexibility might choose TimescaleDB for SQL support.
Cardinality expectations affect database choice. InfluxDB handles moderate cardinality (millions of series) efficiently but degrades with very high cardinality. VictoriaMetrics and M3DB better handle high cardinality workloads with billions of unique series.
Retention requirements influence storage architecture. Short-term retention (days to weeks) suits in-memory databases or local storage engines. Long-term retention (months to years) requires distributed storage with efficient compression and tiered storage options.
# Decision matrix implementation
class DatabaseSelector
DATABASES = {
influxdb: {
max_write_rate: 100_000,
max_cardinality: 10_000_000,
sql_support: false,
clustering: :commercial,
ideal_retention: '90d'
},
timescaledb: {
max_write_rate: 50_000,
max_cardinality: 100_000_000,
sql_support: true,
clustering: :native,
ideal_retention: '1y'
},
victoriametrics: {
max_write_rate: 1_000_000,
max_cardinality: 1_000_000_000,
sql_support: false,
clustering: :native,
ideal_retention: '1y'
}
}
def self.recommend(requirements)
DATABASES.select do |name, capabilities|
capabilities[:max_write_rate] >= requirements[:write_rate] &&
capabilities[:max_cardinality] >= requirements[:cardinality]
end.keys
end
end
# Usage
requirements = {
write_rate: 75_000,
cardinality: 5_000_000,
retention: '180d'
}
suitable = DatabaseSelector.recommend(requirements)
# => [:influxdb, :victoriametrics] (timescaledb's 50_000 write-rate ceiling is below 75_000)
Consistency vs Availability Trade-offs
Time-series workloads typically tolerate eventual consistency. Monitoring data arriving a few seconds late rarely affects analysis. This tolerance enables architectures prioritizing availability—accepting writes even during network partitions or node failures.
Some time-series databases sacrifice consistency for availability. Writes succeed immediately without waiting for replication. Queries might return slightly different results from different nodes during network partitions. This model suits monitoring and metrics collection where missing a few data points matters less than maintaining write availability.
Applications requiring strong consistency should choose databases providing tunable consistency levels. TimescaleDB inherits PostgreSQL's consistency model, ensuring writes persist before acknowledging. InfluxDB Enterprise allows consistency level configuration per write.
Push vs Pull Models
Time-series data collection follows push or pull patterns. Push models have applications send metrics to a central database. Pull models have the database scrape metrics from application endpoints.
Push models suit distributed applications or environments with dynamic scaling. Applications know when they generate metrics and push immediately without waiting for scraping. Network firewalls rarely block outbound pushes. The downside: applications need database connection management and retry logic.
Pull models centralize configuration in the metrics system. Prometheus scrapes configured targets at regular intervals, controlling collection frequency and handling service discovery. Applications expose simple HTTP endpoints without managing connections. The downside: requires network access from scraper to all applications.
# Push model: application sends metrics
class MetricsPusher
def initialize
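# Writes need a bucket and org; the bucket/org variable names are placeholders
@client = InfluxDB2::Client.new(ENV['INFLUXDB_URL'], ENV['INFLUXDB_TOKEN'],
bucket: ENV['INFLUXDB_BUCKET'], org: ENV['INFLUXDB_ORG'])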
@write_api = @client.create_write_api
end
def record(name, value, tags = {})
point = InfluxDB2::Point.new(name: name)
tags.each { |k, v| point.add_tag(k.to_s, v.to_s) }
point.add_field('value', value)
@write_api.write(data: point)
end
end
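# Pull model: Prometheus scrapes endpoint
require 'prometheus/client'

class MetricsExporter
  def initialize
    @registry = Prometheus::Client.registry
    # Registering the same metric name twice raises, so create one exporter per process
    @counter = @registry.counter(:requests_total,
                                 docstring: 'Total requests',
                                 labels: [:status])
  end

  def record(status)
    @counter.increment(labels: { status: status })
    # Prometheus scrapes the /metrics endpoint periodically; the app must expose it
    # (for example with the gem's Prometheus::Middleware::Exporter Rack middleware)
  end
end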
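The push model leaves retry logic to the application. One way to handle it is exponential backoff around the write call. The following is a minimal sketch under stated assumptions: `RetryingPusher` is a hypothetical class, `writer` is any callable that raises on failure, and the `sleeper` parameter exists only so the delay can be stubbed in tests.

```ruby
# Hypothetical wrapper: retries a push with exponential backoff.
class RetryingPusher
  def initialize(writer, max_attempts: 3, base_delay: 0.5, sleeper: ->(s) { sleep(s) })
    @writer = writer
    @max_attempts = max_attempts
    @base_delay = base_delay
    @sleeper = sleeper
  end

  # Returns true on success, false once all attempts are exhausted.
  def push(point)
    attempts = 0
    begin
      attempts += 1
      @writer.call(point)
      true
    rescue StandardError
      return false if attempts >= @max_attempts

      # Delay doubles after each failed attempt: base, 2x base, 4x base, ...
      @sleeper.call(@base_delay * (2**(attempts - 1)))
      retry
    end
  end
end
```

A pusher like the one above could be wrapped with something along the lines of `RetryingPusher.new(->(point) { write_api.write(data: point) })`. Dropping the point after the final attempt is a deliberate choice in this sketch, in keeping with the earlier observation that losing a few data points is often acceptable for metrics.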
Schema Design Patterns
Time-series schema design focuses on tag selection and field organization. Tags should have bounded cardinality and identify the dimensions used for filtering and grouping. Fields hold the measured values, along with any high-cardinality identifiers that should not become part of the series key.
Wide schemas store many measurements in a single series, reducing series count but potentially wasting storage for sparse data. Narrow schemas split measurements into separate series, increasing series count but storing only present values.
# Wide schema: multiple fields per series
# Advantage: fewer series, simpler queries
# Disadvantage: sparse data wastes storage
point = InfluxDB2::Point.new(name: 'server_metrics')
  .add_tag('host', 'web-01')
  .add_field('cpu_percent', 45.2)
  .add_field('memory_mb', 2048)
  .add_field('disk_gb', 125)
  .add_field('network_mbps', 15.3)
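# Narrow schema: one field per series
# Advantage: efficient storage for sparse data
# Disadvantage: more series, complex queries
# Sample readings (illustrative values matching the wide example)
values = { 'cpu_percent' => 45.2, 'memory_mb' => 2048, 'disk_gb' => 125, 'network_mbps' => 15.3 }
# Collect the points so they can be written, instead of discarding each one
points = values.map do |metric, value|
  InfluxDB2::Point.new(name: metric)
    .add_tag('host', 'web-01')
    .add_field('value', value)
end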
Applications should avoid storing metadata that changes frequently as tags. Server IP addresses might seem like good tags but create new series when servers change IPs. Store such identifiers in fields or separate metadata systems.
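The effect of a volatile tag on series count can be checked with a few lines of plain Ruby. This sketch assumes a simplified model in which a series is identified by its measurement name plus its full tag set; `series_count` and the sample data are hypothetical, and points are represented as plain hashes rather than client objects.

```ruby
# Counts distinct series: measurement name plus the complete tag set.
# Field values do not affect the count.
def series_count(points)
  points.map { |p| [p[:name], p[:tags].sort] }.uniq.size
end

ips = %w[10.0.0.1 10.0.0.2 10.0.0.3]

# The same host reporting from three IP addresses over time.
ip_as_tag = ips.map do |ip|
  { name: 'cpu', tags: { host: 'web-01', ip: ip }, fields: { value: 45.2 } }
end
ip_as_field = ips.map do |ip|
  { name: 'cpu', tags: { host: 'web-01' }, fields: { value: 45.2, ip: ip } }
end

puts series_count(ip_as_tag)   # => 3
puts series_count(ip_as_field) # => 1
```

Each IP change creates a new series when the address is a tag, while the field variant keeps a single series for the host.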
Reference
Database Comparison
| Database | Write Model | Query Language | Clustering | Best For |
|---|---|---|---|---|
| InfluxDB | Push | Flux, InfluxQL | Commercial | Moderate cardinality, strong consistency |
| TimescaleDB | Push | SQL | Native | High cardinality, relational features |
| Prometheus | Pull | PromQL | Federation | Service monitoring, short retention |
| VictoriaMetrics | Push/Pull | PromQL, MetricsQL | Native | High cardinality, long retention |
| Graphite | Push | Functions | Carbon-relay | Legacy systems, simple metrics |
| M3DB | Push | PromQL | Native | Extreme scale, distributed |
Ruby Client Libraries
| Gem | Database | Features | Use Case |
|---|---|---|---|
| influxdb-client | InfluxDB 2.x | Batching, Flux queries, async | Modern InfluxDB deployments |
| influxdb | InfluxDB 1.x | Basic writes, InfluxQL | Legacy InfluxDB systems |
| timescaledb | TimescaleDB | Hypertables, continuous aggregates | PostgreSQL-based systems |
| prometheus-client | Prometheus | Metrics exposure, types | Application instrumentation |
| graphite-api | Graphite | Metric formatting, sending | Graphite integration |
Write Performance Factors
| Factor | Impact | Optimization Strategy |
|---|---|---|
| Batch size | 10-100x throughput | Batch 1000-10000 points per write |
| Point size | Memory and network | Minimize tag count, avoid large field values |
| Cardinality | Index overhead | Use bounded tag sets, limit series count |
| Timestamp precision | Storage size | Use milliseconds unless microseconds needed |
| Compression | Disk I/O | Enable native compression, tune levels |
| Retention | Write amplification | Configure appropriate downsampling policies |
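The batch-size row can be illustrated with a small buffer that flushes once it reaches a configured size. This is a sketch of the idea, not the client library's mechanism: `BatchingWriter` is a hypothetical class, and the block passed to it stands in for whatever call actually sends a batch to the database.

```ruby
# Hypothetical buffer that hands points to the flush block in batches.
class BatchingWriter
  def initialize(batch_size: 1000, &flush_block)
    @batch_size = batch_size
    @flush_block = flush_block
    @buffer = []
  end

  # Buffers a point and flushes automatically when the batch is full.
  def add(point)
    @buffer << point
    flush if @buffer.size >= @batch_size
  end

  # Sends whatever is buffered; call this on shutdown to avoid losing a partial batch.
  def flush
    return if @buffer.empty?

    batch = @buffer
    @buffer = []
    @flush_block.call(batch)
  end
end
```

A caller might construct it as `BatchingWriter.new(batch_size: 5000) { |batch| write_api.write(data: batch) }`. A production version would likely also flush on a timer so that points do not sit in the buffer indefinitely during quiet periods.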
Query Optimization Techniques
| Technique | Benefit | Implementation |
|---|---|---|
| Time range limits | Reduce scan size | Always specify start/end times |
| Tag filtering | Series reduction | Filter on indexed tags first |
| Downsampled queries | 10-100x faster | Query aggregated data for historical analysis |
| Result limits | Memory control | Limit result rows, paginate large sets |
| Cached queries | Sub-second response | Cache results for repeated queries |
| Continuous aggregates | Real-time performance | Pre-compute common aggregations |
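As an illustration of the cached-queries row, the sketch below keeps results in memory for a fixed time-to-live. `QueryCache` is a hypothetical class; the `clock` parameter is an assumption added so that expiry can be tested without waiting, and the block passed to `fetch` stands in for the real database query.

```ruby
# Hypothetical in-memory cache with a per-entry time-to-live.
class QueryCache
  def initialize(ttl_seconds:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}
  end

  # Returns the cached result for key, or runs the block and caches its result.
  def fetch(key)
    now = @clock.call
    entry = @entries[key]
    return entry[:value] if entry && now - entry[:stored_at] < @ttl

    value = yield
    @entries[key] = { value: value, stored_at: now }
    value
  end
end
```

Using the query text as the cache key works well for dashboards that re-issue identical queries. The TTL should be no longer than the staleness the dashboard can tolerate.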
Common Tag Design Patterns
| Pattern | Tags | Fields | Cardinality |
|---|---|---|---|
| Infrastructure | host, region, cluster | cpu, memory, disk | Low (hundreds) |
| Application | service, endpoint, method | duration_ms, count | Medium (thousands) |
| IoT sensors | device_type, location | temperature, humidity | High (millions) |
| Financial | symbol, exchange, order_type | price, volume | Very high (billions) |
Retention Policy Configuration
| Retention | Use Case | Storage Impact (relative to 1-second raw data) | Query Performance |
|---|---|---|---|
| Raw: 7 days | Recent operational analysis | High write rate, full resolution | Fast for recent queries |
| Hourly: 90 days | Short-term trends | 1/3600 of raw data | Good for hourly analysis |
| Daily: 1 year | Historical reporting | 1/86400 of raw data | Excellent for long-term trends |
| Monthly: indefinite | Long-term archives | 1/2592000 of raw data | Sufficient for annual reports |
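The storage ratios in the table follow from simple arithmetic on the sampling interval. The sketch below works through the bounded tiers, assuming raw data arrives at one point per second per series (the assumption implied by the 1/3600 and 1/86400 ratios); `points_retained` and the `TIERS` constant are illustrative names.

```ruby
SECONDS_PER_DAY = 86_400

# Each tier: [label, seconds between stored points, retention in days].
TIERS = [
  ['raw',    1,      7],
  ['hourly', 3_600,  90],
  ['daily',  86_400, 365]
].freeze

# Number of points kept per series for a given interval and retention window.
def points_retained(interval_seconds, retention_days)
  retention_days * SECONDS_PER_DAY / interval_seconds
end

TIERS.each do |label, interval, days|
  puts "#{label}: #{points_retained(interval, days)} points per series"
end
# raw: 604800 points per series
# hourly: 2160 points per series
# daily: 365 points per series
```

Under these assumptions the raw tier dominates storage despite its short retention, which is why downsampling older data pays off.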
Flux Query Language Basics
| Operation | Description | Example |
|---|---|---|
| from | Specify data source | from(bucket: "metrics") |
| range | Time window | range(start: -1h) |
| filter | Series selection | filter(fn: (r) => r.host == "web-01") |
| aggregateWindow | Time-based grouping | aggregateWindow(every: 5m, fn: mean) |
| group | Group by tags | group(columns: ["host"]) |
| map | Transform values | map(fn: (r) => ({r with scaled: r._value * 100})) |
| join | Combine streams | join(tables: {a: stream1, b: stream2}, on: ["host"]) |
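The operations in the table are typically chained with the pipe-forward operator. The following sketch assembles such a pipeline as a string from Ruby; `build_flux_query` is a hypothetical helper, and the bucket, host, and window values are placeholders taken from the table's examples.

```ruby
# Hypothetical helper: builds a Flux pipeline from the operations listed above.
# Values are interpolated directly, so only pass trusted input.
def build_flux_query(bucket:, start:, host:, every:, fn: 'mean')
  <<~FLUX
    from(bucket: "#{bucket}")
      |> range(start: #{start})
      |> filter(fn: (r) => r.host == "#{host}")
      |> aggregateWindow(every: #{every}, fn: #{fn})
  FLUX
end

puts build_flux_query(bucket: 'metrics', start: '-1h', host: 'web-01', every: '5m')
```

The resulting string could then be passed to the influxdb-client gem's query API, for example `client.create_query_api.query(query: flux)`. Placing `range` and `filter` before the aggregation keeps the scanned data small, in line with the query optimization techniques listed earlier.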
TimescaleDB Functions
| Function | Purpose | Example |
|---|---|---|
| time_bucket | Time-based grouping | time_bucket('5 minutes', time) |
| first | First value in group | first(temperature, time) |
| last | Last value in group | last(temperature, time) |
| locf | Last observation carried forward (used with time_bucket_gapfill) | locf(avg(reading)) |
| interpolate | Linear interpolation (used with time_bucket_gapfill) | interpolate(avg(reading)) |
| time_bucket_gapfill | Fill missing time buckets | time_bucket_gapfill('1 hour', time) |
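Several of these functions are usually combined in one query. The sketch below builds an hourly gap-filled query as a SQL string from Ruby. The helper name, the `conditions` table, and the `temperature` column are hypothetical, and the one-day window is an arbitrary choice for illustration.

```ruby
# Hypothetical helper: hourly averages with missing buckets filled by
# carrying the last observed value forward.
# Identifiers are interpolated directly, so only pass trusted names.
def gapfilled_hourly_sql(table: 'conditions', column: 'temperature')
  <<~SQL
    SELECT time_bucket_gapfill('1 hour', time) AS bucket,
           locf(avg(#{column})) AS #{column}
    FROM #{table}
    WHERE time >= now() - INTERVAL '1 day' AND time < now()
    GROUP BY bucket
    ORDER BY bucket
  SQL
end

puts gapfilled_hourly_sql
```

The explicit lower and upper time bounds in the `WHERE` clause matter here, because `time_bucket_gapfill` uses them to decide which buckets to generate. The string can be executed through any PostgreSQL driver, such as the `pg` gem.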
Monitoring Metrics
| Metric | Normal Range | Action Threshold |
|---|---|---|
| Write throughput | Varies by hardware | >80% of rated capacity |
| Query latency p95 | <100ms for recent data | >1 second |
| Memory usage | 50-70% | >85% |
| Series cardinality | Depends on database | Check database limits |
| Disk usage growth | Linear with retention | >90% capacity |
| Query queue depth | 0-10 | >100 queued queries |
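The fixed thresholds in the table lend themselves to a simple automated check. The sketch below is one possible shape for it: the metric keys and the `breached` helper are hypothetical names, and only the rows with a numeric action threshold are included.

```ruby
# Action thresholds taken from the table above (hypothetical metric keys).
THRESHOLDS = {
  query_latency_p95_ms: 1_000, # >1 second
  memory_usage_percent: 85,    # >85%
  disk_usage_percent: 90,      # >90% capacity
  query_queue_depth: 100       # >100 queued queries
}.freeze

# Returns the metrics whose current reading exceeds its action threshold.
# Readings for unknown metrics are ignored.
def breached(readings)
  readings.select { |metric, value| THRESHOLDS.key?(metric) && value > THRESHOLDS[metric] }.keys
end

p breached(memory_usage_percent: 90, disk_usage_percent: 50, query_queue_depth: 101)
# => [:memory_usage_percent, :query_queue_depth]
```

In practice the readings would come from the database's own statistics endpoint or an external monitoring agent, and a breach would trigger an alert rather than a printed list.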