CrackedRuby

Overview

Logging aggregation collects log data from multiple sources across distributed systems and consolidates this data into a centralized location for analysis, monitoring, and troubleshooting. Applications, servers, containers, and infrastructure components generate logs independently, creating fragmented data scattered across the system. Log aggregation solves the operational challenge of accessing and correlating this dispersed information.

The practice emerged from the complexity of managing multi-server environments where debugging required SSH access to individual machines to grep through local log files. As systems scaled horizontally, this approach became impractical. Logging aggregation transforms the operational model from distributed file access to centralized data streams.

Modern logging aggregation handles structured and unstructured log formats, processes high-volume data streams, supports real-time analysis, and integrates with alerting and visualization systems. The architecture typically includes log shippers (agents that collect logs), transport mechanisms (message queues or direct transmission), aggregation servers (centralized collection points), storage backends (databases or file systems), and analysis interfaces (search, dashboards, alerting).

# Basic structured logging in Ruby application
require 'logger'
require 'json'
require 'time'  # for Time#iso8601

class StructuredLogger
  def initialize(output = $stdout)
    @logger = Logger.new(output)
    @logger.formatter = proc do |severity, datetime, progname, msg|
      {
        timestamp: datetime.utc.iso8601,
        level: severity,
        message: msg,
        service: 'api-server',
        environment: ENV['RACK_ENV']
      }.to_json + "\n"
    end
  end
  
  # message is expected to be a Hash of structured fields
  def info(message, metadata = {})
    @logger.info(message.merge(metadata))
  end
end

logger = StructuredLogger.new
logger.info({ event: 'user_login', user_id: 12345, ip: '192.168.1.1' })
# => {"timestamp":"2025-10-10T10:30:00Z","level":"INFO","message":{"event":"user_login","user_id":12345,"ip":"192.168.1.1"},"service":"api-server","environment":"production"}

Key Principles

Logging aggregation operates on several foundational principles that determine system design and effectiveness.

Centralization consolidates logs from distributed sources into unified storage. Each application server, database, load balancer, and infrastructure component sends logs to a central aggregation point rather than maintaining local files. Centralization eliminates the need to access individual machines for troubleshooting and creates a single source of truth for system behavior.

Structured data replaces free-form text logs with parseable formats. JSON, key-value pairs, or other structured formats enable field-level indexing and querying. Structured logs support filtering by specific fields like user ID, request ID, or error type without relying on string pattern matching. The structure must balance human readability with machine parseability.

Correlation links related log entries across services through shared identifiers. Request IDs, trace IDs, or session identifiers connect log entries generated by different components handling the same user request. Distributed tracing relies on correlation to reconstruct the complete path of a request through a microservices architecture.

Buffering and reliability handle network interruptions and downstream system failures. Log shippers buffer messages locally when the aggregation server becomes unavailable, preventing log loss during temporary outages. Back-pressure mechanisms prevent memory exhaustion as buffers approach their limits. Reliability requirements vary with log importance: audit logs demand higher guarantees than debug logs.
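The buffering principle above can be sketched as a bounded in-memory buffer with a drop-oldest policy; the class name and limits here are illustrative, and a production shipper would persist to disk for higher-reliability logs.

```ruby
# Minimal bounded buffer: holds entries while the aggregator is unreachable,
# drops the oldest entries once the cap is hit to avoid memory exhaustion.
class BoundedLogBuffer
  attr_reader :dropped_count

  def initialize(max_entries: 10_000)
    @max_entries = max_entries
    @entries = []
    @mutex = Mutex.new
    @dropped_count = 0
  end

  def push(entry)
    @mutex.synchronize do
      if @entries.size >= @max_entries
        @entries.shift            # drop oldest (acceptable for debug logs)
        @dropped_count += 1       # track loss for observability
      end
      @entries << entry
    end
  end

  # Drain everything for transmission once the aggregator recovers.
  def drain
    @mutex.synchronize do
      drained = @entries
      @entries = []
      drained
    end
  end

  def size
    @mutex.synchronize { @entries.size }
  end
end
```

Exposing the drop count lets the shipper report its own data loss, which matters when deciding whether a buffer belongs in the best-effort or the at-least-once tier.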

Scalability accommodates growing log volumes without degrading system performance. Horizontal scaling adds more aggregation servers and storage nodes. Partitioning distributes logs across multiple storage backends based on time ranges, service names, or other attributes. Compression reduces storage requirements and network bandwidth.

Query performance enables rapid search across large log datasets. Indexing strategies determine query speed and storage costs. Full-text search indexes accelerate string pattern matching but increase storage overhead. Field-specific indexes optimize filtering on commonly queried attributes.

# Log correlation with request ID
require 'securerandom'
require 'json'
require 'time'

class RequestLogger
  def self.with_request_id
    request_id = SecureRandom.uuid
    Thread.current[:request_id] = request_id
    yield
  ensure
    Thread.current[:request_id] = nil
  end
  
  def self.log(message, metadata = {})
    entry = {
      timestamp: Time.now.utc.iso8601,
      request_id: Thread.current[:request_id],
      message: message
    }.merge(metadata)
    
    puts entry.to_json
  end
end

RequestLogger.with_request_id do
  RequestLogger.log('Request received', path: '/api/users')
  RequestLogger.log('Database query executed', query: 'SELECT * FROM users', duration_ms: 45)
  RequestLogger.log('Response sent', status: 200)
end
# All three log entries share the same request_id

Implementation Approaches

Different architectures address varying requirements for reliability, performance, and operational complexity.

Direct shipping sends logs from applications directly to the aggregation server without intermediate components. Applications include a logging library that transmits log entries via HTTP, TCP, or UDP to the aggregation endpoint. This approach minimizes infrastructure complexity but couples application performance to aggregation system availability.

Direct shipping works for small deployments with reliable networks. Network latency directly impacts application response times when using synchronous transmission. Asynchronous transmission with local buffering mitigates this coupling but introduces complexity in the application code. Connection pooling and keep-alive reduce overhead from establishing connections for each log entry.
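The asynchronous variant can be sketched with a background worker thread; `AsyncShipper` and its injectable `sender` callable are illustrative, not a specific library's API.

```ruby
require 'json'
require 'time'

# Sketch of asynchronous direct shipping: log calls enqueue and return
# immediately; a background thread transmits entries, so application latency
# is decoupled from aggregator round-trips. A real sender would POST over a
# pooled keep-alive connection.
class AsyncShipper
  def initialize(queue_limit: 1_000, sender:)
    @queue = SizedQueue.new(queue_limit)  # blocks producers at the limit
    @sender = sender
    @worker = Thread.new do
      while (entry = @queue.pop)          # nil sentinel ends the loop
        @sender.call(entry.to_json)
      end
    end
  end

  def log(message, metadata = {})
    entry = { timestamp: Time.now.utc.iso8601, message: message }.merge(metadata)
    @queue.push(entry)  # returns immediately under normal load
  end

  def shutdown
    @queue.push(nil)    # sentinel stops the worker after draining
    @worker.join
  end
end
```

Using `SizedQueue` gives a simple form of back-pressure: when the aggregator falls behind, producers block at the limit instead of growing memory without bound.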

Agent-based collection deploys log shipping agents on each host that read application log files and forward entries to aggregation servers. Agents decouple application code from log transmission infrastructure. Applications write to local files or stdout, and the agent handles collection, parsing, buffering, and transmission.

Agents support multiple input sources on a single host, aggregate logs from multiple applications, and provide centralized configuration for log routing. File-based collection introduces lag between log generation and availability in the aggregation system. Agents consume system resources—CPU for parsing, memory for buffering, and disk I/O for reading files.

# Application writes to stdout, agent collects
require 'json'
require 'time'

class StdoutLogger
  def self.info(message, metadata = {})
    entry = {
      timestamp: Time.now.utc.iso8601,
      level: 'INFO',
      service: ENV['SERVICE_NAME'],
      message: message
    }.merge(metadata)
    
    puts entry.to_json
  end
end

# Agent configuration (Fluentd tail source, shown as comments)
# <source>
#   @type tail
#   path /var/log/app/*.log
#   pos_file /var/log/td-agent/app.pos
#   tag app.logs
#   <parse>
#     @type json
#     time_key timestamp
#     time_format %Y-%m-%dT%H:%M:%S.%LZ
#   </parse>
# </source>

Message queue buffering introduces a message broker between log sources and aggregation servers. Applications or agents publish logs to a queue (Kafka, RabbitMQ, Redis), and aggregation servers consume from the queue. The queue provides buffering, load distribution, and replay capabilities.

Message queues decouple log producers from consumers, allowing independent scaling. Queue depth metrics indicate system health and processing lag. Persistent queues survive broker restarts but require disk I/O. In-memory queues offer higher throughput but risk data loss during failures. Topic partitioning distributes load across multiple consumers and enables parallel processing.
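A minimal sketch of queue-buffered shipping, assuming a broker that exposes Redis-style `lpush`/`rpop` list operations (the `redis` gem's client does); the producer and consumer classes are illustrative, and the client is injected so a fake can stand in for a live server.

```ruby
require 'json'

# Producer publishes log entries to a broker list; a separate consumer
# process pops them and forwards to storage. Any client exposing
# lpush/rpop works here.
class QueuedLogProducer
  def initialize(client, list: 'logs')
    @client = client
    @list = list
  end

  def publish(entry)
    @client.lpush(@list, entry.to_json)
  end
end

class QueuedLogConsumer
  def initialize(client, list: 'logs')
    @client = client
    @list = list
  end

  # Drain available entries in FIFO order; a real consumer would loop
  # with a blocking pop instead of returning when the list is empty.
  def drain
    entries = []
    while (raw = @client.rpop(@list))
      entries << JSON.parse(raw)
    end
    entries
  end
end
```

Because producers and consumers only share the broker, either side can be scaled or restarted independently, which is the decoupling described above.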

Sidecar containers deploy log collection containers alongside application containers in orchestrated environments. Each pod or task includes both the application container and a log shipping sidecar. The sidecar reads application logs from a shared volume or stdout and forwards entries to the aggregation system.

Sidecars isolate logging infrastructure from application code and configuration. The pattern works well in Kubernetes where pod-level networking and volume sharing simplify communication. Resource allocation for sidecars must account for log volume and processing overhead. Sidecar updates deploy independently from application updates.

Gateway aggregation consolidates logs at an intermediate tier before final aggregation. Regional or cluster-level gateways collect logs from local sources, perform filtering or enrichment, and forward to central storage. Multi-tier architectures reduce network traffic across WAN links and provide regional data isolation for compliance requirements.

Gateway tiers introduce additional failure points and latency. Gateway configuration complexity increases with the number of tiers. Benefits include bandwidth reduction through log sampling or filtering, data sovereignty compliance by keeping logs in specific regions, and reduced load on central aggregation infrastructure.
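A gateway tier's filtering and enrichment step might look like the following sketch; the class name, sampling rate, and injectable `upstream` callable are assumptions for illustration.

```ruby
# Sketch of a gateway-tier filter: the regional gateway enriches entries
# with region metadata and forwards only a subset upstream (sampling DEBUG
# entries, passing everything else through).
class GatewayFilter
  def initialize(region:, debug_sample_rate: 0.05, upstream:)
    @region = region
    @rate = debug_sample_rate
    @upstream = upstream
  end

  def process(entry)
    # Drop most DEBUG logs before they cross the WAN link
    return if entry[:level] == 'DEBUG' && rand >= @rate
    @upstream.call(entry.merge(region: @region))
  end
end
```

Enriching at the gateway rather than in every application keeps region metadata consistent and out of application code.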

Ruby Implementation

Ruby applications integrate with logging aggregation through structured logging libraries, log shipping clients, and framework-specific instrumentation.

Semantic Logger provides structured logging with multiple appender support including direct HTTP shipping, file output for agent collection, and syslog transmission.

require 'semantic_logger'

# Configure multiple appenders
SemanticLogger.default_level = :info
SemanticLogger.add_appender(file_name: 'log/application.log', formatter: :json)
SemanticLogger.add_appender(
  appender: :http,
  url: 'http://logaggregator.example.com:8080/logs',
  header: { 'Authorization' => 'Bearer token123' }
)

class UserService
  include SemanticLogger::Loggable
  
  def create_user(params)
    logger.info('Creating user', params.slice(:email, :name))
    
    user = nil
    duration = measure { user = User.create(params) }
    
    logger.info('User created', user_id: user.id, duration_ms: duration)
    user
  rescue => e
    logger.error('User creation failed', exception: e, params: params)
    raise
  end
  
  # Returns the block's elapsed time in milliseconds
  def measure
    start = Time.now
    yield
    ((Time.now - start) * 1000).round(2)
  end
end

Rails integration extends the default logger to output structured JSON and add request context to all log entries within a request lifecycle.

# config/environments/production.rb
Rails.application.configure do
  config.log_formatter = proc do |severity, time, progname, msg|
    {
      timestamp: time.utc.iso8601,
      severity: severity,
      progname: progname,
      message: msg,
      environment: Rails.env,
      host: Socket.gethostname
    }.to_json + "\n"
  end
  
  config.logger = ActiveSupport::Logger.new($stdout)
end

# lib/log_request_context.rb (uses the request_store gem)
class LogRequestContext
  def initialize(app)
    @app = app
  end
  
  def call(env)
    request_id = env['HTTP_X_REQUEST_ID'] || SecureRandom.uuid
    
    RequestStore.store[:request_id] = request_id
    RequestStore.store[:user_id] = extract_user_id(env)
    
    @app.call(env)
  ensure
    RequestStore.clear!
  end
  
  private
  
  def extract_user_id(env)
    # Extract from JWT token, session, etc.
  end
end

# Override logger to inject request context
module RequestContextLogger
  def add(severity, message = nil, progname = nil, &block)
    # Logger#info(payload) arrives here with message nil and the payload in
    # progname, so check both positions before merging request context
    if message.is_a?(Hash)
      message = message.merge(request_context)
    elsif progname.is_a?(Hash)
      progname = progname.merge(request_context)
    end
    super
  end

  private

  def request_context
    {
      request_id: RequestStore.store[:request_id],
      user_id: RequestStore.store[:user_id]
    }
  end
end

ActiveSupport::Logger.prepend(RequestContextLogger)

Fluentd Ruby SDK sends logs directly to Fluentd servers from application code, eliminating the need for file-based collection.

require 'fluent-logger'
require 'socket'
require 'json'

class FluentLogger
  def initialize
    @logger = Fluent::Logger::FluentLogger.new(
      'app',
      host: ENV['FLUENTD_HOST'] || 'localhost',
      port: ENV['FLUENTD_PORT']&.to_i || 24224
    )
  end
  
  def log(tag, data)
    enriched_data = data.merge(
      timestamp: Time.now.to_i,
      hostname: Socket.gethostname,
      pid: Process.pid,
      thread_id: Thread.current.object_id
    )
    
    # post returns false rather than raising when Fluentd is unreachable;
    # fall back to stderr so the entry is not silently lost
    unless @logger.post(tag, enriched_data)
      warn 'Fluentd logging failed'
      warn enriched_data.to_json
    end
  end
end

fluent = FluentLogger.new
fluent.log('user.action', { action: 'login', user_id: 123, success: true })

Logstash JSON formatter prepares logs in the format expected by Logstash pipelines.

require 'logstash-logger'

logger = LogStashLogger.new(
  type: :tcp,
  host: 'logstash.example.com',
  port: 5228,
  ssl_enable: true
)

logger.info(
  event: 'payment_processed',
  amount: 99.99,
  currency: 'USD',
  user_id: 456,
  transaction_id: 'txn_abc123'
)

Custom batching implementation accumulates log entries and sends in batches to reduce network overhead.

require 'net/http'
require 'json'
require 'time'

class BatchLogger
  def initialize(endpoint, batch_size: 100, flush_interval: 5)
    @endpoint = URI(endpoint)
    @batch = []
    @mutex = Mutex.new
    @batch_size = batch_size
    @flush_interval = flush_interval
    
    start_flush_timer
  end
  
  def log(entry)
    should_flush = false
    @mutex.synchronize do
      @batch << entry.merge(timestamp: Time.now.utc.iso8601)
      should_flush = @batch.size >= @batch_size
    end
    # Flush outside the lock -- Ruby's Mutex is not reentrant, so calling
    # flush (which also locks) from inside synchronize would deadlock
    flush if should_flush
  end
  
  def flush
    batch_copy = nil
    @mutex.synchronize do
      batch_copy = @batch.dup
      @batch.clear
    end
    
    send_batch(batch_copy) unless batch_copy.empty?
  end
  
  private
  
  def send_batch(batch)
    http = Net::HTTP.new(@endpoint.host, @endpoint.port)
    http.use_ssl = @endpoint.scheme == 'https'
    
    request = Net::HTTP::Post.new(@endpoint.path)
    request['Content-Type'] = 'application/json'
    request.body = { logs: batch }.to_json
    
    http.request(request)
  rescue => e
    warn "Failed to send log batch: #{e.message}"
  end
  
  def start_flush_timer
    Thread.new do
      loop do
        sleep @flush_interval
        flush
      end
    end
  end
end

Tools & Ecosystem

The logging aggregation landscape includes open-source and commercial solutions with different strengths and operational characteristics.

ELK Stack (Elasticsearch, Logstash, Kibana) combines log ingestion, indexing, storage, and visualization. Logstash collects and parses logs from multiple sources, Elasticsearch provides full-text search and analytics, and Kibana offers dashboards and querying interfaces. The stack handles high log volumes and supports complex querying. Elasticsearch cluster management requires expertise, and resource consumption grows with data volume and retention periods.

Fluentd provides flexible log collection with a plugin architecture supporting hundreds of input sources, filters, and output destinations. Written in Ruby with performance-critical parts in C, Fluentd handles structured and unstructured logs. The unified logging layer pattern uses Fluentd as a central collection point that routes logs to multiple backends. Memory buffering and file-based persistent queuing ensure reliability. Configuration complexity increases with the number of sources and routing rules.

Fluent Bit offers a lightweight alternative to Fluentd with lower memory footprint and CPU usage. Designed for embedded systems and edge computing, Fluent Bit works well as a sidecar container or IoT device agent. The reduced feature set focuses on core collection and forwarding capabilities. Fluent Bit often forwards to Fluentd aggregators for additional processing and routing.

Splunk provides enterprise log management with advanced analytics, machine learning anomaly detection, and compliance reporting. The commercial platform handles massive data volumes with distributed indexing and search. Licensing costs scale with data ingestion volume. Splunk's proprietary query language (SPL) supports complex correlations and statistical analysis.

Graylog combines log management with security information and event management (SIEM) capabilities. Built on Elasticsearch, MongoDB, and a Java server component, Graylog offers role-based access control, stream processing, and alerting. The open-source version provides core functionality while the enterprise edition adds audit logging and additional data sources.

Loki from Grafana Labs takes a different approach by indexing labels rather than full-text content. Logs are stored as compressed chunks referenced by label indexes, dramatically reducing storage and indexing costs. Query performance depends on label cardinality: low-cardinality label sets perform well while high cardinality degrades performance. Loki integrates with Prometheus metrics and Grafana dashboards.

# Fluentd configuration for Ruby app collection
# fluent.conf
# <source>
#   @type forward
#   port 24224
#   bind 0.0.0.0
# </source>
#
# <filter app.**>
#   @type record_transformer
#   <record>
#     cluster "production-us-east"
#     datacenter "us-east-1a"
#   </record>
# </filter>
#
# <match app.error>
#   @type elasticsearch
#   host elasticsearch.example.com
#   port 9200
#   index_name app-errors
#   type_name _doc
# </match>
#
# <match app.**>
#   @type s3
#   aws_key_id YOUR_AWS_KEY
#   aws_sec_key YOUR_AWS_SECRET
#   s3_bucket app-logs-archive
#   s3_region us-east-1
#   path logs/%Y/%m/%d/
#   time_slice_format %Y%m%d
#   
#   <buffer>
#     @type file
#     path /var/log/fluentd/s3
#     flush_interval 5m
#   </buffer>
# </match>

Vector from Datadog provides high-performance log collection and transformation with a focus on data pipelines. Vector uses Rust for performance and memory safety. The topology-based configuration model describes data flow through sources, transforms, and sinks. Vector handles logs, metrics, and traces in a unified pipeline.

Papertrail offers hosted log aggregation with minimal setup requirements. Applications send logs via syslog protocol, and Papertrail provides search, live tail, and alerting. The hosted service eliminates infrastructure management but limits customization and data retention policies.

CloudWatch Logs integrates with AWS services for centralized logging in cloud environments. Applications and AWS resources send logs to CloudWatch, where Log Insights provides SQL-like queries. Integration with Lambda, SNS, and CloudWatch Alarms enables automated responses to log patterns. Costs increase with log volume and retention period.

Integration & Interoperability

Logging aggregation systems integrate with application frameworks, container orchestrators, cloud platforms, and monitoring tools.

Container orchestration requires special handling for ephemeral containers and dynamic scheduling. Kubernetes logs from container stdout/stderr appear in the container runtime logs. Log collection strategies include node-level agents (DaemonSet), sidecar containers per pod, or direct application shipping.

# Kubernetes pod with Fluentd sidecar
# apiVersion: v1
# kind: Pod
# metadata:
#   name: app-with-logging
# spec:
#   containers:
#   - name: app
#     image: myapp:latest
#     volumeMounts:
#     - name: logs
#       mountPath: /var/log/app
#   - name: fluentd
#     image: fluent/fluentd:latest
#     volumeMounts:
#     - name: logs
#       mountPath: /var/log/app
#     - name: fluentd-config
#       mountPath: /fluentd/etc
#   volumes:
#   - name: logs
#     emptyDir: {}
#   - name: fluentd-config
#     configMap:
#       name: fluentd-config

Distributed tracing systems like Jaeger or Zipkin correlate with logs through shared trace and span identifiers. Applications inject trace context into log entries, enabling navigation between traces and logs for the same request.

require 'opentelemetry-sdk'
require 'opentelemetry-instrumentation-rails'
require 'json'
require 'time'

class TracedLogger
  def self.log(message, metadata = {})
    span = OpenTelemetry::Trace.current_span
    trace_id = span.context.hex_trace_id
    span_id = span.context.hex_span_id
    
    entry = {
      message: message,
      trace_id: trace_id,
      span_id: span_id,
      timestamp: Time.now.utc.iso8601
    }.merge(metadata)
    
    puts entry.to_json
  end
end

# In Kibana/Grafana, link from trace_id to corresponding logs

Metrics systems complement logs with quantitative time-series data. Prometheus scrapes metrics endpoints while logs capture event details. Combined, metrics identify when problems occur and logs explain why. Exporters convert log patterns into metrics—counting error rates, request latencies, or business events.
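A log-to-metrics exporter can be sketched as a counter map updated at ingestion time; the metric names follow Prometheus naming conventions but are illustrative rather than tied to a specific client library.

```ruby
require 'json'

# Sketch of a log-to-metrics exporter: scans a stream of JSON log lines
# and maintains counters that a metrics endpoint could expose.
class LogMetricsExporter
  attr_reader :counters

  def initialize
    @counters = Hash.new(0)
  end

  def ingest(json_line)
    entry = JSON.parse(json_line)
    @counters['log_entries_total'] += 1
    # Per-level counter, labeled in Prometheus exposition style
    @counters["log_entries_total{level=\"#{entry['level']}\"}"] += 1
    @counters['log_errors_total'] += 1 if entry['level'] == 'ERROR'
  end
end
```

A metrics endpoint reading these counters answers the "when" question cheaply, while the raw log entries remain available to answer "why".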

Alerting integration connects log patterns to notification systems. Alert rules match log queries to trigger PagerDuty, Slack, email, or webhook notifications. Throttling and aggregation prevent alert storms. Alert context includes log snippets and dashboard links for investigation.

# Elasticsearch alerting rule (Watcher)
# PUT _watcher/watch/error_rate_alert
# {
#   "trigger": {
#     "schedule": { "interval": "1m" }
#   },
#   "input": {
#     "search": {
#       "request": {
#         "indices": ["app-logs-*"],
#         "body": {
#           "query": {
#             "bool": {
#               "must": [
#                 { "match": { "level": "ERROR" } },
#                 { "range": { "@timestamp": { "gte": "now-5m" } } }
#               ]
#             }
#           }
#         }
#       }
#     }
#   },
#   "condition": {
#     "compare": { "ctx.payload.hits.total": { "gt": 50 } }
#   },
#   "actions": {
#     "notify_team": {
#       "webhook": {
#         "url": "https://hooks.slack.com/services/YOUR/WEBHOOK/URL",
#         "body": "Error rate exceeded threshold: {{ctx.payload.hits.total}} errors in 5 minutes"
#       }
#     }
#   }
# }

Data pipeline integration exports logs to data warehouses for long-term analysis and business intelligence. ETL processes transform logs into structured tables for SQL analysis. Object storage (S3, GCS) provides cost-effective archival with lifecycle policies.

SIEM integration feeds logs into security information and event management systems for threat detection and compliance. Normalized log formats facilitate correlation across different log sources. Audit logs require tamper-proof storage and retention policies.

Design Considerations

Selecting a logging aggregation strategy involves analyzing reliability requirements, data volume, operational complexity, and cost constraints.

Reliability tier classification divides logs into categories with different guarantees. Audit logs for financial transactions or security events require at-least-once delivery with durable storage. Debug logs accept best-effort delivery to reduce infrastructure costs. The classification determines buffering strategies, retry policies, and storage backends.

High-reliability logs use persistent buffers, synchronous acknowledgments, and redundant storage. The increased latency and resource consumption trade off against data integrity guarantees. Low-priority logs use asynchronous transmission with memory-only buffers and shorter retention periods.

Sampling strategies reduce data volume while preserving statistical validity. Systematic sampling retains every nth log entry. Random sampling preserves randomness but complicates volume-based analysis. Stratified sampling ensures representation across different log levels or services.

Head-based sampling decides at log generation time, filtering before transmission. Tail-based sampling examines complete request traces and samples based on outcome—keeping error traces while sampling successful requests. Adaptive sampling adjusts rates based on current error rates or system load.
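Adaptive sampling might be sketched as a per-window budget: everything is kept up to the budget, then entries are thinned proportionally as volume rises. The class and parameters are illustrative.

```ruby
# Sketch of adaptive sampling: the keep rate drops as observed volume rises,
# so log output stays near a target entries-per-window budget.
class AdaptiveSampler
  def initialize(target_per_window: 100, window_seconds: 60)
    @target = target_per_window
    @window = window_seconds
    @window_start = Time.now
    @seen = 0
  end

  def keep?
    now = Time.now
    if now - @window_start >= @window
      @window_start = now   # new window: reset the volume counter
      @seen = 0
    end
    @seen += 1
    # Keep everything up to the budget, then thin proportionally
    @seen <= @target || rand < @target.to_f / @seen
  end
end
```

Under bursty load this keeps roughly the budgeted number of entries per window, rather than a fixed percentage of a volume that may have spiked.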

Schema evolution handles changing log formats over time. Strict schemas reject logs that don't match current definitions, preventing corrupted data but breaking during rollouts. Schema-on-read stores logs as-is and interprets fields during queries, accepting any format but requiring query-time field validation.

Versioned schemas embed version identifiers in log entries. Parsers handle multiple versions simultaneously, supporting gradual migrations. Schema registries centralize format definitions and version management.
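Version-aware parsing can be sketched as a case statement on an embedded `schema_version` field; the v1 field names (`msg`, epoch `ts`) are hypothetical, chosen only to show normalization into a current shape.

```ruby
require 'json'
require 'time'

# Sketch of version-aware parsing: each entry carries schema_version and
# the parser normalizes older versions into the current (v2) shape.
class VersionedLogParser
  def parse(json_line)
    entry = JSON.parse(json_line)
    case entry['schema_version']
    when 2, nil  # current version; nil treated as current in this sketch
      entry
    when 1
      # Hypothetical v1 used "msg" and epoch seconds; normalize to v2
      {
        'schema_version' => 2,
        'message' => entry['msg'],
        'timestamp' => Time.at(entry['ts']).utc.iso8601
      }
    else
      raise ArgumentError, "unknown schema_version #{entry['schema_version']}"
    end
  end
end
```

Because both versions normalize to one shape, downstream queries and dashboards never see the migration.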

Data retention policies balance query performance, storage costs, and compliance requirements. Hot storage keeps recent logs on fast indexes for interactive queries. Warm storage moves older logs to cheaper storage with slower query performance. Cold storage archives logs to object storage with query penalties measured in minutes.

Time-based retention deletes logs after a fixed period. Capacity-based retention maintains a maximum data volume, deleting the oldest entries as limits are reached. Compliance requirements may mandate minimum retention periods or specific deletion procedures.

Multi-tenancy isolation separates logs from different customers or teams. Index-per-tenant creates separate Elasticsearch indexes for each tenant, providing strong isolation but multiplying infrastructure overhead. Shared indexes with tenant ID fields reduce operational complexity but require query filtering and careful permission management.

Tenant-specific data paths route logs to separate storage backends or clusters. The approach provides maximum isolation and independent scaling but increases operational complexity.

Query optimization reduces costs and improves response times. Pre-aggregation computes common metrics at ingestion time rather than query time. Materialized views store query results for frequent access patterns. Time-based partitioning limits query scopes to relevant time ranges.
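Ingestion-time pre-aggregation can be sketched as per-minute error buckets; field names here are illustrative.

```ruby
require 'time'

# Sketch of ingestion-time pre-aggregation: error counts roll up into
# per-minute buckets as entries arrive, so a dashboard query reads one
# row per minute instead of scanning raw entries.
class PreAggregator
  attr_reader :buckets

  def initialize
    @buckets = Hash.new(0)
  end

  def ingest(entry)
    return unless entry[:level] == 'ERROR'
    # Truncate the epoch timestamp to the start of its minute
    minute = Time.at(entry[:epoch] - (entry[:epoch] % 60)).utc
    @buckets[minute.iso8601] += 1
  end
end
```

The trade-off is fixed at ingestion time: pre-aggregated questions are cheap to answer, but new questions still require the raw entries.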

# Log sampling implementation
require 'json'
require 'time'

class SamplingLogger
  def initialize(sample_rate: 0.1, always_log_errors: true)
    @sample_rate = sample_rate
    @always_log_errors = always_log_errors
  end
  
  def log(level, message, metadata = {})
    should_log = (@always_log_errors && level == :error) ||
                 rand < @sample_rate
    
    return unless should_log
    
    entry = {
      timestamp: Time.now.utc.iso8601,
      level: level.to_s.upcase,
      message: message,
      sampled: !@always_log_errors || level != :error
    }.merge(metadata)
    
    puts entry.to_json
  end
end

logger = SamplingLogger.new(sample_rate: 0.1)
# Only 10% of info logs recorded, 100% of errors
logger.log(:info, 'Request processed', request_id: 'abc')
logger.log(:error, 'Database connection failed', error: 'timeout')

Real-World Applications

Production logging aggregation deployments demonstrate patterns for scale, reliability, and operational efficiency.

Microservices correlation tracks requests across dozens of services. Each service generates logs with a shared request ID propagated through HTTP headers. The aggregation system indexes logs by request ID, enabling queries that reconstruct the complete request flow. Trace context includes parent span IDs for hierarchical visualization.

High-cardinality fields like request ID require careful indexing strategy. Some systems index only time ranges and service names, using full-text search to filter specific request IDs. Others create secondary indexes on request IDs with time-based partitioning to limit index size.

Multi-region aggregation handles geographically distributed infrastructure. Regional log collectors aggregate locally before forwarding to central storage. This reduces WAN bandwidth costs and provides regional backup during network partitions. Data sovereignty requirements may prohibit cross-border log transmission, requiring regional storage with federated search.

# Regional aggregation configuration
require 'fluent-logger'
require 'time'

class RegionalLogger
  def initialize(region:, local_aggregator:, central_aggregator:)
    @region = region
    @local = Fluent::Logger::FluentLogger.new('app', host: local_aggregator)
    @central = Fluent::Logger::FluentLogger.new('app', host: central_aggregator)
  end
  
  def log(message, metadata = {})
    enriched = metadata.merge(
      message: message,
      region: @region,
      timestamp: Time.now.utc.iso8601
    )
    
    # Send to local for low-latency access
    @local.post('app.logs', enriched)
    
    # Also send critical logs to central storage
    if metadata[:level] == 'ERROR' || metadata[:audit]
      @central.post('app.critical', enriched)
    end
  end
end

Compliance and audit logging implements immutable log storage for regulatory requirements. Write-once storage prevents log tampering. Cryptographic signatures verify log integrity. Access controls restrict log viewing to authorized personnel. Audit logs capture who accessed which data when.

Tamper-proof logging appends cryptographic hashes linking each log entry to previous entries, creating a chain that reveals modifications. Periodic signatures from trusted timestamping authorities prove log creation times.
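The hash-chain idea can be sketched with SHA-256 from Ruby's standard library; a real system would also persist records immutably and obtain signed timestamps from an external authority.

```ruby
require 'digest'
require 'json'

# Sketch of a hash chain: each record stores the SHA-256 of the previous
# record, so editing any earlier entry breaks every subsequent link.
class HashChainedLog
  GENESIS = '0' * 64

  attr_reader :records

  def initialize
    @records = []
  end

  def append(data)
    prev_hash = @records.empty? ? GENESIS : @records.last[:hash]
    body = { data: data, prev_hash: prev_hash }
    @records << body.merge(hash: Digest::SHA256.hexdigest(body.to_json))
  end

  # Recompute every link; returns false if any record was modified
  def intact?
    prev = GENESIS
    @records.all? do |r|
      body = { data: r[:data], prev_hash: r[:prev_hash] }
      r[:prev_hash] == prev &&
        r[:hash] == Digest::SHA256.hexdigest(body.to_json) &&
        (prev = r[:hash])
    end
  end
end
```

Verification is linear in the number of records, which is why chains are typically checkpointed with periodic external signatures rather than re-verified from genesis on every read.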

High-volume event streaming processes millions of log entries per second. Kafka topics partition logs for parallel processing. Stream processors filter, enrich, and aggregate logs before storage. Exactly-once processing semantics prevent duplicate log entries during failures and retries.

Back-pressure mechanisms throttle log producers when consumers cannot keep pace. Circuit breakers fail fast when downstream systems become unavailable rather than accumulating unbounded buffers.
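A circuit breaker around log transmission might be sketched as follows; the threshold, reset window, and injectable `sender`/`fallback` callables are assumptions for illustration.

```ruby
# Sketch of a circuit breaker for log shipping: after repeated failures the
# breaker opens and entries go straight to a fallback (e.g. local disk)
# instead of queueing against a dead aggregator.
class LogCircuitBreaker
  def initialize(failure_threshold: 3, reset_after: 30, sender:, fallback:)
    @failure_threshold = failure_threshold
    @reset_after = reset_after
    @sender = sender
    @fallback = fallback
    @failures = 0
    @opened_at = nil
  end

  def ship(entry)
    if open?
      @fallback.call(entry)   # fail fast: do not touch the dead aggregator
      return :dropped
    end
    begin
      @sender.call(entry)
      @failures = 0
      :sent
    rescue StandardError
      @failures += 1
      @opened_at = Time.now if @failures >= @failure_threshold
      @fallback.call(entry)
      :failed
    end
  end

  private

  def open?
    return false unless @opened_at
    if Time.now - @opened_at >= @reset_after
      @opened_at = nil   # half-open: allow one trial send through
      @failures = 0
      false
    else
      true
    end
  end
end
```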

Cost optimization reduces infrastructure spending while maintaining functionality. Compression reduces storage by 80-90% for text logs. Lifecycle policies automatically delete or archive old logs. Sampling discards low-value logs at collection time. Index field selection limits which fields support fast queries.

Reserved instance pricing for stable log volumes and spot instances for batch processing reduce cloud costs. Object storage for cold logs costs 90% less than indexed storage.
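The compression claim above is easy to sanity-check with Zlib from Ruby's standard library on repetitive JSON log lines; actual ratios depend on the data.

```ruby
require 'zlib'
require 'json'

# Generate repetitive JSON log lines like a production stream would produce
lines = 1_000.times.map do |i|
  { timestamp: '2025-10-10T10:30:00Z', level: 'INFO',
    service: 'api-server', message: 'Request processed', n: i }.to_json
end.join("\n")

compressed = Zlib::Deflate.deflate(lines)
ratio = 1.0 - compressed.bytesize.to_f / lines.bytesize
puts format('raw: %d bytes, compressed: %d bytes, saved: %.1f%%',
            lines.bytesize, compressed.bytesize, ratio * 100)
```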

Reference

Common Log Levels

Level | Purpose                 | Retention
TRACE | Detailed execution flow | Short (hours-days)
DEBUG | Diagnostic information  | Short (days)
INFO  | Normal operation events | Medium (weeks)
WARN  | Potential issues        | Medium (weeks-months)
ERROR | Failure events          | Long (months)
FATAL | Critical failures       | Long (months-years)

Structured Log Fields

Field       | Type           | Purpose             | Example
timestamp   | ISO8601 string | Event time          | 2025-10-10T14:30:00.123Z
level       | String         | Severity            | ERROR
message     | String         | Human description   | User login failed
service     | String         | Source service      | api-server
request_id  | UUID           | Request correlation | 550e8400-e29b-41d4-a716-446655440000
user_id     | Integer        | User context        | 12345
duration_ms | Float          | Operation time      | 45.67
error       | String         | Error details       | Connection timeout

Collection Methods Comparison

Method        | Latency          | Reliability | Resource Use | Complexity
Direct HTTP   | Low (ms)         | Medium      | Low          | Low
File + Agent  | Medium (seconds) | High        | Medium       | Medium
Message Queue | Medium (seconds) | High        | High         | High
Syslog        | Low (ms)         | Medium      | Low          | Low
Sidecar       | Low (ms)         | High        | Medium       | Medium

Popular Tools Feature Matrix

Tool          | Language | Storage       | Query Language | License        | Best For
Elasticsearch | Java     | Self          | Query DSL, SQL | Apache         | Full-text search
Loki          | Go       | Self          | LogQL          | AGPL           | Label-based queries
Splunk        | C++      | Proprietary   | SPL            | Commercial     | Enterprise analytics
Graylog       | Java     | Elasticsearch | Graylog query  | GPL/Commercial | SIEM integration
Fluentd       | Ruby/C   | Multiple      | N/A            | Apache         | Unified collection

Index Strategy Guidelines

Log Volume    | Strategy            | Retention                           | Cost
< 1 GB/day    | Single index        | 30-90 days                          | Low
1-10 GB/day   | Daily indices       | 7-30 days hot, 90 days warm         | Medium
10-100 GB/day | Hourly indices      | 1-7 days hot, 30 days warm, 90 days cold | High
> 100 GB/day  | Partitioned indices | Aggressive sampling and archival    | Very High

Buffering Configuration

Buffer Type  | Durability       | Performance | Use Case
Memory       | Lost on crash    | Highest     | Non-critical logs
File         | Survives restart | Medium      | Production logs
Queue (disk) | Replicated       | Lower       | Critical audit logs

Query Performance Factors

Factor            | Impact      | Mitigation
Time range        | Exponential | Limit query windows
Field cardinality | High        | Index only low-cardinality fields
Wildcard searches | Very high   | Use prefix queries
Aggregations      | High        | Pre-compute common aggregations
Full-text search  | Medium      | Use field filters first

Ruby Logging Libraries

Library         | Features                       | Performance | Complexity
Logger (stdlib) | Basic logging                  | High        | Low
Semantic Logger | Structured, multiple appenders | Medium      | Medium
Lograge         | Rails request logging          | High        | Low
LogStashLogger  | Direct Logstash integration    | Medium      | Low
Ougai           | JSON structured logging        | High        | Low