Overview
Logging aggregation collects log data from multiple sources across distributed systems and consolidates this data into a centralized location for analysis, monitoring, and troubleshooting. Applications, servers, containers, and infrastructure components generate logs independently, creating fragmented data scattered across the system. Log aggregation solves the operational challenge of accessing and correlating this dispersed information.
The practice emerged from the complexity of managing multi-server environments where debugging required SSH access to individual machines to grep through local log files. As systems scaled horizontally, this approach became impractical. Logging aggregation transforms the operational model from distributed file access to centralized data streams.
Modern logging aggregation handles structured and unstructured log formats, processes high-volume data streams, supports real-time analysis, and integrates with alerting and visualization systems. The architecture typically includes log shippers (agents that collect logs), transport mechanisms (message queues or direct transmission), aggregation servers (centralized collection points), storage backends (databases or file systems), and analysis interfaces (search, dashboards, alerting).
# Basic structured logging in a Ruby application
require 'logger'
require 'json'
require 'time' # for Time#iso8601

class StructuredLogger
  def initialize(output = $stdout)
    @logger = Logger.new(output)
    @logger.formatter = proc do |severity, datetime, _progname, msg|
      {
        timestamp: datetime.utc.iso8601,
        level: severity,
        message: msg,
        service: 'api-server',
        environment: ENV['RACK_ENV']
      }.to_json + "\n"
    end
  end

  def info(message, metadata = {})
    @logger.info(message.merge(metadata))
  end
end

logger = StructuredLogger.new
logger.info({ event: 'user_login', user_id: 12345, ip: '192.168.1.1' })
# => {"timestamp":"2025-10-10T10:30:00Z","level":"INFO","message":{"event":"user_login","user_id":12345,"ip":"192.168.1.1"},"service":"api-server","environment":"production"}
Key Principles
Logging aggregation operates on several foundational principles that determine system design and effectiveness.
Centralization consolidates logs from distributed sources into unified storage. Each application server, database, load balancer, and infrastructure component sends logs to a central aggregation point rather than maintaining local files. Centralization eliminates the need to access individual machines for troubleshooting and creates a single source of truth for system behavior.
Structured data replaces free-form text logs with parseable formats. JSON, key-value pairs, or other structured formats enable field-level indexing and querying. Structured logs support filtering by specific fields like user ID, request ID, or error type without relying on string pattern matching. The structure must balance human readability with machine parseability.
Correlation links related log entries across services through shared identifiers. Request IDs, trace IDs, or session identifiers connect log entries generated by different components handling the same user request. Distributed tracing relies on correlation to reconstruct the complete path of a request through a microservices architecture.
Buffering and reliability handle network interruptions and downstream system failures. Log shippers buffer messages locally when the aggregation server becomes unavailable, preventing log loss during temporary outages. Back-pressure mechanisms prevent memory exhaustion when buffer limits approach. Reliability requirements vary based on log importance—audit logs demand higher guarantees than debug logs.
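A minimal sketch of the buffering principle, assuming an illustrative class name and a priority scheme keyed on the level field used in the examples above: when the limit is reached, a new entry either displaces a lower-priority buffered entry or is shed itself.

```ruby
# Illustrative bounded buffer: holds logs in memory while the aggregator
# is unreachable, and sheds low-priority entries when the limit is hit.
class BoundedLogBuffer
  PRIORITY = { 'DEBUG' => 0, 'INFO' => 1, 'WARN' => 2, 'ERROR' => 3, 'AUDIT' => 4 }

  def initialize(max_size: 1000)
    @max_size = max_size
    @buffer = []
    @dropped = 0
  end

  # Returns true if the entry was buffered, false if it was shed.
  def push(entry)
    if @buffer.size >= @max_size
      # Back-pressure: evict the first lower-priority buffered entry,
      # or drop the new entry if nothing buffered ranks below it.
      victim = @buffer.index { |e| PRIORITY.fetch(e[:level], 1) < PRIORITY.fetch(entry[:level], 1) }
      @dropped += 1
      return false if victim.nil?
      @buffer.delete_at(victim)
    end
    @buffer << entry
    true
  end

  # Hand the buffered entries to the shipper and reset.
  def drain
    drained = @buffer
    @buffer = []
    drained
  end

  attr_reader :dropped
end
```

An ERROR entry arriving at a full buffer evicts a buffered DEBUG entry, while a DEBUG entry arriving at a buffer full of higher-priority logs is dropped, matching the principle that reliability guarantees scale with log importance.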
Scalability accommodates growing log volumes without degrading system performance. Horizontal scaling adds more aggregation servers and storage nodes. Partitioning distributes logs across multiple storage backends based on time ranges, service names, or other attributes. Compression reduces storage requirements and network bandwidth.
Query performance enables rapid search across large log datasets. Indexing strategies determine query speed and storage costs. Full-text search indexes accelerate string pattern matching but increase storage overhead. Field-specific indexes optimize filtering on commonly queried attributes.
# Log correlation with request ID
require 'securerandom'
require 'json'
require 'time'

class RequestLogger
  def self.with_request_id
    request_id = SecureRandom.uuid
    Thread.current[:request_id] = request_id
    yield
  ensure
    Thread.current[:request_id] = nil
  end

  def self.log(message, metadata = {})
    entry = {
      timestamp: Time.now.utc.iso8601,
      request_id: Thread.current[:request_id],
      message: message
    }.merge(metadata)
    puts entry.to_json
  end
end

RequestLogger.with_request_id do
  RequestLogger.log('Request received', path: '/api/users')
  RequestLogger.log('Database query executed', query: 'SELECT * FROM users', duration_ms: 45)
  RequestLogger.log('Response sent', status: 200)
end
# All three log entries share the same request_id
Implementation Approaches
Different architectures address varying requirements for reliability, performance, and operational complexity.
Direct shipping sends logs from applications directly to the aggregation server without intermediate components. Applications include a logging library that transmits log entries via HTTP, TCP, or UDP to the aggregation endpoint. This approach minimizes infrastructure complexity but couples application performance to aggregation system availability.
Direct shipping works for small deployments with reliable networks. Network latency directly impacts application response times when using synchronous transmission. Asynchronous transmission with local buffering mitigates this coupling but introduces complexity in the application code. Connection pooling and keep-alive reduce overhead from establishing connections for each log entry.
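The asynchronous variant can be sketched as follows; the endpoint URL, class name, and queue limit are illustrative, not a specific library's API. The application thread only enqueues, and a background thread owns all network I/O.

```ruby
require 'net/http'
require 'uri'
require 'json'
require 'time'

# Illustrative asynchronous direct shipper: aggregator latency never blocks
# the calling thread, at the cost of shedding entries when the queue fills.
class AsyncShipper
  def initialize(endpoint, queue_limit: 10_000)
    @endpoint = URI(endpoint)
    @queue = SizedQueue.new(queue_limit)
    @worker = Thread.new { loop { ship(@queue.pop) } }
  end

  # Non-blocking enqueue: returns false instead of stalling the caller.
  def log(entry)
    @queue.push(entry.merge(timestamp: Time.now.utc.iso8601), true)
    true
  rescue ThreadError
    false # queue full; entry shed
  end

  private

  def ship(entry)
    Net::HTTP.post(@endpoint, entry.to_json, 'Content-Type' => 'application/json')
  rescue StandardError => e
    warn "ship failed: #{e.message}" # a real shipper would retry or re-buffer
  end
end
```

The `SizedQueue` bounds memory use, which is exactly the coupling-versus-complexity trade the paragraph describes: the caller is insulated from aggregator outages, but shedding policy now lives in application code.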
Agent-based collection deploys log shipping agents on each host that read application log files and forward entries to aggregation servers. Agents decouple application code from log transmission infrastructure. Applications write to local files or stdout, and the agent handles collection, parsing, buffering, and transmission.
Agents support multiple input sources on a single host, aggregate logs from multiple applications, and provide centralized configuration for log routing. File-based collection introduces lag between log generation and availability in the aggregation system. Agents consume system resources—CPU for parsing, memory for buffering, and disk I/O for reading files.
# Application writes to stdout, agent collects
require 'json'
require 'time'

class StdoutLogger
  def self.info(message, metadata = {})
    entry = {
      timestamp: Time.now.utc.iso8601,
      level: 'INFO',
      service: ENV['SERVICE_NAME'],
      message: message
    }.merge(metadata)
    puts entry.to_json
  end
end
# Agent configuration (Fluentd tail source, not Ruby):
# <source>
#   @type tail
#   path /var/log/app/*.log
#   pos_file /var/log/td-agent/app.pos
#   tag app.logs
#   <parse>
#     @type json
#     time_key timestamp
#     time_format %Y-%m-%dT%H:%M:%S.%LZ
#   </parse>
# </source>
Message queue buffering introduces a message broker between log sources and aggregation servers. Applications or agents publish logs to a queue (Kafka, RabbitMQ, Redis), and aggregation servers consume from the queue. The queue provides buffering, load distribution, and replay capabilities.
Message queues decouple log producers from consumers, allowing independent scaling. Queue depth metrics indicate system health and processing lag. Persistent queues survive broker restarts but require disk I/O. In-memory queues offer higher throughput but risk data loss during failures. Topic partitioning distributes load across multiple consumers and enables parallel processing.
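The decoupling can be demonstrated in-process with Ruby's thread-safe queues; a real deployment would put Kafka or RabbitMQ where `SizedQueue` sits here, and queue depth would be the lag metric to monitor.

```ruby
require 'json'

# Producers publish without knowing about consumers; the bounded queue
# applies back-pressure when the consumer falls behind.
queue = SizedQueue.new(5_000)
processed = Queue.new

# Consumer: stands in for an aggregation worker indexing entries.
consumer = Thread.new do
  while (entry = queue.pop) # nil sentinel ends the loop
    processed << entry
  end
end

# Producers: application threads publishing log entries.
producers = 3.times.map do |i|
  Thread.new do
    10.times { |n| queue.push({ producer: i, seq: n }.to_json) }
  end
end

producers.each(&:join)
queue.push(nil) # sentinel to stop the consumer
consumer.join
puts processed.size # => 30
```

Because producers and the consumer share only the queue, either side can be scaled or restarted independently, which is the property the broker provides in production.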
Sidecar containers deploy log collection containers alongside application containers in orchestrated environments. Each pod or task includes both the application container and a log shipping sidecar. The sidecar reads application logs from a shared volume or stdout and forwards entries to the aggregation system.
Sidecars isolate logging infrastructure from application code and configuration. The pattern works well in Kubernetes where pod-level networking and volume sharing simplify communication. Resource allocation for sidecars must account for log volume and processing overhead. Sidecar updates deploy independently from application updates.
Gateway aggregation consolidates logs at an intermediate tier before final aggregation. Regional or cluster-level gateways collect logs from local sources, perform filtering or enrichment, and forward to central storage. Multi-tier architectures reduce network traffic across WAN links and provide regional data isolation for compliance requirements.
Gateway tiers introduce additional failure points and latency. Gateway configuration complexity increases with the number of tiers. Benefits include bandwidth reduction through log sampling or filtering, data sovereignty compliance by keeping logs in specific regions, and reduced load on central aggregation infrastructure.
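A gateway's filter-and-enrich step might look like this sketch; the class name, sampling policy, and field names are assumptions for illustration.

```ruby
# Illustrative gateway-tier filter: drop DEBUG noise before it crosses the
# WAN, sample INFO entries, and enrich everything forwarded with the region.
class GatewayFilter
  def initialize(region:, info_sample_rate: 0.5, rng: Random.new)
    @region = region
    @info_sample_rate = info_sample_rate
    @rng = rng
  end

  # Returns the enriched entry, or nil if the entry should stay regional.
  def forward(entry)
    case entry[:level]
    when 'DEBUG'
      nil # never leaves the region
    when 'INFO'
      return nil if @rng.rand >= @info_sample_rate
      enrich(entry)
    else
      enrich(entry) # WARN/ERROR always forwarded
    end
  end

  private

  def enrich(entry)
    entry.merge(region: @region, gateway: true)
  end
end
```

Filtering at the gateway is where the bandwidth reduction mentioned above actually happens: central infrastructure only ever sees the sampled, enriched stream.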
Ruby Implementation
Ruby applications integrate with logging aggregation through structured logging libraries, log shipping clients, and framework-specific instrumentation.
Semantic Logger provides structured logging with multiple appender support including direct HTTP shipping, file output for agent collection, and syslog transmission.
require 'semantic_logger'

# Configure multiple appenders
SemanticLogger.default_level = :info
SemanticLogger.add_appender(file_name: 'log/application.log', formatter: :json)
SemanticLogger.add_appender(
  appender: :http,
  url: 'http://logaggregator.example.com:8080/logs',
  header: { 'Authorization' => 'Bearer token123' }
)

class UserService
  include SemanticLogger::Loggable

  def create_user(params)
    logger.info('Creating user', params.slice(:email, :name))
    user = nil
    # measure_info logs the message with the block's duration attached
    logger.measure_info('User created') do
      user = User.create(params)
    end
    user
  rescue => e
    logger.error('User creation failed', { params: params }, e)
    raise
  end
end
Rails integration extends the default logger to output structured JSON and add request context to all log entries within a request lifecycle.
# config/environments/production.rb
Rails.application.configure do
  config.log_formatter = proc do |severity, time, progname, msg|
    {
      timestamp: time.utc.iso8601,
      severity: severity,
      progname: progname,
      message: msg,
      environment: Rails.env,
      host: Socket.gethostname
    }.to_json + "\n"
  end
  config.logger = ActiveSupport::Logger.new($stdout)
end

# lib/log_request_context.rb (uses the request_store gem for per-request state)
class LogRequestContext
  def initialize(app)
    @app = app
  end

  def call(env)
    request_id = env['HTTP_X_REQUEST_ID'] || SecureRandom.uuid
    RequestStore.store[:request_id] = request_id
    RequestStore.store[:user_id] = extract_user_id(env)
    @app.call(env)
  ensure
    RequestStore.clear!
  end

  private

  def extract_user_id(env)
    # Extract from JWT token, session, etc.
  end
end

# Override logger to inject request context
module RequestContextLogger
  def add(severity, message = nil, progname = nil, &block)
    if message.is_a?(Hash)
      message = message.merge(
        request_id: RequestStore.store[:request_id],
        user_id: RequestStore.store[:user_id]
      )
    end
    super
  end
end

ActiveSupport::Logger.prepend(RequestContextLogger)
Fluentd Ruby SDK sends logs directly to Fluentd servers from application code, eliminating the need for file-based collection.
require 'fluent-logger'
require 'socket' # for Socket.gethostname

class FluentLogger
  def initialize
    @logger = Fluent::Logger::FluentLogger.new(
      'app',
      host: ENV['FLUENTD_HOST'] || 'localhost',
      port: ENV['FLUENTD_PORT']&.to_i || 24224
    )
  end

  def log(tag, data)
    enriched_data = data.merge(
      timestamp: Time.now.to_i,
      hostname: Socket.gethostname,
      pid: Process.pid,
      thread_id: Thread.current.object_id
    )
    @logger.post(tag, enriched_data)
  rescue => e
    # Fall back to stderr if Fluentd is unavailable
    warn "Fluentd logging failed: #{e.message}"
    warn enriched_data.to_json if enriched_data
  end
end

fluent = FluentLogger.new
fluent.log('user.action', { action: 'login', user_id: 123, success: true })
Logstash JSON formatter prepares logs in the format expected by Logstash pipelines.
require 'logstash-logger'

logger = LogStashLogger.new(
  type: :tcp,
  host: 'logstash.example.com',
  port: 5228,
  ssl_enable: true
)

logger.info(
  event: 'payment_processed',
  amount: 99.99,
  currency: 'USD',
  user_id: 456,
  transaction_id: 'txn_abc123'
)
Custom batching implementation accumulates log entries and sends in batches to reduce network overhead.
require 'net/http'
require 'json'
class BatchLogger
def initialize(endpoint, batch_size: 100, flush_interval: 5)
@endpoint = URI(endpoint)
@batch = []
@mutex = Mutex.new
@batch_size = batch_size
@flush_interval = flush_interval
start_flush_timer
end
def log(entry)
@mutex.synchronize do
@batch << entry.merge(timestamp: Time.now.utc.iso8601)
flush if @batch.size >= @batch_size
end
end
def flush
return if @batch.empty?
batch_copy = nil
@mutex.synchronize do
batch_copy = @batch.dup
@batch.clear
end
send_batch(batch_copy)
end
private
def send_batch(batch)
http = Net::HTTP.new(@endpoint.host, @endpoint.port)
http.use_ssl = @endpoint.scheme == 'https'
request = Net::HTTP::Post.new(@endpoint.path)
request['Content-Type'] = 'application/json'
request.body = { logs: batch }.to_json
http.request(request)
rescue => e
warn "Failed to send log batch: #{e.message}"
end
def start_flush_timer
Thread.new do
loop do
sleep @flush_interval
flush
end
end
end
end
Tools & Ecosystem
The logging aggregation landscape includes open-source and commercial solutions with different strengths and operational characteristics.
ELK Stack (Elasticsearch, Logstash, Kibana) combines log ingestion, indexing, storage, and visualization. Logstash collects and parses logs from multiple sources, Elasticsearch provides full-text search and analytics, and Kibana offers dashboards and querying interfaces. The stack handles high log volumes and supports complex querying. Elasticsearch cluster management requires expertise, and resource consumption grows with data volume and retention periods.
Fluentd provides flexible log collection with a plugin architecture supporting hundreds of input sources, filters, and output destinations. Written in Ruby with performance-critical parts in C, Fluentd handles structured and unstructured logs. The unified logging layer pattern uses Fluentd as a central collection point that routes logs to multiple backends. Memory buffering and file-based persistent queuing ensure reliability. Configuration complexity increases with the number of sources and routing rules.
Fluent Bit offers a lightweight alternative to Fluentd with lower memory footprint and CPU usage. Designed for embedded systems and edge computing, Fluent Bit works well as a sidecar container or IoT device agent. The reduced feature set focuses on core collection and forwarding capabilities. Fluent Bit often forwards to Fluentd aggregators for additional processing and routing.
Splunk provides enterprise log management with advanced analytics, machine learning anomaly detection, and compliance reporting. The commercial platform handles massive data volumes with distributed indexing and search. Licensing costs scale with data ingestion volume. Splunk's proprietary query language (SPL) supports complex correlations and statistical analysis.
Graylog combines log management with security information and event management (SIEM) capabilities. Built on Elasticsearch, MongoDB, and a Java server component, Graylog offers role-based access control, stream processing, and alerting. The open-source version provides core functionality while the enterprise edition adds audit logging and additional data sources.
Loki from Grafana Labs takes a different approach by indexing labels rather than full-text content. Logs are stored as compressed chunks referenced by label indexes, dramatically reducing storage and indexing costs. Query performance depends on label cardinality: low-cardinality label sets perform well, while high cardinality degrades performance. Loki integrates with Prometheus metrics and Grafana dashboards.
# Fluentd configuration for Ruby app collection
# fluent.conf
# <source>
#   @type forward
#   port 24224
#   bind 0.0.0.0
# </source>
#
# <filter app.**>
#   @type record_transformer
#   <record>
#     cluster "production-us-east"
#     datacenter "us-east-1a"
#   </record>
# </filter>
#
# <match app.error>
#   @type elasticsearch
#   host elasticsearch.example.com
#   port 9200
#   index_name app-errors
#   type_name _doc
# </match>
#
# <match app.**>
#   @type s3
#   aws_key_id YOUR_AWS_KEY
#   aws_sec_key YOUR_AWS_SECRET
#   s3_bucket app-logs-archive
#   s3_region us-east-1
#   path logs/%Y/%m/%d/
#   time_slice_format %Y%m%d
#
#   <buffer>
#     @type file
#     path /var/log/fluentd/s3
#     flush_interval 5m
#   </buffer>
# </match>
Vector from Datadog provides high-performance log collection and transformation with a focus on data pipelines. Vector uses Rust for performance and memory safety. The topology-based configuration model describes data flow through sources, transforms, and sinks. Vector handles logs, metrics, and traces in a unified pipeline.
Papertrail offers hosted log aggregation with minimal setup requirements. Applications send logs via syslog protocol, and Papertrail provides search, live tail, and alerting. The hosted service eliminates infrastructure management but limits customization and data retention policies.
CloudWatch Logs integrates with AWS services for centralized logging in cloud environments. Applications and AWS resources send logs to CloudWatch, where Log Insights provides SQL-like queries. Integration with Lambda, SNS, and CloudWatch Alarms enables automated responses to log patterns. Costs increase with log volume and retention period.
Integration & Interoperability
Logging aggregation systems integrate with application frameworks, container orchestrators, cloud platforms, and monitoring tools.
Container orchestration requires special handling for ephemeral containers and dynamic scheduling. Kubernetes logs from container stdout/stderr appear in the container runtime logs. Log collection strategies include node-level agents (DaemonSet), sidecar containers per pod, or direct application shipping.
# Kubernetes pod with Fluentd sidecar
# apiVersion: v1
# kind: Pod
# metadata:
# name: app-with-logging
# spec:
# containers:
# - name: app
# image: myapp:latest
# volumeMounts:
# - name: logs
# mountPath: /var/log/app
# - name: fluentd
# image: fluent/fluentd:latest
# volumeMounts:
# - name: logs
# mountPath: /var/log/app
# - name: fluentd-config
# mountPath: /fluentd/etc
# volumes:
# - name: logs
# emptyDir: {}
# - name: fluentd-config
# configMap:
# name: fluentd-config
Distributed tracing systems like Jaeger or Zipkin correlate with logs through shared trace and span identifiers. Applications inject trace context into log entries, enabling navigation between traces and logs for the same request.
require 'opentelemetry/sdk'
require 'json'
require 'time'

class TracedLogger
  def self.log(message, metadata = {})
    span = OpenTelemetry::Trace.current_span
    entry = {
      message: message,
      trace_id: span.context.hex_trace_id,
      span_id: span.context.hex_span_id,
      timestamp: Time.now.utc.iso8601
    }.merge(metadata)
    puts entry.to_json
  end
end

# In Kibana/Grafana, link from trace_id to the corresponding logs
Metrics systems complement logs with quantitative time-series data. Prometheus scrapes metrics endpoints while logs capture event details. Combined, metrics identify when problems occur and logs explain why. Exporters convert log patterns into metrics—counting error rates, request latencies, or business events.
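A log-to-metrics exporter can be sketched as a fold over structured entries; the counter naming mimics Prometheus label conventions, and the field names follow the earlier examples. The class and its API are illustrative.

```ruby
require 'json'

# Folds a stream of structured log lines into per-level counters and a
# latency sum, the kind of series a metrics exporter would expose.
class LogMetricsExporter
  attr_reader :counters, :latency_ms_total

  def initialize
    @counters = Hash.new(0)
    @latency_ms_total = 0.0
  end

  def observe(json_line)
    entry = JSON.parse(json_line)
    @counters["log_entries_total{level=\"#{entry['level']}\"}"] += 1
    @latency_ms_total += entry['duration_ms'] if entry['duration_ms']
  end
end

exporter = LogMetricsExporter.new
[
  '{"level":"INFO","duration_ms":12.5}',
  '{"level":"ERROR"}',
  '{"level":"INFO","duration_ms":7.5}'
].each { |line| exporter.observe(line) }

puts exporter.counters['log_entries_total{level="ERROR"}'] # => 1
puts exporter.latency_ms_total                             # => 20.0
```

Counting at ingestion keeps the "when" question cheap to answer from metrics, while the underlying log lines remain available to answer "why".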
Alerting integration connects log patterns to notification systems. Alert rules match log queries to trigger PagerDuty, Slack, email, or webhook notifications. Throttling and aggregation prevent alert storms. Alert context includes log snippets and dashboard links for investigation.
# Elasticsearch alerting rule (Watcher)
# PUT _watcher/watch/error_rate_alert
# {
# "trigger": {
# "schedule": { "interval": "1m" }
# },
# "input": {
# "search": {
# "request": {
# "indices": ["app-logs-*"],
# "body": {
# "query": {
# "bool": {
# "must": [
# { "match": { "level": "ERROR" } },
# { "range": { "@timestamp": { "gte": "now-5m" } } }
# ]
# }
# }
# }
# }
# }
# },
# "condition": {
# "compare": { "ctx.payload.hits.total": { "gt": 50 } }
# },
# "actions": {
# "notify_team": {
# "webhook": {
# "url": "https://hooks.slack.com/services/YOUR/WEBHOOK/URL",
# "body": "Error rate exceeded threshold: {{ctx.payload.hits.total}} errors in 5 minutes"
# }
# }
# }
# }
Data pipeline integration exports logs to data warehouses for long-term analysis and business intelligence. ETL processes transform logs into structured tables for SQL analysis. Object storage (S3, GCS) provides cost-effective archival with lifecycle policies.
SIEM integration feeds logs into security information and event management systems for threat detection and compliance. Normalized log formats facilitate correlation across different log sources. Audit logs require tamper-proof storage and retention policies.
Design Considerations
Selecting a logging aggregation strategy involves analyzing reliability requirements, data volume, operational complexity, and cost constraints.
Reliability tier classification divides logs into categories with different guarantees. Audit logs for financial transactions or security events require at-least-once delivery with durable storage. Debug logs accept best-effort delivery to reduce infrastructure costs. The classification determines buffering strategies, retry policies, and storage backends.
High-reliability logs use persistent buffers, synchronous acknowledgments, and redundant storage. The increased latency and resource consumption trade off against data integrity guarantees. Low-priority logs use asynchronous transmission with memory-only buffers and shorter retention periods.
Sampling strategies reduce data volume while preserving statistical validity. Systematic sampling retains every nth log entry. Random sampling avoids the periodic bias of systematic selection but makes exact volume extrapolation harder. Stratified sampling ensures representation across different log levels or services.
Head-based sampling decides at log generation time, filtering before transmission. Tail-based sampling examines complete request traces and samples based on outcome—keeping error traces while sampling successful requests. Adaptive sampling adjusts rates based on current error rates or system load.
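Tail-based sampling might be sketched like this, with illustrative names; real implementations buffer traces in the collector tier rather than in application memory.

```ruby
# Buffer a request's log entries until the outcome is known, then keep the
# whole trace for errors and only a fraction of successful requests.
class TailSampler
  def initialize(success_rate: 0.1, rng: Random.new)
    @success_rate = success_rate
    @rng = rng
    @traces = Hash.new { |h, k| h[k] = [] }
  end

  def record(request_id, entry)
    @traces[request_id] << entry
  end

  # Called when the request completes; returns the entries to ship (maybe none).
  def finish(request_id, error:)
    entries = @traces.delete(request_id) || []
    return entries if error                   # always keep failed requests
    @rng.rand < @success_rate ? entries : []  # sample successful ones
  end
end
```

The decision point is the key difference from head-based sampling: nothing is discarded until the outcome is known, so error traces survive in full even at aggressive sample rates.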
Schema evolution handles changing log formats over time. Strict schemas reject logs that don't match current definitions, preventing corrupted data but breaking during rollouts. Schema-on-read stores logs as-is and interprets fields during queries, accepting any format but requiring query-time field validation.
Versioned schemas embed version identifiers in log entries. Parsers handle multiple versions simultaneously, supporting gradual migrations. Schema registries centralize format definitions and version management.
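A version-aware parser might look like this sketch; the schemas and the field rename are invented for illustration.

```ruby
require 'json'

# Each entry embeds a schema_version; older versions are upgraded to the
# current shape at read time, so mixed-version streams parse uniformly.
module VersionedLogParser
  CURRENT_VERSION = 2

  def self.parse(json_line)
    entry = JSON.parse(json_line)
    case entry.fetch('schema_version', 1) # entries without a version are v1
    when 1
      # Hypothetical migration: v1 used a flat "user" string,
      # v2 nests it under actor.id
      { 'schema_version' => CURRENT_VERSION,
        'message' => entry['message'],
        'actor' => { 'id' => entry['user'] } }
    when CURRENT_VERSION
      entry
    else
      raise ArgumentError, "unknown schema_version #{entry['schema_version']}"
    end
  end
end
```

Because both versions normalize to the same shape, queries and downstream consumers only ever deal with the current schema, which is what makes gradual producer rollouts safe.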
Data retention policies balance query performance, storage costs, and compliance requirements. Hot storage keeps recent logs on fast indexes for interactive queries. Warm storage moves older logs to cheaper storage with slower query performance. Cold storage archives logs to object storage with query penalties measured in minutes.
Time-based retention deletes logs after a fixed period. Capacity-based retention maintains a maximum data volume, deleting oldest entries when limits approach. Compliance requirements may mandate minimum retention periods or specific deletion procedures.
Multi-tenancy isolation separates logs from different customers or teams. Index-per-tenant creates separate Elasticsearch indexes for each tenant, providing strong isolation but multiplying infrastructure overhead. Shared indexes with tenant ID fields reduce operational complexity but require query filtering and careful permission management.
Tenant-specific data paths route logs to separate storage backends or clusters. The approach provides maximum isolation and independent scaling but increases operational complexity.
Query optimization reduces costs and improves response times. Pre-aggregation computes common metrics at ingestion time rather than query time. Materialized views store query results for frequent access patterns. Time-based partitioning limits query scopes to relevant time ranges.
# Log sampling implementation
require 'json'
require 'time'

class SamplingLogger
  def initialize(sample_rate: 0.1, always_log_errors: true)
    @sample_rate = sample_rate
    @always_log_errors = always_log_errors
  end

  def log(level, message, metadata = {})
    forced = @always_log_errors && level == :error
    return unless forced || rand < @sample_rate
    entry = {
      timestamp: Time.now.utc.iso8601,
      level: level.to_s.upcase,
      message: message,
      sampled: !forced # marks entries that passed the random sample
    }.merge(metadata)
    puts entry.to_json
  end
end

logger = SamplingLogger.new(sample_rate: 0.1)
# Only ~10% of info logs are recorded, 100% of errors
logger.log(:info, 'Request processed', request_id: 'abc')
logger.log(:error, 'Database connection failed', error: 'timeout')
Real-World Applications
Production logging aggregation deployments demonstrate patterns for scale, reliability, and operational efficiency.
Microservices correlation tracks requests across dozens of services. Each service generates logs with a shared request ID propagated through HTTP headers. The aggregation system indexes logs by request ID, enabling queries that reconstruct the complete request flow. Trace context includes parent span IDs for hierarchical visualization.
High-cardinality fields like request ID require careful indexing strategy. Some systems index only time ranges and service names, using full-text search to filter specific request IDs. Others create secondary indexes on request IDs with time-based partitioning to limit index size.
Multi-region aggregation handles geographically distributed infrastructure. Regional log collectors aggregate locally before forwarding to central storage. This reduces WAN bandwidth costs and provides regional backup during network partitions. Data sovereignty requirements may prohibit cross-border log transmission, requiring regional storage with federated search.
# Regional aggregation configuration
require 'fluent-logger'
require 'time'

class RegionalLogger
  def initialize(region:, local_aggregator:, central_aggregator:)
    @region = region
    @local = Fluent::Logger::FluentLogger.new('app', host: local_aggregator)
    @central = Fluent::Logger::FluentLogger.new('app', host: central_aggregator)
  end

  def log(message, metadata = {})
    enriched = metadata.merge(
      message: message,
      region: @region,
      timestamp: Time.now.utc.iso8601
    )
    # Send to the local aggregator for low-latency regional access
    @local.post('app.logs', enriched)
    # Forward only critical logs to central storage
    if metadata[:level] == 'ERROR' || metadata[:audit]
      @central.post('app.critical', enriched)
    end
  rescue => e
    # Fall back to local-only delivery
    warn "Central aggregation failed: #{e.message}"
    @local.post('app.logs', enriched) if enriched
  end
end
Compliance and audit logging implements immutable log storage for regulatory requirements. Write-once storage prevents log tampering. Cryptographic signatures verify log integrity. Access controls restrict log viewing to authorized personnel. Audit logs capture who accessed which data when.
Tamper-proof logging appends cryptographic hashes linking each log entry to previous entries, creating a chain that reveals modifications. Periodic signatures from trusted timestamping authorities prove log creation times.
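The hash-chain idea can be sketched with Ruby's standard Digest library; the class name and entry layout are illustrative, and production systems additionally sign or externally timestamp checkpoints.

```ruby
require 'digest'
require 'json'

# Each entry carries the SHA-256 of the previous entry, so modifying any
# stored entry breaks the chain on verification.
class HashChainedLog
  GENESIS = '0' * 64

  def initialize
    @entries = []
  end

  def append(data)
    prev_hash = @entries.empty? ? GENESIS : @entries.last[:hash]
    body = { data: data, prev_hash: prev_hash }
    @entries << body.merge(hash: Digest::SHA256.hexdigest(body.to_json))
  end

  # True if no entry has been altered since it was appended.
  def intact?
    prev = GENESIS
    @entries.all? do |e|
      ok = e[:prev_hash] == prev &&
           e[:hash] == Digest::SHA256.hexdigest(
             { data: e[:data], prev_hash: e[:prev_hash] }.to_json
           )
      prev = e[:hash]
      ok
    end
  end

  attr_reader :entries
end
```

Changing even one field of an old entry invalidates that entry's hash and, transitively, every later link, which is what makes the tampering detectable.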
High-volume event streaming processes millions of log entries per second. Kafka topics partition logs for parallel processing. Stream processors filter, enrich, and aggregate logs before storage. Exactly-once processing semantics prevent duplicate log entries during failures and retries.
Back-pressure mechanisms throttle log producers when consumers cannot keep pace. Circuit breakers fail fast when downstream systems become unavailable rather than accumulating unbounded buffers.
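A minimal circuit breaker for a log shipper might look like this sketch; the thresholds, return values, and injectable clock are illustrative.

```ruby
# After repeated ship failures the breaker opens and fails fast instead of
# accumulating an unbounded buffer; after a cooldown it lets one probe through.
class ShipperCircuitBreaker
  def initialize(failure_threshold: 3, cooldown: 30, clock: -> { Time.now.to_f })
    @failure_threshold = failure_threshold
    @cooldown = cooldown
    @clock = clock
    @failures = 0
    @opened_at = nil
  end

  # Wraps one ship attempt; returns :shipped, :failed, or :dropped.
  def call
    if open?
      return :dropped if @clock.call - @opened_at < @cooldown
      @opened_at = nil # half-open: allow a single probe
    end
    yield
    @failures = 0
    :shipped
  rescue StandardError
    @failures += 1
    @opened_at = @clock.call if @failures >= @failure_threshold
    :failed
  end

  def open?
    !@opened_at.nil?
  end
end
```

While the breaker is open, entries are dropped immediately (or diverted to a local fallback) rather than queued, bounding memory during a long downstream outage.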
Cost optimization reduces infrastructure spending while maintaining functionality. Compression reduces storage by 80-90% for text logs. Lifecycle policies automatically delete or archive old logs. Sampling discards low-value logs at collection time. Index field selection limits which fields support fast queries.
Reserved instance pricing for stable log volumes and spot instances for batch processing reduce cloud costs. Object storage for cold logs costs 90% less than indexed storage.
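The compression figure is easy to sanity-check with Ruby's bundled Zlib on synthetic repetitive JSON logs; exact ratios depend on content, but structured logs compress well because field names repeat on every line.

```ruby
require 'zlib'
require 'json'

# Generate 1,000 similar structured log lines and compress them.
lines = 1_000.times.map do |i|
  { timestamp: "2025-10-10T14:30:#{format('%02d', i % 60)}Z",
    level: 'INFO', service: 'api-server',
    message: 'Request processed', request_id: i }.to_json
end.join("\n")

compressed = Zlib::Deflate.deflate(lines)
ratio = 100.0 * (1 - compressed.bytesize.to_f / lines.bytesize)
puts format('raw: %d bytes, compressed: %d bytes (%.1f%% saved)',
            lines.bytesize, compressed.bytesize, ratio)
```

On data like this the savings comfortably exceed half the raw size, which is why lifecycle policies typically compress before archiving to object storage.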
Reference
Common Log Levels
| Level | Purpose | Retention |
|---|---|---|
| TRACE | Detailed execution flow | Short (hours-days) |
| DEBUG | Diagnostic information | Short (days) |
| INFO | Normal operation events | Medium (weeks) |
| WARN | Potential issues | Medium (weeks-months) |
| ERROR | Failure events | Long (months) |
| FATAL | Critical failures | Long (months-years) |
Structured Log Fields
| Field | Type | Purpose | Example |
|---|---|---|---|
| timestamp | ISO8601 string | Event time | 2025-10-10T14:30:00.123Z |
| level | String | Severity | ERROR |
| message | String | Human description | User login failed |
| service | String | Source service | api-server |
| request_id | UUID | Request correlation | 550e8400-e29b-41d4-a716-446655440000 |
| user_id | Integer | User context | 12345 |
| duration_ms | Float | Operation time | 45.67 |
| error | String | Error details | Connection timeout |
Collection Methods Comparison
| Method | Latency | Reliability | Resource Use | Complexity |
|---|---|---|---|---|
| Direct HTTP | Low (ms) | Medium | Low | Low |
| File + Agent | Medium (seconds) | High | Medium | Medium |
| Message Queue | Medium (seconds) | High | High | High |
| Syslog | Low (ms) | Medium | Low | Low |
| Sidecar | Low (ms) | High | Medium | Medium |
Popular Tools Feature Matrix
| Tool | Language | Storage | Query Language | License | Best For |
|---|---|---|---|---|---|
| Elasticsearch | Java | Self | Query DSL, SQL | SSPL/Elastic | Full-text search |
| Loki | Go | Self | LogQL | AGPL | Label-based queries |
| Splunk | C++ | Proprietary | SPL | Commercial | Enterprise analytics |
| Graylog | Java | Elasticsearch | Graylog query | GPL/Commercial | SIEM integration |
| Fluentd | Ruby/C | Multiple | N/A | Apache | Unified collection |
Index Strategy Guidelines
| Log Volume | Strategy | Retention | Cost |
|---|---|---|---|
| < 1 GB/day | Single index | 30-90 days | Low |
| 1-10 GB/day | Daily indices | 7-30 days hot, 90 days warm | Medium |
| 10-100 GB/day | Hourly indices | 1-7 days hot, 30 days warm, 90 days cold | High |
| > 100 GB/day | Partitioned indices | Aggressive sampling and archival | Very High |
Buffering Configuration
| Buffer Type | Durability | Performance | Use Case |
|---|---|---|---|
| Memory | Lost on crash | Highest | Non-critical logs |
| File | Survives restart | Medium | Production logs |
| Queue (disk) | Replicated | Lower | Critical audit logs |
Query Performance Factors
| Factor | Impact | Mitigation |
|---|---|---|
| Time range | Exponential | Limit query windows |
| Field cardinality | High | Index only low-cardinality fields |
| Wildcard searches | Very high | Use prefix queries |
| Aggregations | High | Pre-compute common aggregations |
| Full-text search | Medium | Use field filters first |
Ruby Logging Libraries
| Library | Features | Performance | Complexity |
|---|---|---|---|
| Logger (stdlib) | Basic logging | High | Low |
| Semantic Logger | Structured, multiple appenders | Medium | Medium |
| Lograge | Rails request logging | High | Low |
| LogStashLogger | Direct Logstash integration | Medium | Low |
| Ougai | JSON structured logging | High | Low |