CrackedRuby

Application Monitoring

Overview

Application monitoring in Ruby encompasses tracking application performance, detecting errors, measuring resource usage, and observing system health. Ruby provides several approaches through standard library modules, third-party gems, and custom instrumentation patterns.

The Logger class from Ruby's standard library is the usual starting point for basic monitoring; its default output is plain text, and a custom formatter turns it into structured logs. Ruby applications commonly integrate with external monitoring services through gems like newrelic_rpm, scout_apm, or datadog. Custom monitoring solutions build on Ruby's TracePoint API, method instrumentation patterns, and metric collection frameworks.

require 'logger'
require 'benchmark'

# Basic monitoring setup
logger = Logger.new('application.log')
logger.level = Logger::INFO

# Simple performance tracking
execution_time = Benchmark.measure { expensive_operation() }
logger.info "Operation completed in #{execution_time.real}s"
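The TracePoint API mentioned above can be illustrated with a minimal sketch that counts calls to Ruby-defined methods while a block runs. The helper name `count_method_calls` and the `Greeter` class are hypothetical and exist only for this illustration; tracing every call adds overhead, so this is a debugging aid rather than a production pattern.

```ruby
# Count calls to Ruby-defined methods while the given block runs.
# `count_method_calls` is a hypothetical helper, not a library API.
def count_method_calls
  counts = Hash.new(0)
  trace = TracePoint.new(:call) do |tp|
    counts["#{tp.defined_class}##{tp.method_id}"] += 1
  end
  # Passing a block to enable limits tracing to the duration of that block
  trace.enable { yield }
  counts
end

class Greeter
  def greet(name)
    "Hello, #{name}"
  end
end

counts = count_method_calls do
  greeter = Greeter.new
  3.times { greeter.greet('Ada') }
end

puts counts['Greeter#greet'] # 3
```

Only the `:call` event is subscribed here, so methods implemented in C are not counted.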

Ruby's metaprogramming capabilities enable method-level instrumentation without modifying existing code. The Module#prepend and alias_method patterns create monitoring wrappers around business logic.

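module PerformanceMonitor
  # Runs when a class calls `prepend PerformanceMonitor`. Only instance methods
  # already defined at that point are wrapped, so the prepend call belongs
  # after the method definitions.
  def self.prepended(base)
    base.instance_methods(false).each do |method_name|
      # Keep the original implementation reachable under an aliased name
      base.alias_method "#{method_name}_without_monitoring", method_name
      # **kwargs keeps keyword arguments intact on Ruby 3
      base.define_method(method_name) do |*args, **kwargs, &block|
        start_time = Time.now
        result = send("#{method_name}_without_monitoring", *args, **kwargs, &block)
        duration = Time.now - start_time
        puts "#{method_name} executed in #{duration}s"
        result
      end
    end
  end
end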
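For comparison, here is a minimal sketch of a wrapper that relies on Module#prepend and `super` alone, without aliasing. `TimedCharge` and `Checkout` are hypothetical names chosen for the example; the technique works because a prepended module sits in front of the class in the ancestor chain.

```ruby
# Wrapper module that relies on Module#prepend and `super` alone.
# TimedCharge and Checkout are hypothetical names used for illustration.
module TimedCharge
  def charge(amount)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    super # zero-argument super forwards the same arguments to Checkout#charge
  ensure
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    puts format('charge took %.6fs', elapsed)
  end
end

class Checkout
  prepend TimedCharge

  def charge(amount)
    amount * 2
  end
end

result = Checkout.new.charge(21)
puts result # 42
```

Because the wrapper is resolved through method lookup, methods defined after the `prepend` call are still intercepted, which the aliasing approach cannot do.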

Monitoring systems typically collect four types of data: metrics (numerical measurements), logs (structured events), traces (request paths), and errors (exception information). Ruby applications expose this data through instrumentation points, custom collectors, and integration with monitoring platforms.

Basic Usage

Ruby applications implement monitoring through direct instrumentation, automatic collection via gems, and custom metric gathering. The standard library provides basic tools while specialized gems offer comprehensive monitoring solutions.

require 'logger'
require 'json'
require 'time'

class ApplicationMonitor
  def initialize(logger = Logger.new(STDOUT))
    @logger = logger
    @metrics = {}
    @start_time = Time.now
  end

  def track_request(path, method, duration, status)
    @logger.info({
      timestamp: Time.now.iso8601,
      path: path,
      method: method,
      duration_ms: (duration * 1000).round(2),
      status: status,
      memory_mb: memory_usage_mb
    }.to_json)
  end

  def increment_counter(metric_name, value = 1)
    @metrics[metric_name] ||= 0
    @metrics[metric_name] += value
  end

  private

  def memory_usage_mb
    `ps -o rss= -p #{Process.pid}`.to_i / 1024.0
  end
end

# Usage in application
monitor = ApplicationMonitor.new
monitor.track_request('/api/users', 'GET', 0.143, 200)
monitor.increment_counter('user_registrations')

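Method-level monitoring wraps existing functionality to collect timing and error data. The example below captures the original method with instance_method and redefines it with define_method, so callers see the same method name and arguments while every call is timed and errors are logged before being re-raised.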

module MethodMonitor
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def monitor_method(method_name, options = {})
      original_method = instance_method(method_name)
      
      define_method(method_name) do |*args, **kwargs, &block|
        start_time = Time.now
        begin
          result = original_method.bind(self).call(*args, **kwargs, &block)
          duration = Time.now - start_time
          log_success(method_name, duration, options)
          result
        rescue => error
          duration = Time.now - start_time
          log_error(method_name, error, duration, options)
          raise
        end
      end
    end
  end

  private

  def log_success(method, duration, options)
    puts "#{method} completed successfully in #{duration}s"
  end

  def log_error(method, error, duration, options)
    puts "#{method} failed with #{error.class}: #{error.message} (#{duration}s)"
  end
end

class UserService
  include MethodMonitor
  
  def create_user(email, name)
    # User creation logic
    User.create(email: email, name: name)
  end
  
  monitor_method :create_user, alert_threshold: 2.0
end
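The wrappers above measure durations with Time.now, which follows the wall clock and can jump when the system time is adjusted. The sketch below times a block with the monotonic clock instead and shows one way an `alert_threshold` option like the one passed to `monitor_method` could be applied. The helper `timed_call` and its `clock:` parameter are assumptions made for this illustration.

```ruby
# Hypothetical helper: time a block on the monotonic clock and flag slow calls.
def timed_call(alert_threshold: nil, clock: nil)
  clock ||= -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  started = clock.call
  result = yield
  duration = clock.call - started
  slow = !alert_threshold.nil? && duration > alert_threshold
  { result: result, duration: duration, slow: slow }
end

# Deterministic demo: a fake clock that reads 0.0 and then 2.5 seconds
readings = [0.0, 2.5].each
report = timed_call(alert_threshold: 2.0, clock: -> { readings.next }) { :created }
p report
```

With the fake clock the call appears to take 2.5 seconds, so `report[:slow]` is true; injecting the clock keeps the threshold logic testable without sleeping.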

External monitoring services integrate through initialization and automatic instrumentation. Most gems require minimal configuration while providing comprehensive data collection.

# Gemfile
gem 'newrelic_rpm'
gem 'scout_apm'

# config/newrelic.yml configuration automatically loads
# Custom instrumentation for business logic
class PaymentProcessor
  include NewRelic::Agent::Instrumentation::ControllerInstrumentation
  
  def process_payment(amount, card_token)
    # Payment processing logic
    NewRelic::Agent.record_metric('Custom/PaymentAmount', amount)
    NewRelic::Agent.increment_metric('Custom/PaymentCount')
    
    # Business logic here
    result = charge_card(amount, card_token)
    
    NewRelic::Agent.add_custom_attributes({
      payment_method: 'credit_card',
      amount_cents: amount,
      success: result.success?
    })
    
    result
  end
  
  add_transaction_tracer :process_payment, category: :task
end

Health check endpoints provide application status information for load balancers and monitoring systems. These endpoints verify database connections, external service availability, and system resource levels.

require 'sinatra'
require 'json'
require 'time' # Time#iso8601

class HealthCheck < Sinatra::Base
  get '/health' do
    content_type :json
    
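    # check_disk_space and check_memory_usage are assumed to be defined
    # alongside the checks below and to return the same { status: ... } shape
    checks = {
      database: check_database,
      redis: check_redis,
      disk_space: check_disk_space,
      memory: check_memory_usage
    }

    healthy = checks.all? { |_, result| result[:status] == 'ok' }
    # Call Sinatra's `status` helper; assigning to a local named `status`
    # only shadows it and leaves the response code at 200
    status(healthy ? 200 : 503)

    {
      status: healthy ? 'healthy' : 'unhealthy',
      timestamp: Time.now.iso8601,
      checks: checks
    }.to_json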
  end

  private

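  def check_database
    # Measure the real round trip instead of reporting a fixed value
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    ActiveRecord::Base.connection.execute('SELECT 1')
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000
    { status: 'ok', response_time_ms: elapsed_ms.round(2) }
  rescue => error
    { status: 'error', message: error.message }
  end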

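  def check_redis
    # Redis.current was removed in redis-rb 5.0; on newer versions, ping a
    # client instance that the application creates itself
    Redis.current.ping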
    { status: 'ok' }
  rescue => error
    { status: 'error', message: error.message }
  end
end
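The individual checks share one shape: run a probe, report `ok` with a response time, or report `error` with a message. A minimal sketch of that shape, independent of Sinatra, might look like the following. The helper names `timed_check` and `http_status_for` are hypothetical.

```ruby
# Hypothetical helper: run one health probe and report how long it took.
def timed_check
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000
  { status: 'ok', response_time_ms: elapsed_ms.round(2) }
rescue StandardError => error
  { status: 'error', message: error.message }
end

# Map a set of check results to an HTTP status code:
# 200 when every check passed, 503 as soon as one failed.
def http_status_for(checks)
  checks.values.all? { |check| check[:status] == 'ok' } ? 200 : 503
end

checks = {
  fast_probe: timed_check { :pong },
  failing_probe: timed_check { raise IOError, 'connection refused' }
}

puts http_status_for(checks) # 503
```

Returning 503 for any failed check lets a load balancer take the instance out of rotation; whether every dependency should be fatal is a design choice for each application.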

Production Patterns

Production monitoring requires structured approaches to data collection, alerting, and performance optimization. Ruby applications in production implement monitoring through multiple layers: application-level instrumentation, infrastructure monitoring, and business metric tracking.

Centralized logging aggregates data from multiple application instances and services. Production applications structure log data as JSON so that log aggregation systems such as the ELK stack or Splunk can parse it.

require 'logger'
require 'json'
require 'socket' # Socket.gethostname
require 'time'   # Time#iso8601

class ProductionLogger
  def initialize(service_name, environment)
    @service_name = service_name
    @environment = environment
    @logger = Logger.new(STDOUT)
    @logger.formatter = method(:json_formatter)
  end

  def info(message, metadata = {})
    log_data = base_log_data.merge(
      level: 'INFO',
      message: message,
      metadata: metadata
    )
    @logger.info(log_data)
  end

  def error(message, exception = nil, metadata = {})
    log_data = base_log_data.merge(
      level: 'ERROR',
      message: message,
      metadata: metadata
    )
    
    if exception
      log_data[:exception] = {
        class: exception.class.name,
        message: exception.message,
        backtrace: exception.backtrace&.first(10)
      }
    end
    
    @logger.error(log_data)
  end

  private

  def base_log_data
    {
      timestamp: Time.now.utc.iso8601,
      service: @service_name,
      environment: @environment,
      host: Socket.gethostname,
      pid: Process.pid,
      thread_id: Thread.current.object_id
    }
  end

  def json_formatter(severity, datetime, progname, msg)
    "#{msg.to_json}\n"
  end
end

# Usage across application
$logger = ProductionLogger.new('user-service', ENV['RAILS_ENV'])

class UserController < ApplicationController
  def create
    user = User.create(user_params)
    $logger.info('User created', { user_id: user.id, email: user.email })
    render json: user
  rescue => error
    $logger.error('User creation failed', error, { params: user_params })
    render json: { error: 'Creation failed' }, status: 422
  end
end
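To see what a JSON log line looks like, the sketch below attaches a JSON formatter to a standard Logger that writes into an in-memory StringIO. It is a trimmed-down illustration rather than the ProductionLogger class itself, and the field names are assumptions.

```ruby
require 'logger'
require 'json'
require 'stringio'
require 'time'

# Capture log output in memory so the emitted line can be inspected
output = StringIO.new
logger = Logger.new(output)

# A formatter receives (severity, time, progname, message) and returns the line to write
logger.formatter = proc do |severity, time, _progname, message|
  payload = message.is_a?(Hash) ? message : { message: message.to_s }
  JSON.generate({ level: severity, timestamp: time.utc.iso8601 }.merge(payload)) + "\n"
end

logger.info({ message: 'User created', user_id: 42 })

line = output.string
puts line

parsed = JSON.parse(line)
puts parsed['user_id'] # 42
```

Emitting exactly one JSON object per line keeps the output easy for log shippers to split and parse.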

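Metric collection systems gather quantitative data about application behavior, resource usage, and business operations. Production applications often implement custom collectors that push data to a metrics backend; the collector below emits StatsD-style datagrams over UDP, and its `|#` tag suffix follows the DogStatsD extension to that format.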

require 'socket' # UDPSocket

class MetricsCollector
  def initialize(statsd_host = 'localhost', statsd_port = 8125)
    @statsd_host = statsd_host
    @statsd_port = statsd_port
    @socket = UDPSocket.new
  end

  def increment(metric_name, value = 1, tags = {})
    send_metric("#{metric_name}:#{value}|c", tags)
  end

  def gauge(metric_name, value, tags = {})
    send_metric("#{metric_name}:#{value}|g", tags)
  end

  def timing(metric_name, duration_ms, tags = {})
    send_metric("#{metric_name}:#{duration_ms}|ms", tags)
  end

  def histogram(metric_name, value, tags = {})
    send_metric("#{metric_name}:#{value}|h", tags)
  end

  def time_block(metric_name, tags = {})
    start_time = Time.now
    result = yield
    duration_ms = ((Time.now - start_time) * 1000).round(2)
    timing(metric_name, duration_ms, tags)
    result
  end

  private

  def send_metric(metric_data, tags)
    tagged_metric = tags.empty? ? metric_data : "#{metric_data}|##{format_tags(tags)}"
    @socket.send(tagged_metric, 0, @statsd_host, @statsd_port)
  rescue => error
    # Log metric sending errors but don't fail application logic
    puts "Metric send failed: #{error.message}"
  end

  def format_tags(tags)
    tags.map { |k, v| "#{k}:#{v}" }.join(',')
  end
end

# Application integration
class PaymentService
  def initialize
    @metrics = MetricsCollector.new
  end

  def process_payment(amount, payment_method)
    @metrics.increment('payment.attempted', 1, {
      method: payment_method,
      amount_range: amount_range(amount)
    })

    result = @metrics.time_block('payment.processing_time', { method: payment_method }) do
      charge_payment(amount, payment_method)
    end

    if result.success?
      @metrics.increment('payment.successful')
      @metrics.histogram('payment.amount', amount)
    else
      @metrics.increment('payment.failed', 1, { 
        error_type: result.error_code,
        method: payment_method 
      })
    end

    result
  end

  private

  def amount_range(amount)
    case amount
    when 0..999 then 'small'
    when 1000..9999 then 'medium'
    else 'large'
    end
  end
end
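The datagrams that MetricsCollector sends are short text lines. The sketch below builds them as plain strings so the wire format can be inspected without opening a socket. The helper `statsd_line` is hypothetical and mirrors the formatting in `send_metric` and `format_tags`.

```ruby
# Build a StatsD datagram as a string; the `|#key:value` suffix is the
# DogStatsD tag extension. `statsd_line` is a hypothetical helper.
def statsd_line(name, value, type, tags = {})
  line = "#{name}:#{value}|#{type}"
  return line if tags.empty?

  formatted_tags = tags.map { |key, val| "#{key}:#{val}" }.join(',')
  "#{line}|##{formatted_tags}"
end

counter = statsd_line('payment.attempted', 1, 'c', method: 'card', amount_range: 'small')
timer = statsd_line('payment.processing_time', 143.2, 'ms')

puts counter # payment.attempted:1|c|#method:card,amount_range:small
puts timer   # payment.processing_time:143.2|ms
```

Tag values with many distinct values (user ids, raw amounts) create one time series each, which is why the payment example buckets amounts into ranges before tagging.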

Application Performance Monitoring (APM) tools provide detailed transaction tracing and performance analysis. Production deployments configure APM agents to collect detailed timing data while minimizing performance overhead.

# config/initializers/monitoring.rb
require 'newrelic_rpm'
require 'scout_apm'

# NewRelic custom instrumentation
module CustomInstrumentation
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def trace_execution(method_name, category: :custom)
      alias_method "#{method_name}_without_tracing", method_name
      
      define_method(method_name) do |*args, &block|
        NewRelic::Agent::MethodTracer.trace_execution_scoped(
          "Custom/#{self.class.name}/#{method_name}"
        ) do
          send("#{method_name}_without_tracing", *args, &block)
        end
      end
    end
  end
end

class ReportGenerator
  include CustomInstrumentation
  
  def generate_monthly_report(user_id, month)
    # Report generation logic
    data = collect_user_data(user_id, month)
    report = build_report(data)
    deliver_report(report, user_id)
  end

  def collect_user_data(user_id, month)
    # Data collection logic
    NewRelic::Agent.record_metric('Custom/ReportData/UserCount', 1)
    User.find(user_id).monthly_data(month)
  end

  trace_execution :generate_monthly_report
  trace_execution :collect_user_data
end

Circuit breaker patterns prevent cascading failures by monitoring service health and stopping requests to failing services. Production applications implement circuit breakers around external service calls.

class CircuitBreaker
  STATES = [:closed, :open, :half_open].freeze
  
  def initialize(failure_threshold: 5, timeout: 60, success_threshold: 3)
    @failure_threshold = failure_threshold
    @timeout = timeout
    @success_threshold = success_threshold
    @failure_count = 0
    @success_count = 0
    @last_failure_time = nil
    @state = :closed
    @mutex = Mutex.new
  end

  def call
    @mutex.synchronize do
      case @state
      when :open
        if Time.now - @last_failure_time > @timeout
          @state = :half_open
          @success_count = 0
        else
          raise CircuitBreakerOpenError, "Circuit breaker is OPEN"
        end
      when :half_open
        # Allow limited requests through
      when :closed
        # Normal operation
      end
    end

    begin
      result = yield
      on_success
      result
    rescue => error
      on_failure
      raise
    end
  end

  def state
    @state
  end

  def metrics
    {
      state: @state,
      failure_count: @failure_count,
      success_count: @success_count,
      last_failure_time: @last_failure_time
    }
  end

  private

  def on_success
    @mutex.synchronize do
      @failure_count = 0
      
      if @state == :half_open
        @success_count += 1
        if @success_count >= @success_threshold
          @state = :closed
        end
      end
    end
  end

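  def on_failure
    @mutex.synchronize do
      @failure_count += 1
      @last_failure_time = Time.now

      # A failed trial request in half-open state reopens the circuit immediately
      if @state == :half_open || @failure_count >= @failure_threshold
        @state = :open
      end
    end
  end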
end

class ExternalApiClient
  def initialize
    @circuit_breaker = CircuitBreaker.new(failure_threshold: 3, timeout: 30)
  end

  def fetch_user_data(user_id)
    @circuit_breaker.call do
      # External API call
      response = HTTP.get("https://api.example.com/users/#{user_id}")
      JSON.parse(response.body)
    end
  rescue CircuitBreakerOpenError => error
    # Return cached data or default response
    fetch_cached_user_data(user_id)
  end
end

class CircuitBreakerOpenError < StandardError; end
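The closed, open, and half-open transitions are easier to follow with a clock that can be advanced by hand. The following is a compact, single-threaded sketch with an injectable clock. `TinyBreaker` is a hypothetical simplification, not the CircuitBreaker class above: it has no mutex and closes after a single successful trial request.

```ruby
# Compact single-threaded breaker with an injectable clock (hypothetical).
class TinyBreaker
  class OpenError < StandardError; end

  attr_reader :state

  def initialize(failure_threshold:, timeout:, clock:)
    @failure_threshold = failure_threshold
    @timeout = timeout
    @clock = clock
    @failures = 0
    @opened_at = nil
    @state = :closed
  end

  def call
    if @state == :open
      raise OpenError, 'circuit open' if @clock.call - @opened_at < @timeout

      @state = :half_open # timeout elapsed: let one trial request through
    end

    result = yield
    @failures = 0
    @state = :closed
    result
  rescue OpenError
    raise # rejected calls do not count as failures
  rescue StandardError
    @failures += 1
    if @state == :half_open || @failures >= @failure_threshold
      @state = :open
      @opened_at = @clock.call
    end
    raise
  end
end

now = 0
breaker = TinyBreaker.new(failure_threshold: 2, timeout: 30, clock: -> { now })
states = []

2.times do
  breaker.call { raise IOError, 'upstream down' } rescue nil
  states << breaker.state
end

now = 31 # advance the fake clock past the timeout
breaker.call { :ok }
states << breaker.state

p states # [:closed, :open, :closed]
```

The first failure leaves the breaker closed, the second opens it, and the successful trial request after the timeout closes it again.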

Error Handling & Debugging

Error monitoring captures, categorizes, and reports application exceptions with contextual information for debugging. Ruby applications implement error handling through structured exception capture, automatic error reporting, and custom error classification systems.

Exception tracking systems collect error data including stack traces, request context, user information, and environment details. Production applications integrate with error monitoring services while maintaining custom error handling logic.

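require 'sentry-ruby'
require 'json'
require 'socket'  # Socket.gethostname
require 'time'    # Time#iso8601
require 'timeout' # Timeout::Error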

class ErrorHandler
  def initialize(logger, sentry_dsn = nil)
    @logger = logger
    setup_sentry(sentry_dsn) if sentry_dsn
  end

  def handle_error(error, context = {})
    error_data = build_error_data(error, context)
    
    # Log error locally (assumes a ProductionLogger-style
    # error(message, exception, metadata) signature)
    @logger.error('Application Error', error, error_data)
    
    # Report to external service
    report_to_sentry(error, context) if sentry_configured?
    
    # Custom error processing
    process_error_by_type(error, context)
    
    error_data
  end

  def wrap_execution(context = {})
    yield
  rescue => error
    handle_error(error, context)
    raise unless context[:suppress_reraise]
  end

  private

  def build_error_data(error, context)
    {
      error_class: error.class.name,
      error_message: error.message,
      backtrace: error.backtrace&.first(20),
      context: context,
      timestamp: Time.now.utc.iso8601,
      process_id: Process.pid,
      thread_id: Thread.current.object_id,
      environment: {
        ruby_version: RUBY_VERSION,
        rails_env: ENV['RAILS_ENV'],
        hostname: Socket.gethostname
      }
    }
  end

  def setup_sentry(dsn)
    Sentry.init do |config|
      config.dsn = dsn
      config.breadcrumbs_logger = [:active_support_logger, :http_logger]
      config.traces_sample_rate = 0.1
    end
  end

  def report_to_sentry(error, context)
    Sentry.with_scope do |scope|
      scope.set_tags(context[:tags]) if context[:tags]
      scope.set_user(context[:user]) if context[:user]
      scope.set_context("request", context[:request]) if context[:request]
      Sentry.capture_exception(error)
    end
  end

  def process_error_by_type(error, context)
    case error
    when ActiveRecord::RecordNotFound
      # Handle record not found errors
      increment_metric('errors.not_found')
    when Timeout::Error
      # Handle timeout errors (Net::OpenTimeout and Net::ReadTimeout are subclasses)
      increment_metric('errors.timeout')
      alert_operations_team(error, context) if context[:critical]
    when JSON::ParserError
      # Handle JSON parsing errors
      increment_metric('errors.json_parse')
    else
      # Handle unexpected errors
      increment_metric('errors.unexpected')
      alert_operations_team(error, context)
    end
  end

  def sentry_configured?
    # Sentry.configuration is nil until Sentry.init has run
    defined?(Sentry) && Sentry.initialized?
  end

  def increment_metric(metric_name)
    # Placeholder for metrics system
    puts "Metric incremented: #{metric_name}"
  end

  def alert_operations_team(error, context)
    # Placeholder for alerting system
    puts "ALERT: #{error.class.name} - #{error.message}"
  end
end

# Application integration
class ApplicationController < ActionController::Base
  before_action :setup_error_context
  rescue_from StandardError, with: :handle_application_error

  private

  def setup_error_context
    @error_handler = ErrorHandler.new($logger, ENV['SENTRY_DSN'])
    @error_context = {
      user: { id: current_user&.id, email: current_user&.email },
      request: {
        path: request.path,
        method: request.method,
        params: request.parameters,
        user_agent: request.user_agent,
        ip_address: request.remote_ip
      },
      tags: {
        controller: controller_name,
        action: action_name,
        environment: Rails.env
      }
    }
  end

  def handle_application_error(error)
    error_data = @error_handler.handle_error(error, @error_context)
    
    case error
    when ActiveRecord::RecordNotFound
      render json: { error: 'Resource not found' }, status: 404
    when ActionController::ParameterMissing
      render json: { error: 'Missing required parameters' }, status: 400
    else
      render json: { error: 'Internal server error' }, status: 500
    end
  end
end

Custom exception classes provide structured error handling with specific error codes and recovery strategies. Applications define exception hierarchies that enable precise error handling and reporting.

module ApplicationErrors
  class BaseError < StandardError
    attr_reader :error_code, :context, :severity

    def initialize(message, error_code: nil, context: {}, severity: :error)
      super(message)
      @error_code = error_code || self.class.default_error_code
      @context = context
      @severity = severity
    end

    def self.default_error_code
      # demodulize and underscore come from ActiveSupport (loaded in Rails)
      name.demodulize.underscore.upcase
    end

    def to_h
      {
        error_class: self.class.name,
        error_code: error_code,
        message: message,
        context: context,
        severity: severity,
        timestamp: Time.now.utc.iso8601
      }
    end

    def retryable?
      false
    end
  end

  class ValidationError < BaseError
    def severity
      :warning
    end
  end

  class ExternalServiceError < BaseError
    def retryable?
      true
    end
  end

  class PaymentError < BaseError
    attr_reader :payment_id, :amount

    def initialize(message, payment_id:, amount:, **options)
      @payment_id = payment_id
      @amount = amount
      super(message, **options)
    end

    def context
      super.merge(payment_id: payment_id, amount: amount)
    end
  end

  class RateLimitError < ExternalServiceError
    attr_reader :retry_after

    def initialize(message, retry_after:, **options)
      @retry_after = retry_after
      super(message, **options)
    end

    def context
      super.merge(retry_after: retry_after)
    end
  end
end

# Usage in application services
class PaymentProcessor
  include ApplicationErrors

  def process_payment(amount, card_token)
    validate_payment_amount(amount)
    
    result = charge_card(amount, card_token)
    
    unless result.success?
      raise PaymentError.new(
        "Payment processing failed: #{result.error_message}",
        payment_id: result.payment_id,
        amount: amount,
        error_code: result.error_code,
        context: { card_last_four: card_token.last_four }
      )
    end
    
    result
  rescue Timeout::Error
    raise ExternalServiceError.new(
      "Payment gateway timeout",
      error_code: 'GATEWAY_TIMEOUT',
      context: { amount: amount, gateway: 'stripe' }
    )
  rescue JSON::ParserError => error
    raise ExternalServiceError.new(
      "Invalid response from payment gateway",
      error_code: 'GATEWAY_INVALID_RESPONSE',
      context: { amount: amount, parse_error: error.message }
    )
  end

  private

  def validate_payment_amount(amount)
    if amount <= 0
      raise ValidationError.new(
        "Payment amount must be positive",
        error_code: 'INVALID_AMOUNT',
        context: { amount: amount }
      )
    end
    
    if amount > 100_000_00  # $100,000 in cents
      raise ValidationError.new(
        "Payment amount exceeds maximum limit",
        error_code: 'AMOUNT_TOO_LARGE',
        context: { amount: amount, limit: 100_000_00 }
      )
    end
  end
end
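A `retryable?` flag on the exception lets calling code decide generically whether another attempt makes sense. The sketch below shows one possible retry helper built on that idea. The trimmed-down error classes and the `with_retries` helper are stand-ins written for this example, not part of the ApplicationErrors module above.

```ruby
# Minimal stand-ins for the error hierarchy (hypothetical, trimmed down).
class BaseError < StandardError
  def retryable?
    false
  end
end

class ExternalServiceError < BaseError
  def retryable?
    true
  end
end

class ValidationError < BaseError; end

# Retry the block only for errors that report themselves as retryable.
def with_retries(max_attempts: 3, base_delay: 0.0)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue BaseError => error
    raise unless error.retryable? && attempts < max_attempts

    sleep(base_delay * (2**(attempts - 1))) # exponential backoff
    retry
  end
end

result = with_retries do |attempt|
  raise ExternalServiceError, 'gateway timeout' if attempt < 3

  "succeeded on attempt #{attempt}"
end

puts result # succeeded on attempt 3
```

A ValidationError raised inside the block propagates on the first attempt, since retrying invalid input cannot succeed.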

Debugging tools provide runtime inspection and error analysis capabilities. Production applications implement debugging interfaces that expose application state without compromising security.

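require 'securerandom' # SecureRandom.hex
require 'socket'       # Socket.gethostname
require 'time'         # Time#iso8601

class DebugInspector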
  def initialize(application_name)
    @application_name = application_name
    @debug_data = {}
    @mutex = Mutex.new
  end

  def capture_state(identifier, data)
    @mutex.synchronize do
      @debug_data[identifier] = {
        data: sanitize_data(data),
        timestamp: Time.now.utc.iso8601,
        thread_id: Thread.current.object_id
      }
    end
  end

  def inspect_error(error, context = {})
    error_info = {
      error_class: error.class.name,
      error_message: error.message,
      backtrace: sanitize_backtrace(error.backtrace),
      context: sanitize_data(context),
      environment_info: gather_environment_info,
      memory_info: gather_memory_info,
      thread_info: gather_thread_info
    }
    
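    # Store the report under a unique id and hand that id back to the caller
    session_id = "error_#{SecureRandom.hex(8)}"
    capture_state(session_id, error_info)
    error_info.merge(debug_session_id: session_id)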
  end

  def generate_debug_report
    @mutex.synchronize do
      {
        application: @application_name,
        generated_at: Time.now.utc.iso8601,
        runtime_info: {
          ruby_version: RUBY_VERSION,
          platform: RUBY_PLATFORM,
          engine: RUBY_ENGINE
        },
        captured_states: @debug_data,
        system_metrics: gather_system_metrics
      }
    end
  end

  def clear_debug_data
    @mutex.synchronize do
      @debug_data.clear
    end
  end

  private

  def sanitize_data(data)
    case data
    when Hash
      data.each_with_object({}) do |(key, value), sanitized|
        sanitized_key = key.to_s.downcase
        if sensitive_key?(sanitized_key)
          sanitized[key] = '[FILTERED]'
        else
          sanitized[key] = sanitize_data(value)
        end
      end
    when Array
      data.map { |item| sanitize_data(item) }
    when String
      # 997 characters plus the ellipsis keeps the result at 1000 characters
      data.length > 1000 ? "#{data[0, 997]}..." : data
    else
      data
    end
  end

  def sensitive_key?(key)
    %w[password token secret key authorization].any? { |sensitive| key.include?(sensitive) }
  end

  def sanitize_backtrace(backtrace)
    return [] unless backtrace
    
    backtrace.first(25).map do |line|
      # Remove absolute paths for security
      line.gsub(Dir.pwd, '[APP_ROOT]')
    end
  end

  def gather_environment_info
    {
      hostname: Socket.gethostname,
      process_id: Process.pid,
      parent_process_id: Process.ppid,
      user_id: Process.uid,
      working_directory: Dir.pwd
    }
  end

  def gather_memory_info
    {
      object_count: ObjectSpace.count_objects,
      gc_stats: GC.stat,
      process_memory_kb: `ps -o rss= -p #{Process.pid}`.to_i
    }
  rescue
    { error: 'Memory info unavailable' }
  end

  def gather_thread_info
    {
      thread_count: Thread.list.count,
      main_thread_alive: Thread.main.alive?,
      current_thread_priority: Thread.current.priority
    }
  end

  def gather_system_metrics
    {
      # Parentheses are required: a bare `rescue` modifier is not valid inside a hash literal
      load_average: (`uptime`.match(/load average: (.+)$/)[1] rescue 'unavailable'),
      disk_usage: (`df -h /`.lines.last.split rescue ['unavailable']),
      timestamp: Time.now.utc.iso8601
    }
  end
end

# Integration with error handler
class ProductionErrorHandler < ErrorHandler
  def initialize(logger, sentry_dsn = nil)
    super(logger, sentry_dsn)
    @debug_inspector = DebugInspector.new('production-app')
  end

  def handle_error(error, context = {})
    # Capture debug state
    debug_info = @debug_inspector.inspect_error(error, context)
    
    # Enhanced context with debug info
    enhanced_context = context.merge(debug_session_id: debug_info[:debug_session_id])
    
    # Call parent error handling
    super(error, enhanced_context)
  end

  def generate_incident_report(incident_id)
    {
      incident_id: incident_id,
      debug_report: @debug_inspector.generate_debug_report,
      generated_at: Time.now.utc.iso8601
    }
  end
end

Performance & Memory

Performance monitoring tracks application response times, throughput, and resource utilization, and identifies bottlenecks. Ruby applications implement performance measurement through timing instrumentation, profiling integration, and resource monitoring systems.

Timing measurement captures method execution duration, request processing time, and external service response times. Applications use Ruby's Benchmark module and custom timing collectors to gather performance data.

require 'benchmark'
require 'json'
require 'time' # Time.parse and Time#iso8601

class PerformanceMonitor
  def initialize
    @measurements = []
    @mutex = Mutex.new
    @gc_stats_before = nil
  end

  def measure_execution(operation_name, metadata = {})
    gc_stats_before = GC.stat
    memory_before = memory_usage_kb
    
    result = nil
    benchmark = Benchmark.measure do
      result = yield
    end
    
    gc_stats_after = GC.stat
    memory_after = memory_usage_kb
    
    measurement = {
      operation: operation_name,
      duration_seconds: benchmark.real,
      cpu_time_seconds: benchmark.total,
      user_time_seconds: benchmark.utime,
      system_time_seconds: benchmark.stime,
      memory_delta_kb: memory_after - memory_before,
      gc_runs: gc_stats_after[:count] - gc_stats_before[:count],
      objects_allocated: gc_stats_after[:total_allocated_objects] - gc_stats_before[:total_allocated_objects],
      timestamp: Time.now.utc.iso8601,
      metadata: metadata
    }
    
    record_measurement(measurement)
    result
  end

  def measure_memory_allocation
    # GC.disable returns true only when GC was already disabled before the call
    gc_was_disabled = GC.disable
    before_stats = GC.stat
    memory_before = memory_usage_kb

    result = yield

    after_stats = GC.stat
    memory_after = memory_usage_kb

    {
      result: result,
      objects_allocated: after_stats[:total_allocated_objects] - before_stats[:total_allocated_objects],
      memory_allocated_kb: memory_after - memory_before,
      gc_disabled: gc_was_disabled
    }
  ensure
    # Re-enable GC only if this method was the one that turned it off
    GC.enable unless gc_was_disabled
  end

  def benchmark_comparison(operations = {})
    results = {}
    
    operations.each do |name, operation|
      # measure_execution returns the block's result rather than the measurement
      # hash, so time each run directly with Benchmark.realtime
      durations = Array.new(5) { Benchmark.realtime { operation.call } }
      results[name] = {
        min: durations.min,
        max: durations.max,
        avg: durations.sum / durations.length,
        median: durations.sort[durations.length / 2],
        iterations: durations.length
      }
    end
    
    results
  end

  def performance_report(time_range = 3600)
    cutoff_time = Time.now - time_range
    recent_measurements = @measurements.select do |m|
      Time.parse(m[:timestamp]) >= cutoff_time
    end
    
    operations = recent_measurements.group_by { |m| m[:operation] }
    
    {
      report_period: time_range,
      measurement_count: recent_measurements.length,
      operations: operations.transform_values do |measurements|
        durations = measurements.map { |m| m[:duration_seconds] }
        memory_deltas = measurements.map { |m| m[:memory_delta_kb] }
        
        {
          call_count: measurements.length,
          avg_duration: durations.sum / durations.length,
          p95_duration: percentile(durations, 95),
          p99_duration: percentile(durations, 99),
          max_duration: durations.max,
          avg_memory_delta: memory_deltas.sum / memory_deltas.length,
          total_objects_allocated: measurements.sum { |m| m[:objects_allocated] }
        }
      end
    }
  end

  private

  def record_measurement(measurement)
    @mutex.synchronize do
      @measurements << measurement
      # Keep only recent measurements to prevent memory growth
      @measurements = @measurements.last(10_000) if @measurements.length > 12_000
    end
  end

  def memory_usage_kb
    `ps -o rss= -p #{Process.pid}`.to_i
  rescue
    0
  end

  def percentile(values, percentile)
    return 0 if values.empty?
    sorted = values.sort
    index = (percentile / 100.0 * sorted.length).ceil - 1
    sorted[index]
  end
end

# Application integration
class UserService
  def initialize
    @performance_monitor = PerformanceMonitor.new
  end

  def create_user_with_profile(user_data, profile_data)
    @performance_monitor.measure_execution('user_creation', {
      has_profile: !profile_data.empty?,
      user_attributes_count: user_data.keys.length
    }) do
      ActiveRecord::Base.transaction do
        user = create_user(user_data)
        profile = create_profile(user, profile_data) unless profile_data.empty?
        send_welcome_email(user)
        { user: user, profile: profile }
      end
    end
  end

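  def bulk_import_users(users_data)
    # Caution: every strategy runs five times against the real table, so this
    # comparison belongs in a disposable test database, not in production
    comparison_results = @performance_monitor.benchmark_comparison({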
      individual_creates: -> { users_data.each { |data| User.create(data) } },
      bulk_insert: -> { User.insert_all(users_data) },
      activerecord_import: -> { User.import(users_data.map { |data| User.new(data) }) }
    })
    
    puts "Bulk import performance comparison:"
    comparison_results.each do |method, stats|
      puts "#{method}: #{stats[:avg]}s average (#{stats[:iterations]} runs)"
    end
    
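    # Use the fastest method based on results. This assumes methods named
    # bulk_import_individual_creates, bulk_import_bulk_insert, and
    # bulk_import_activerecord_import are defined elsewhere in the class.
    fastest_method = comparison_results.min_by { |_, stats| stats[:avg] }.first
    send("bulk_import_#{fastest_method}", users_data)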
  end

  private

  def create_user(user_data)
    User.create!(user_data)
  end

  def create_profile(user, profile_data)
    user.create_profile!(profile_data)
  end

  def send_welcome_email(user)
    WelcomeMailer.welcome_email(user).deliver_now
  end
end
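The p95 and p99 figures in `performance_report` come from the nearest-rank rule: sort the samples, take the rank ceil(p / 100 × n), and return the sample at that rank. A small worked example with 20 samples makes the behavior concrete. The helper name `nearest_rank_percentile` is hypothetical, and integer millisecond values are used to keep the arithmetic exact.

```ruby
# Nearest-rank percentile: the sample at 1-based rank ceil(pct / 100 * n).
def nearest_rank_percentile(values, pct)
  return nil if values.empty?

  sorted = values.sort
  rank = (pct * sorted.length / 100.0).ceil
  sorted[[rank, 1].max - 1] # clamp so that pct = 0 returns the minimum
end

durations_ms = (1..20).map { |n| n * 10 } # 10, 20, ... 200

p nearest_rank_percentile(durations_ms, 50) # 100
p nearest_rank_percentile(durations_ms, 95) # 190
p nearest_rank_percentile(durations_ms, 99) # 200
```

With only 20 samples the 99th percentile is simply the maximum, so high percentiles are only informative once an operation has been measured many times.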

Memory profiling identifies memory leaks, object allocation patterns, and garbage collection performance. Ruby applications use memory profiling tools and custom instrumentation to monitor memory usage.

require 'objspace'
require 'time' # Time.parse and Time#iso8601

class MemoryProfiler
  def initialize
    @snapshots = {}
    @allocation_tracking = false
  end

  def take_snapshot(name)
    ObjectSpace.trace_object_allocations_start unless @allocation_tracking
    @allocation_tracking = true
    
    snapshot = {
      timestamp: Time.now.utc.iso8601,
      object_counts: ObjectSpace.count_objects,
      gc_stats: GC.stat,
      memory_usage_kb: memory_usage_kb,
      object_allocations: sample_object_allocations,
      largest_objects: find_largest_objects
    }
    
    @snapshots[name] = snapshot
    snapshot
  end

  def compare_snapshots(before_name, after_name)
    before = @snapshots[before_name]
    after = @snapshots[after_name]
    
    return nil unless before && after
    
    {
      time_elapsed: Time.parse(after[:timestamp]) - Time.parse(before[:timestamp]),
      memory_delta_kb: after[:memory_usage_kb] - before[:memory_usage_kb],
      object_count_deltas: calculate_object_deltas(before[:object_counts], after[:object_counts]),
      gc_stats_deltas: calculate_gc_deltas(before[:gc_stats], after[:gc_stats]),
      new_allocations: after[:object_allocations] - before[:object_allocations]
    }
  end

  def profile_memory_allocation
    ObjectSpace.trace_object_allocations_start
    
    before_snapshot = take_heap_snapshot
    result = yield
    after_snapshot = take_heap_snapshot
    
    # Stops recording new allocations; metadata recorded so far stays queryable
    ObjectSpace.trace_object_allocations_stop
    
    {
      result: result,
      before_snapshot: before_snapshot,
      after_snapshot: after_snapshot,
      allocations_by_class: group_allocations_by_class,
      allocations_by_location: group_allocations_by_location,
      total_allocated: after_snapshot[:total_objects] - before_snapshot[:total_objects]
    }
  end

  def identify_memory_leaks(threshold_snapshots = 5)
    return [] if @snapshots.length < threshold_snapshots
    
    sorted_snapshots = @snapshots.values.sort_by { |s| Time.parse(s[:timestamp]) }
    memory_trend = sorted_snapshots.map { |s| s[:memory_usage_kb] }
    
    # Calculate memory growth trend
    growth_rate = calculate_growth_rate(memory_trend)
    
    leaks = []
    
    # Check for consistent memory growth
    if growth_rate > 1000 # 1MB per snapshot threshold
      leaks << {
        type: :consistent_growth,
        growth_rate_kb_per_snapshot: growth_rate,
        severity: growth_rate > 10_000 ? :critical : :warning
      }
    end
    
    # Check for object count increases
    object_trends = calculate_object_trends(sorted_snapshots)
    object_trends.each do |klass, growth|
      if growth > 10000 # 10k objects threshold
        leaks << {
          type: :object_accumulation,
          object_class: klass,
          growth_count: growth,
          severity: growth > 100_000 ? :critical : :warning
        }
      end
    end
    
    leaks
  end

  def generate_memory_report
    return {} if @snapshots.empty?
    
    latest = @snapshots.values.max_by { |s| Time.parse(s[:timestamp]) }
    
    {
      current_memory_kb: latest[:memory_usage_kb],
      total_objects: latest[:object_counts][:TOTAL],
      gc_statistics: latest[:gc_stats],
      top_object_classes: latest[:object_counts]
        .reject { |k, _| k == :TOTAL }
        .sort_by { |_, count| -count }
        .first(10)
        .to_h,
      snapshot_count: @snapshots.length,
      memory_leaks: identify_memory_leaks,
      recommendations: generate_recommendations(latest)
    }
  end

  private

  def memory_usage_kb
    `ps -o rss= -p #{Process.pid}`.to_i
  rescue
    0
  end

  def sample_object_allocations(sample_size = 1000)
    return 0 unless @allocation_tracking
    
    all_objects = ObjectSpace.each_object.to_a.sample(sample_size)
    all_objects.count { |obj| ObjectSpace.allocation_sourcefile(obj) }
  end

  def find_largest_objects(count = 10)
    largest = []
    ObjectSpace.each_object do |obj|
      size = ObjectSpace.memsize_of(obj)
      if largest.length < count || size > largest.last[:size]
        largest << {
          class: obj.class.name,
          size: size,
          object_id: obj.object_id
        }
        largest.sort_by! { |o| -o[:size] }
        largest.pop if largest.length > count
      end
    end
    largest
  rescue
    []
  end

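  def take_heap_snapshot
    {
      timestamp: Time.now.utc.iso8601,
      # Cumulative allocation counter, so after - before is the number of objects
      # allocated in between (count_objects[:TOTAL] counts heap slots instead)
      total_objects: GC.stat(:total_allocated_objects),
      memory_kb: memory_usage_kb,
      gc_count: GC.count
    }
  end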

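  # Counts all live objects by class (not only those created inside the block)
  def group_allocations_by_class
    allocations = Hash.new(0)
    ObjectSpace.each_object do |obj|
      allocations[obj.class.name] += 1
    end
    allocations.sort_by { |_, count| -count }.first(20).to_h
  rescue
    {}
  end

  # Counts live objects by allocation site; only objects created while
  # allocation tracing was active carry source information
  def group_allocations_by_location
    locations = Hash.new(0)
    ObjectSpace.each_object do |obj|
      file = ObjectSpace.allocation_sourcefile(obj)
      next unless file

      locations["#{file}:#{ObjectSpace.allocation_sourceline(obj)}"] += 1
    end
    locations.sort_by { |_, count| -count }.first(20).to_h
  rescue
    {}
  end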

  def calculate_object_deltas(before, after)
    deltas = {}
    (before.keys + after.keys).uniq.each do |key|
      before_count = before[key] || 0
      after_count = after[key] || 0
      delta = after_count - before_count
      deltas[key] = delta if delta != 0
    end
    deltas
  end

  def calculate_gc_deltas(before, after)
    {
      count: after[:count] - before[:count],
      # GC.stat reports cumulative GC time under :time (milliseconds, Ruby 3.1+)
      total_time: after.fetch(:time, 0) - before.fetch(:time, 0),
      major_gc_count: after[:major_gc_count] - before[:major_gc_count],
      minor_gc_count: after[:minor_gc_count] - before[:minor_gc_count]
    }
  end

  def calculate_growth_rate(memory_values)
    return 0 if memory_values.length < 2
    
    total_growth = memory_values.last - memory_values.first
    snapshots = memory_values.length - 1
    total_growth.to_f / snapshots
  end

  def calculate_object_trends(snapshots)
    return {} if snapshots.length < 2
    
    first_counts = snapshots.first[:object_counts]
    last_counts = snapshots.last[:object_counts]
    
    trends = {}
    (first_counts.keys + last_counts.keys).uniq.each do |klass|
      first_count = first_counts[klass] || 0
      last_count = last_counts[klass] || 0
      growth = last_count - first_count
      trends[klass] = growth if growth > 0
    end
    
    trends
  end

  def generate_recommendations(snapshot)
    recommendations = []
    
    # High memory usage
    if snapshot[:memory_usage_kb] > 500_000 # 500MB
      recommendations << "Consider reducing memory usage - current usage is #{snapshot[:memory_usage_kb] / 1024}MB"
    end
    
    # High object counts
    high_count_classes = snapshot[:object_counts].select { |k, v| v > 100_000 && k != :TOTAL }
    unless high_count_classes.empty?
      recommendations << "High object counts detected: #{high_count_classes.keys.join(', ')}"
    end
    
    # GC pressure
    if snapshot[:gc_stats][:minor_gc_count] > 1000
      recommendations << "High GC pressure detected - consider object pooling or reducing allocations"
    end
    
    recommendations
  end
end
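
The leak check in identify_memory_leaks reduces to the average memory growth per snapshot interval compared against fixed thresholds. The arithmetic can be sketched on its own; the `classify_growth` helper and the sample readings below are illustrative assumptions, not part of MemoryProfiler.

```ruby
# Average growth per interval between the first and last reading (KB).
def growth_rate_kb(readings)
  return 0.0 if readings.length < 2

  (readings.last - readings.first).to_f / (readings.length - 1)
end

# Hypothetical helper mirroring the thresholds used above:
# over 1,000 KB per snapshot is a warning, over 10,000 KB is critical.
def classify_growth(rate)
  return :ok if rate <= 1_000

  rate > 10_000 ? :critical : :warning
end

readings = [200_000, 203_000, 206_500, 209_000, 212_000] # made-up RSS samples in KB
rate = growth_rate_kb(readings)
puts "#{rate} KB/snapshot => #{classify_growth(rate)}"
```

Because only the first and last readings enter the calculation, a single noisy sample at either end skews the result; a fitted slope over all readings would likely be more robust.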

Database query performance monitoring tracks SQL execution times, detects N+1 query patterns, and measures connection pool utilization. Applications instrument database operations to identify performance bottlenecks.

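require 'digest' # Digest::MD5 for query fingerprints
require 'set'    # Set in NPlusOneDetector (loaded automatically on Ruby 3.2+)

class DatabasePerformanceMonitor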
  def initialize
    @query_stats = {}
    @slow_query_threshold = 1.0 # 1 second
    @n_plus_one_detector = NPlusOneDetector.new
    @connection_pool_monitor = ConnectionPoolMonitor.new
  end

  def monitor_query(sql, binds = [])
    normalized_sql = normalize_sql(sql)
    query_id = generate_query_id(normalized_sql)
    
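    # Monotonic clock: unaffected by wall-clock adjustments during the query
    start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    connection_info = capture_connection_info
    
    result = yield
    
    execution_time = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time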
    
    record_query_stats(query_id, normalized_sql, execution_time, binds.length, connection_info)
    detect_n_plus_one(normalized_sql)
    
    if execution_time > @slow_query_threshold
      log_slow_query(normalized_sql, execution_time, binds)
    end
    
    result
  end

  def analyze_query_patterns(time_window = 3600)
    cutoff = Time.now - time_window
    recent_stats = @query_stats.select { |_, stats| stats[:last_executed] >= cutoff }
    
    {
      total_queries: recent_stats.sum { |_, stats| stats[:count] },
      unique_queries: recent_stats.length,
      slow_queries: recent_stats.count { |_, stats| stats[:avg_time] > @slow_query_threshold },
      most_frequent: recent_stats.max_by { |_, stats| stats[:count] },
      slowest_average: recent_stats.max_by { |_, stats| stats[:avg_time] },
      n_plus_one_occurrences: @n_plus_one_detector.detected_patterns.length,
      connection_pool_stats: @connection_pool_monitor.current_stats
    }
  end

  def generate_optimization_suggestions
    suggestions = []
    
    # Identify frequently executed slow queries
    slow_frequent_queries = @query_stats.select do |_, stats|
      stats[:count] > 100 && stats[:avg_time] > 0.5
    end
    
    slow_frequent_queries.each do |query_id, stats|
      suggestions << {
        type: :slow_frequent_query,
        query_pattern: stats[:sql_pattern],
        count: stats[:count],
        avg_time: stats[:avg_time],
        recommendation: "Consider adding database indexes or optimizing this frequently executed slow query"
      }
    end
    
    # Check for N+1 query patterns
    @n_plus_one_detector.detected_patterns.each do |pattern|
      suggestions << {
        type: :n_plus_one,
        pattern: pattern[:pattern],
        occurrences: pattern[:count],
        recommendation: "Use includes/joins to eager load associations and eliminate N+1 queries"
      }
    end
    
    # Connection pool utilization
    pool_stats = @connection_pool_monitor.current_stats
    if pool_stats[:utilization] > 0.8
      suggestions << {
        type: :connection_pool_pressure,
        utilization: pool_stats[:utilization],
        recommendation: "Consider increasing connection pool size or optimizing long-running queries"
      }
    end
    
    suggestions
  end

  private

  def normalize_sql(sql)
    # Replace binds and literal values with '?' and normalize whitespace.
    # \b\d+\b leaves digits inside identifiers (e.g. address2) untouched.
    sql.gsub(/\$\d+|\?|'[^']*'|\b\d+\b/, '?')
       .gsub(/\s+/, ' ')
       .strip
       .upcase
  end

  def generate_query_id(normalized_sql)
    Digest::MD5.hexdigest(normalized_sql)[0, 12]
  end

  def record_query_stats(query_id, sql, execution_time, bind_count, connection_info)
    @query_stats[query_id] ||= {
      sql_pattern: sql,
      count: 0,
      total_time: 0.0,
      avg_time: 0.0,
      min_time: Float::INFINITY,
      max_time: 0.0,
      bind_count: bind_count,
      first_executed: Time.now,
      last_executed: Time.now
    }
    
    stats = @query_stats[query_id]
    stats[:count] += 1
    stats[:total_time] += execution_time
    stats[:avg_time] = stats[:total_time] / stats[:count]
    stats[:min_time] = [stats[:min_time], execution_time].min
    stats[:max_time] = [stats[:max_time], execution_time].max
    stats[:last_executed] = Time.now
  end

  def capture_connection_info
    # ConnectionPool#stat returns one snapshot with :size, :busy, :idle, :dead, :waiting
    stat = ActiveRecord::Base.connection_pool.stat
    {
      pool_size: stat[:size],
      active_connections: stat[:busy],
      available_connections: stat[:size] - stat[:busy]
    }
  rescue
    { error: 'Unable to capture connection info' }
  end

  def detect_n_plus_one(sql)
    @n_plus_one_detector.analyze_query(sql, caller_locations(3, 5))
  end

  def log_slow_query(sql, execution_time, binds)
    puts "SLOW QUERY (#{execution_time.round(3)}s): #{sql}"
    puts "Binds: #{binds.inspect}" unless binds.empty?
    puts "Backtrace: #{caller_locations(3, 3).map(&:to_s).join("\n  ")}"
  end
end

class NPlusOneDetector
  def initialize
    @query_patterns = {}
    @request_queries = []
    @detection_threshold = 10
  end

  def analyze_query(sql, caller_info)
    return unless sql.match?(/SELECT.*FROM.*WHERE.*=\s*\?/i)
    
    pattern = extract_pattern(sql)
    location = caller_info&.first&.to_s
    
    @request_queries << {
      pattern: pattern,
      location: location,
      timestamp: Time.now
    }
    
    # Check for repeated patterns
    recent_queries = @request_queries.last(50)
    pattern_count = recent_queries.count { |q| q[:pattern] == pattern }
    
    if pattern_count >= @detection_threshold
      record_n_plus_one(pattern, location, pattern_count)
    end
  end

  def detected_patterns
    @query_patterns.values
  end

  def reset_request_queries
    @request_queries.clear
  end

  private

  def extract_pattern(sql)
    # Extract table name and general query structure
    sql.gsub(/\s+/, ' ')
       .gsub(/'[^']*'|\d+/, '?')
       .strip
  end

  def record_n_plus_one(pattern, location, count)
    @query_patterns[pattern] ||= {
      pattern: pattern,
      first_detected: Time.now,
      locations: Set.new,
      count: 0
    }
    
    @query_patterns[pattern][:locations] << location if location
    @query_patterns[pattern][:count] += 1
    @query_patterns[pattern][:last_detected] = Time.now
  end
end

class ConnectionPoolMonitor
  def current_stats
    # Read the pool statistics once so all values come from the same snapshot
    stat = ActiveRecord::Base.connection_pool.stat
    
    {
      size: stat[:size],
      checked_out: stat[:busy],
      checked_in: stat[:idle],
      dead: stat[:dead],
      available: stat[:size] - stat[:busy],
      utilization: stat[:busy].to_f / stat[:size]
    }
  rescue => error
    { error: error.message }
  end
end
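
The detector above depends on two steps: reducing each statement to a literal-free pattern, then counting how often a pattern repeats. A minimal standalone sketch of those steps follows; the `repeated_patterns` helper, its threshold, and the sample queries are assumptions made for illustration.

```ruby
# Replace literals and bind placeholders with '?', then collapse whitespace.
def normalize_sql(sql)
  sql.gsub(/'[^']*'/, '?')   # quoted string literals
     .gsub(/\$\d+/, '?')     # PostgreSQL-style binds such as $1
     .gsub(/\b\d+\b/, '?')   # standalone numeric literals
     .gsub(/\s+/, ' ')
     .strip
end

# Hypothetical helper: patterns seen at least `threshold` times.
def repeated_patterns(queries, threshold: 3)
  queries.map { |sql| normalize_sql(sql) }
         .tally
         .select { |_, count| count >= threshold }
end

queries = [
  "SELECT * FROM users WHERE id = 1",
  "SELECT * FROM posts WHERE user_id = 1",
  "SELECT * FROM posts WHERE user_id = 2",
  "SELECT * FROM posts  WHERE user_id = 3"
]
p repeated_patterns(queries)
```

Here the three posts lookups collapse into one pattern, which is the typical signature of an N+1 access. `Enumerable#tally` needs Ruby 2.7 or newer.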

Reference

Core Classes and Modules

| Class/Module | Purpose | Key Methods |
| --- | --- | --- |
| Logger | Standard logging functionality | #info, #error, #warn, #debug, #fatal |
| Benchmark | Performance measurement | .measure, .realtime, .bm, .bmbm |
| TracePoint | Runtime tracing and instrumentation | .new, #enable, #disable, #event, #defined_class |
| ObjectSpace | Object and memory inspection | .count_objects, .each_object, .trace_object_allocations_start |
| GC | Garbage collector interface | .stat, .count, .disable, .enable, .start |

Monitoring Patterns

| Pattern | Use Case | Implementation |
| --- | --- | --- |
| Method Wrapping | Instrument existing methods | Module#prepend, alias_method, define_method |
| Circuit Breaker | Prevent cascading failures | State machine with failure thresholds |
| Health Checks | Service availability monitoring | HTTP endpoints with dependency checks |
| Metric Collection | Quantitative data gathering | Custom collectors with time-series data |
| Error Handling | Exception tracking and reporting | Structured exception capture with context |

Popular Monitoring Gems

| Gem | Purpose | Configuration | Key Features |
| --- | --- | --- | --- |
| newrelic_rpm | Application Performance Monitoring | config/newrelic.yml | Transaction tracing, custom metrics, error tracking |
| scout_apm | Lightweight APM | config/scout_apm.yml | Performance monitoring, memory tracking, N+1 detection |
| sentry-ruby | Error tracking | Sentry.init block | Exception capture, breadcrumbs, release tracking |
| datadog | Infrastructure monitoring | Datadog.configure | Metrics, traces, logs, APM integration |
| elastic-apm | Elastic Stack APM | config/elastic_apm.yml | Distributed tracing, metrics, error tracking |

Logger Configuration

| Level | Numeric Value | Usage | Output Includes |
| --- | --- | --- | --- |
| FATAL | 4 | System unusable | Process termination errors |
| ERROR | 3 | Error conditions | Exceptions, failures |
| WARN | 2 | Warning conditions | Deprecated usage, recoverable errors |
| INFO | 1 | Informational | Request logs, business events |
| DEBUG | 0 | Debug information | Variable states, execution flow |
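
Setting a level drops every message with a lower numeric value. A small sketch of that filtering, writing to an in-memory StringIO so the result can be inspected (the log messages are made up):

```ruby
require 'logger'
require 'stringio'

output = StringIO.new
logger = Logger.new(output)
logger.level = Logger::WARN # drops DEBUG (0) and INFO (1)

logger.debug('cache lookup details')
logger.info('request completed')
logger.warn('deprecated endpoint called')
logger.error('payment gateway timeout')

lines = output.string.lines
puts lines
puts "#{lines.length} of 4 messages were written"
```

Only the WARN and ERROR entries reach the output.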

Benchmark Methods

| Method | Returns | Description |
| --- | --- | --- |
| Benchmark.measure | Benchmark::Tms | Single operation timing |
| Benchmark.realtime | Float | Wall clock time only |
| Benchmark.bm | Array | Multiple operation comparison |
| Benchmark.bmbm | Array | Rehearsal + measurement |
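
The difference between the two most common calls shows up in their return values. A short sketch with an arbitrary CPU-bound workload chosen only for illustration:

```ruby
require 'benchmark'

work = -> { (1..50_000).sum { |n| n * n } }

elapsed = Benchmark.realtime { work.call } # Float: wall-clock seconds
tms = Benchmark.measure { work.call }      # Benchmark::Tms: user, system, and real time

puts format('realtime: %.6fs', elapsed)
puts format('measure:  user=%.6f system=%.6f real=%.6f', tms.utime, tms.stime, tms.real)
```

`Benchmark.realtime` is usually enough for logging a duration, while `Benchmark.measure` helps separate CPU time from time spent waiting.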

TracePoint Events

| Event | Triggers | Available Data |
| --- | --- | --- |
| :call | Method calls | #defined_class, #method_id, #parameters |
| :return | Method returns | #return_value, #defined_class, #method_id |
| :c_call | C method calls | #defined_class, #method_id |
| :raise | Exception raising | #raised_exception |
| :line | Line execution | #lineno, #path |
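
As a rough illustration of the :call and :return events, the sketch below counts calls and collects return values for a single class. The `Greeter` class is a made-up example, and filtering on `defined_class` keeps unrelated methods out of the results.

```ruby
class Greeter
  def greet(name)
    "Hello, #{name}"
  end
end

calls = Hash.new(0)
returns = []

trace = TracePoint.new(:call, :return) do |tp|
  next unless tp.defined_class == Greeter

  if tp.event == :call
    calls[tp.method_id] += 1
  else
    returns << tp.return_value # only valid on return events
  end
end

greeter = Greeter.new
# The block form enables tracing only for the duration of the block
trace.enable { 2.times { |i| greeter.greet("user#{i}") } }

p calls
p returns
```

Tracing adds overhead to every traced event, so a narrow event list and a short enabled window are usually preferable in production.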

ObjectSpace Methods

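| Method | Returns | Description |
| --- | --- | --- |
| ObjectSpace.count_objects | Hash | Object counts by internal type (:T_STRING, :T_ARRAY, ...), plus :TOTAL and :FREE slots |
| ObjectSpace.count_objects_size | Hash | Memory size by object type (requires 'objspace') |
| ObjectSpace.memsize_of(obj) | Integer | Memory size of specific object (requires 'objspace') |
| ObjectSpace.trace_object_allocations_start | nil | Begin allocation tracking (requires 'objspace') |
| ObjectSpace.allocation_sourcefile(obj) | String or nil | Source file where object allocated; nil if the object was not traced |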
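
A minimal sketch of allocation tracing, using the block form `ObjectSpace.trace_object_allocations` so tracing is switched off again automatically. The traced object is an arbitrary `Object.new` chosen for illustration.

```ruby
require 'objspace'

source = nil
ObjectSpace.trace_object_allocations do
  tracked = Object.new
  # Query the allocation metadata while tracing is still active
  source = {
    file: ObjectSpace.allocation_sourcefile(tracked),
    line: ObjectSpace.allocation_sourceline(tracked),
    bytes: ObjectSpace.memsize_of(tracked)
  }
end

puts "allocated at #{source[:file]}:#{source[:line]} (#{source[:bytes]} bytes)"
```

Objects created outside a tracing window carry no source information, so `allocation_sourcefile` returns nil for them.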

GC Statistics

| Stat Key | Type | Description |
| --- | --- | --- |
| :count | Integer | Total GC runs |
| :time | Integer | Total GC time (milliseconds, Ruby 3.1+) |
| :heap_allocated_pages | Integer | Allocated heap pages |
| :heap_available_slots | Integer | Total object slots in heap pages |
| :heap_live_slots | Integer | Live objects count |
| :total_allocated_objects | Integer | Total objects allocated |
| :major_gc_count | Integer | Major GC runs |
| :minor_gc_count | Integer | Minor GC runs |
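
Most of these counters are cumulative, so they become useful when read before and after a piece of work. A small sketch of that pattern; the `gc_delta` helper name and the string-building workload are assumptions for illustration.

```ruby
# Hypothetical helper: runs the block and reports how selected GC counters changed.
def gc_delta
  before = GC.stat
  result = yield
  after = GC.stat

  keys = %i[count minor_gc_count major_gc_count total_allocated_objects]
  deltas = keys.to_h { |key| [key, after[key] - before[key]] }
  [result, deltas]
end

result, deltas = gc_delta { Array.new(10_000) { |i| "item-#{i}" } }
puts "built #{result.length} strings"
deltas.each { |key, value| puts "#{key}: #{value}" }
```

The exact numbers vary between Ruby versions and runs, so they are best compared relative to a baseline for the same code path.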

Error Handling Patterns

# Basic error handling with context
# Logger's level methods take a single message, so the context is folded into it
begin
  risky_operation
rescue SpecificError => error
  context = {
    error: error.class.name,
    message: error.message,
    context: operation_context
  }
  logger.error("Operation failed: #{context.inspect}")
  handle_specific_error(error)
rescue StandardError => error
  logger.error("Unexpected error: #{error_details(error).inspect}")
  raise
ensure
  cleanup_resources
end

# Circuit breaker implementation
circuit_breaker = CircuitBreaker.new(
  failure_threshold: 5,
  timeout: 60,
  success_threshold: 3
)

begin
  result = circuit_breaker.call { external_service.call }
rescue CircuitBreakerOpenError
  fallback_response
end

# Custom exception with structured data
class BusinessLogicError < StandardError
  attr_reader :error_code, :details
  
  def initialize(message, error_code:, details: {})
    super(message)
    @error_code = error_code
    @details = details
  end
  
  def to_h
    {
      message: message,
      error_code: error_code,
      details: details,
      backtrace: backtrace&.first(5)
    }
  end
end
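
The circuit breaker usage above assumes a breaker class defined elsewhere. As a rough idea of the state machine behind it, here is a minimal sketch; `SimpleCircuitBreaker`, `CircuitOpenError`, and the injectable `clock` are hypothetical names for this illustration, and the `success_threshold` option is left out for brevity.

```ruby
class CircuitOpenError < StandardError; end

# closed -> open after N consecutive failures; open -> half-open once
# `timeout` seconds have passed; half-open -> closed on the next success.
class SimpleCircuitBreaker
  attr_reader :state

  def initialize(failure_threshold: 5, timeout: 60,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @failure_threshold = failure_threshold
    @timeout = timeout
    @clock = clock
    @failures = 0
    @state = :closed
    @opened_at = nil
  end

  def call
    if @state == :open
      raise CircuitOpenError, 'circuit is open' if @clock.call - @opened_at < @timeout

      @state = :half_open # timeout elapsed: allow one trial call
    end

    result = yield
    @failures = 0
    @state = :closed
    result
  rescue CircuitOpenError
    raise # rejected calls are not counted as failures
  rescue StandardError
    @failures += 1
    trip if @state == :half_open || @failures >= @failure_threshold
    raise
  end

  private

  def trip
    @state = :open
    @opened_at = @clock.call
  end
end

now = 0.0 # fake clock so the example does not have to sleep
breaker = SimpleCircuitBreaker.new(failure_threshold: 2, timeout: 30, clock: -> { now })

2.times do
  begin
    breaker.call { raise IOError, 'upstream down' }
  rescue IOError
    # the breaker has recorded the failure
  end
end
puts breaker.state

now += 31
puts breaker.call { 'recovered' }
puts breaker.state
```

Injecting the clock keeps the timeout logic testable; in an application the default monotonic clock would normally be used.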

Performance Monitoring Setup

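# Method-level timing
# Must be a module: a class cannot be mixed in with include
module TimedMethods
  def self.included(base)
    base.extend(ClassMethods)
  end
  
  module ClassMethods
    # Call after the method is defined, e.g. `time_method :process`
    def time_method(method_name, options = {})
      alias_method "#{method_name}_untimed", method_name
      
      define_method(method_name) do |*args, &block|
        start_time = Process.clock_gettime(Process::CLOCK_MONOTONIC)
        result = send("#{method_name}_untimed", *args, &block)
        duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start_time
        
        log_timing(method_name, duration, options)
        result
      end
    end
  end

  # Default sink; override in the including class to route timings elsewhere
  def log_timing(method_name, duration, options = {})
    warn "#{self.class.name}##{method_name} took #{duration.round(4)}s"
  end
end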

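# Memory allocation tracking (needs require 'objspace')
def track_allocations
  ObjectSpace.trace_object_allocations_start
  before_counts = ObjectSpace.count_objects
  
  result = yield
  
  after_counts = ObjectSpace.count_objects
  # Per-type slot deltas (keys such as :T_STRING), skipping unchanged types
  deltas = after_counts
    .to_h { |type, count| [type, count - before_counts.fetch(type, 0)] }
    .reject { |_, delta| delta.zero? }
  
  { result: result, allocations: deltas }
ensure
  # Stop tracing even when the block raises
  ObjectSpace.trace_object_allocations_stop
end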

# Database query monitoring
ActiveSupport::Notifications.subscribe('sql.active_record') do |name, start, finish, id, payload|
  duration = finish - start # seconds
  
  if duration > 1.0  # Log slow queries
    # Logger#warn takes a single message, so the details are interpolated
    logger.warn("Slow Query (#{duration.round(3)}s) #{payload[:name]}: #{payload[:sql]}")
  end
end
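
The alias_method approach above renames the original method. The Module#prepend variant listed under Monitoring Patterns avoids the rename by calling `super`. A minimal sketch under that assumption; `timing_wrapper`, `ReportBuilder`, and the `sink` array are hypothetical names for this example.

```ruby
# Hypothetical helper: builds an anonymous module that times the named methods.
def timing_wrapper(*method_names, sink:)
  Module.new do
    method_names.each do |name|
      define_method(name) do |*args, **kwargs, &block|
        started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
        begin
          super(*args, **kwargs, &block) # dispatches to the original method
        ensure
          # Record the duration even when the original method raises
          sink << [name, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
        end
      end
    end
  end
end

class ReportBuilder
  def build(rows)
    rows.map { |row| row.to_s.upcase }
  end
end

timings = []
ReportBuilder.prepend(timing_wrapper(:build, sink: timings))

result = ReportBuilder.new.build(%w[a b])
p result
timings.each { |name, seconds| puts format('%s took %.6fs', name, seconds) }
```

Because the wrapper sits in front of the class in the ancestor chain, the original method body stays untouched and the wrapper works for methods defined before or after the prepend.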

Health Check Endpoint Pattern

class HealthController < ApplicationController
  def show
    checks = perform_health_checks
    status = checks.all? { |_, check| check[:healthy] } ? :ok : :service_unavailable
    
    render json: {
      status: status == :ok ? 'healthy' : 'unhealthy',
      checks: checks,
      timestamp: Time.current.iso8601,
      # APP_VERSION is an assumed deployment-provided environment variable
      version: ENV.fetch('APP_VERSION', 'unknown')
    }, status: status
  end

  private

  def perform_health_checks
    {
      database: check_database_connection,
      redis: check_redis_connection,
      external_api: check_external_services,
      disk_space: check_disk_space
    }
  end

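  def check_database_connection
    # Time the probe query itself so response_time reflects database latency
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    ActiveRecord::Base.connection.execute('SELECT 1')
    { healthy: true, response_time: Process.clock_gettime(Process::CLOCK_MONOTONIC) - started }
  rescue => error
    { healthy: false, error: error.message }
  end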
end
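
The aggregation step of the controller can be sketched without Rails. In this illustration `run_health_checks` is a hypothetical helper and the two lambdas stand in for real dependency probes.

```ruby
# Hypothetical helper: runs each named check and records health plus latency.
def run_health_checks(checks)
  results = checks.to_h do |name, check|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    outcome =
      begin
        check.call
        { healthy: true }
      rescue StandardError => error
        { healthy: false, error: error.message }
      end
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    [name, outcome.merge(response_time: elapsed)]
  end

  { healthy: results.values.all? { |r| r[:healthy] }, checks: results }
end

report = run_health_checks(
  database: -> { true },                            # stand-in for a SELECT 1 probe
  cache: -> { raise IOError, 'connection refused' } # stand-in for a failing dependency
)
puts report[:healthy]
puts report[:checks][:cache][:error]
```

Rescuing inside each check keeps one failing dependency from hiding the status of the others, which is what lets the endpoint report per-dependency results.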