CrackedRuby - Change Management

Overview

Change management encompasses the processes, tools, and methodologies that teams use to control and track modifications to software systems. The discipline addresses how code changes move from development through testing to production, how teams coordinate concurrent modifications, and how systems maintain stability while incorporating new features and fixes.

The foundation of software change management emerged from traditional engineering change control processes but evolved significantly with distributed version control systems. Modern change management integrates version control, continuous integration, deployment automation, and release coordination into cohesive workflows that balance velocity with reliability.

Change management operates at multiple levels within software organizations. At the code level, it tracks individual commits and branches. At the feature level, it coordinates work across multiple developers. At the release level, it manages deployment timing, rollback procedures, and production stability. Each level requires different tools and processes but all connect through shared principles of traceability, reversibility, and controlled progression.

The scope of change management extends beyond version control to include database migrations, infrastructure changes, configuration updates, and documentation modifications. Each type of change presents unique challenges. Code changes may introduce bugs. Database migrations may cause downtime. Infrastructure changes may affect performance. Configuration updates may create security vulnerabilities. A comprehensive change management system addresses all these change types through appropriate controls and validation processes.

# Simple change tracking in a deployment script
class ChangeTracker
  def initialize(version, previous_version: nil)
    @version = version
    @previous_version = previous_version # version to restore on rollback
    @timestamp = Time.now
    @changes = []
  end
  
  def record_change(component, description)
    @changes << {
      component: component,
      description: description,
      timestamp: Time.now
    }
  end
  
  def deployment_manifest
    {
      version: @version,
      deployed_at: @timestamp,
      changes: @changes,
      rollback_version: @previous_version
    }
  end
end

tracker = ChangeTracker.new("2.3.1", previous_version: "2.3.0")
tracker.record_change("authentication", "Add OAuth2 support")
tracker.record_change("database", "Add users.oauth_token column")
tracker.deployment_manifest
# => Hash describing the deployment, its changes, and the rollback target

Key Principles

Change management rests on several fundamental principles that guide how organizations handle software modifications. These principles apply regardless of specific tools or processes.

Traceability requires that every change connects to a documented reason and responsible party. Each commit links to an issue or feature request. Each deployment references specific commits. Each rollback identifies the problematic change. This principle enables teams to answer questions like "Why did we make this change?" and "Who approved this modification?" without archaeological code analysis.
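A minimal sketch of how a commit hook or CI step might enforce that link. The `IssueReference` module and the two reference formats it accepts (`#123` and `PROJ-123`) are illustrative assumptions, not a convention the text prescribes.

```ruby
# Hypothetical helper: finds issue references in a commit message.
module IssueReference
  # Assumed formats: "#123" (tracker number) or "PROJ-123" (project key).
  PATTERN = /(?<![\w-])(#\d+|[A-Z][A-Z0-9]+-\d+)\b/

  # Returns every distinct reference found in the message.
  def self.extract(message)
    message.scan(PATTERN).flatten.uniq
  end

  # A commit is traceable when it names at least one issue.
  def self.traceable?(message)
    !extract(message).empty?
  end
end

puts IssueReference.extract("Add OAuth2 support (#42, AUTH-7)").inspect
puts IssueReference.traceable?("Fix typo")
```

A check like this only proves that a reference is present. Confirming that the referenced issue exists would need a call to the issue tracker.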

Atomicity demands that changes group into logical, indivisible units. A feature implementation includes all necessary code, tests, documentation, and database migrations. Deploying half a feature or leaving migrations unrun creates inconsistent states. Atomic changes either complete entirely or fail entirely, preventing partially-applied modifications that corrupt system state.

Reversibility ensures that teams can undo changes when problems arise. Every deployment includes a rollback plan. Every database migration includes a down migration. Every configuration change preserves the previous configuration. This principle acknowledges that despite testing, some problems only manifest in production. Quick reversal limits damage and provides time for proper fixes.
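One way to picture reversibility is to pair every step with its own undo action. The sketch below is a hypothetical `ReversibleChangeSet` (the class and its method names are assumptions for illustration) that undoes completed steps in reverse order when a later step fails.

```ruby
# Hypothetical sketch: each step pairs an "apply" action with its "undo".
class ReversibleChangeSet
  Step = Struct.new(:name, :apply, :undo)

  def initialize
    @steps = []
  end

  # Registers a step; apply and undo are callables (procs or lambdas).
  def step(name, apply:, undo:)
    @steps << Step.new(name, apply, undo)
    self
  end

  # Applies steps in order. If one raises, undoes the steps that already
  # completed (most recent first) and re-raises the original error.
  # The failing step itself is not undone because it never completed.
  def run
    completed = []
    @steps.each do |s|
      s.apply.call
      completed << s
    end
    completed.map(&:name)
  rescue StandardError
    completed.reverse_each { |s| s.undo.call }
    raise
  end
end

state = []
changes = ReversibleChangeSet.new
changes.step("config", apply: -> { state << :config }, undo: -> { state.delete(:config) })
changes.step("migration", apply: -> { raise "migration failed" }, undo: -> {})

begin
  changes.run
rescue RuntimeError => e
  puts "Rolled back after: #{e.message}"
end
```

The sketch assumes every undo action succeeds. A production version would also need to decide what to do when an undo itself raises.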

Progressive Exposure controls how changes reach users. New features may deploy first to development environments, then staging, then a small percentage of production users, and finally to all users. This gradual exposure detects problems with limited impact. Each stage provides opportunities to identify issues before full deployment.

Isolation separates concurrent changes to prevent interference. Developers work in feature branches rather than directly on main branches. Each change includes its own tests that run independently. Deployment processes handle one change at a time rather than batching unrelated modifications. Isolation reduces the coordination that parallel work would otherwise require.

Auditability maintains comprehensive records of all changes. Logs capture who made each change, when it occurred, what specifically changed, and why. These records support compliance requirements, security investigations, and debugging. Audit trails must be immutable and complete to serve their purpose.
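One common way to make tampering detectable is to chain entries by hash, so that altering any record invalidates every later one. The following is an in-memory sketch only. The `AuditLog` class and its field names are assumptions, and a real audit trail would also need durable, access-controlled storage.

```ruby
require 'digest'
require 'json'
require 'time'

# Hypothetical append-only log where each entry stores the digest of the
# previous entry, forming a verifiable chain.
class AuditLog
  Entry = Struct.new(:actor, :action, :reason, :at, :prev_digest, :digest,
                     keyword_init: true)
  GENESIS = '0' * 64 # placeholder digest for the first entry

  def initialize
    @entries = []
  end

  # Appends a frozen entry recording who did what, when, and why.
  def record(actor:, action:, reason:, at: Time.now.utc)
    prev = @entries.empty? ? GENESIS : @entries.last.digest
    fields = { actor: actor, action: action, reason: reason, at: at.iso8601 }
    entry = Entry.new(**fields, prev_digest: prev, digest: digest_for(fields, prev))
    @entries << entry.freeze
    entry
  end

  def entries
    @entries.dup
  end

  # Recomputes every digest and checks that the chain is unbroken.
  def valid?
    prev = GENESIS
    @entries.all? do |e|
      fields = { actor: e.actor, action: e.action, reason: e.reason, at: e.at }
      ok = e.prev_digest == prev && e.digest == digest_for(fields, prev)
      prev = e.digest
      ok
    end
  end

  private

  def digest_for(fields, prev)
    Digest::SHA256.hexdigest(prev + JSON.generate(fields))
  end
end

log = AuditLog.new
log.record(actor: 'alice', action: 'deploy v2.3.1', reason: 'Scheduled release')
log.record(actor: 'bob', action: 'rollback v2.3.1', reason: 'Error rate spike')
puts log.valid?
```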

The relationship between these principles creates tensions that teams must balance. Atomicity suggests larger changes, while progressive exposure suggests smaller ones. Isolation enables parallel work but increases merge complexity. Reversibility requires additional work that seems wasteful when changes succeed. Change management processes navigate these tensions based on organizational risk tolerance and operational constraints.

# Demonstrating atomic change with transaction
class FeatureDeployment
  def deploy(feature_name)
    ActiveRecord::Base.transaction do
      enable_feature(feature_name)
      run_data_migration(feature_name)
      update_configuration(feature_name)
      notify_monitoring(feature_name)
    end
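  rescue => e
    # The transaction rolls back database writes only; configuration and
    # monitoring side effects need their own compensation on failure.
    log_failure(feature_name, e)
    raise # Re-raise so the caller sees the failed deployment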
  end
  
  private
  
  def enable_feature(name)
    Feature.create!(name: name, enabled: true)
  end
  
  def run_data_migration(name)
    DataMigration.execute(name)
  end
  
  def update_configuration(name)
    Config.set("feature.#{name}.enabled", true)
  end
  
  def notify_monitoring(name)
    Monitoring.track_deployment(name)
  end
end

Implementation Approaches

Organizations implement change management through various approaches that differ in structure, tooling, and coordination mechanisms. The choice depends on team size, release frequency, risk tolerance, and regulatory requirements.

Trunk-Based Development maintains a single main branch where all developers commit frequently. Feature flags control which functionality appears in production. This approach minimizes merge complexity because developers integrate changes continuously rather than in large batches. Teams using trunk-based development often release multiple times per day.

The core workflow involves developers pulling the latest main branch, making small changes, running automated tests, and pushing directly to main. Feature flags wrap incomplete features, allowing code deployment without feature activation. When features complete, teams enable flags rather than merging large branches.

# Feature flag implementation for trunk-based development
class FeatureFlag
  def self.enabled?(feature_name, user: nil)
    flag = Flag.find_by(name: feature_name)
    return false unless flag
    
    case flag.rollout_strategy
    when 'all'
      flag.enabled
    when 'percentage'
      user && user_in_rollout_percentage?(user, flag.percentage)
    when 'whitelist'
      user && flag.whitelisted_users.include?(user.id)
    else
      false
    end
  end
  
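  def self.user_in_rollout_percentage?(user, percentage)
    (user.id % 100) < percentage
  end
  # `private` has no effect on `def self.` methods, so hide it explicitly
  private_class_method :user_in_rollout_percentage?
end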

# Usage in application code
if FeatureFlag.enabled?('new_checkout_flow', user: current_user)
  render 'checkout/new_flow'
else
  render 'checkout/legacy_flow'
end
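The modulo on `user.id` above places the same users in the early cohort of every percentage rollout. A common alternative is to hash the feature name together with the user ID, so each feature gets its own stable cohort. The sketch below assumes CRC32 is an acceptable hash for bucketing, and the `RolloutBucket` module is a hypothetical name.

```ruby
require 'zlib'

# Hypothetical helper for per-feature percentage rollouts.
module RolloutBucket
  # Maps a (feature, user) pair to a stable bucket in 0...100.
  # The same pair always lands in the same bucket.
  def self.bucket(feature_name, user_id)
    Zlib.crc32("#{feature_name}:#{user_id}") % 100
  end

  # True when the user's bucket falls inside the rollout percentage.
  def self.in_rollout?(feature_name, user_id, percentage)
    bucket(feature_name, user_id) < percentage
  end
end

puts RolloutBucket.in_rollout?('new_checkout_flow', 42, 25)
```

Because buckets are stable, raising the percentage only adds users. Nobody who already has the feature loses it as the rollout grows.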

GitFlow structures work around multiple long-lived branches with specific purposes. The main branch holds production code. The develop branch integrates features. Feature branches isolate individual work items. Release branches prepare for production deployment. Hotfix branches address production issues.

This approach provides clear separation between production, integration, and development states. Teams can prepare releases while continuing development work. However, GitFlow creates merge overhead and delays integration, which can lead to conflicts and integration surprises.

GitHub Flow simplifies GitFlow by maintaining only a main branch and short-lived feature branches. Developers create branches for features, open pull requests for review, merge to main after approval, and deploy immediately. This approach balances simplicity with code review while maintaining a deployable main branch.

Release Trains schedule deployments at fixed intervals regardless of feature readiness. Features that complete before the departure time board the train. Incomplete features wait for the next train. This approach creates predictable release schedules that coordinate across teams and stakeholders. However, it can delay feature delivery and create pressure to rush changes before train departure.

# Release train coordination script
class ReleaseTrain
  def initialize(departure_time)
    @departure_time = departure_time
    @features = []
  end
  
  def board_feature(feature)
    if feature.ready? && Time.now < @departure_time
      @features << feature
      tag_for_release(feature)
    else
      schedule_next_train(feature)
    end
  end
  
  def depart
    return unless Time.now >= @departure_time
    
    @features.each do |feature|
      deploy_feature(feature)
    end
    
    create_release_notes
    notify_stakeholders
  end
  
  private
  
  def tag_for_release(feature)
    feature.update(release_tag: next_release_version)
  end
  
  def schedule_next_train(feature)
    # 2.weeks comes from ActiveSupport, so this assumes a Rails environment
    next_train = @departure_time + 2.weeks
    feature.update(scheduled_release: next_train)
  end
end

Environment Promotion moves changes through a series of environments that increasingly resemble production. A change might progress through development, integration, staging, pre-production, and production environments. Each environment provides validation opportunities with different characteristics. Early environments enable fast feedback. Later environments provide production-like validation.

The implementation requires maintaining environment parity so that validation in staging accurately predicts production behavior. Configuration management systems ensure consistency while allowing environment-specific settings like database connections or API endpoints.
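The promotion rule can be sketched as a gate that refuses to move a version forward until every earlier environment has validated it. The `PromotionPipeline` class below is a hypothetical illustration. The environment names come from the description above, and everything else is an assumption.

```ruby
# Hypothetical gate: a version may enter an environment only after it
# has been validated in every environment that precedes it.
class PromotionPipeline
  PromotionError = Class.new(StandardError)

  ENVIRONMENTS = %i[development integration staging pre_production production].freeze

  def initialize
    @validated = {} # version => environments where validation passed
  end

  def mark_validated(version, environment)
    ensure_known!(environment)
    (@validated[version] ||= []) << environment
  end

  def promotable_to?(version, environment)
    ensure_known!(environment)
    required = ENVIRONMENTS.take_while { |env| env != environment }
    passed = @validated.fetch(version, [])
    required.all? { |env| passed.include?(env) }
  end

  # Returns the target environment, or raises if a stage was skipped.
  def promote!(version, to:)
    return to if promotable_to?(version, to)

    raise PromotionError,
          "#{version} has not been validated in every environment before #{to}"
  end

  private

  def ensure_known!(environment)
    return if ENVIRONMENTS.include?(environment)

    raise ArgumentError, "Unknown environment: #{environment}"
  end
end

pipeline = PromotionPipeline.new
pipeline.mark_validated('2.3.1', :development)
pipeline.mark_validated('2.3.1', :integration)
puts pipeline.promotable_to?('2.3.1', :staging)
puts pipeline.promotable_to?('2.3.1', :production)
```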

Common Patterns

Several patterns recur across change management implementations, addressing common challenges in coordinating modifications and maintaining system stability.

Pull Request Workflow gates changes behind review and approval before integration. Developers push branches to shared repositories, open pull requests describing changes, address review feedback, and merge after approval. This pattern enforces code review, prevents direct commits to protected branches, and creates discussion threads documenting decisions.

The pattern integrates with automated checks that must pass before merge approval. Continuous integration runs tests, linters verify code style, security scanners detect vulnerabilities, and coverage tools ensure adequate testing. Pull requests block merging until all checks succeed and required reviewers approve.

# Automated pull request validation
class PullRequestValidator
  def initialize(pr_number)
    @pr = fetch_pull_request(pr_number)
    @checks = []
  end
  
  def validate
    run_test_suite
    check_code_coverage
    scan_dependencies
    verify_migrations
    lint_code_style
    
    @checks.all?(&:passed?)
  end
  
  private
  
  def run_test_suite
    result = system('bundle exec rspec')
    @checks << Check.new('tests', result)
  end
  
  def check_code_coverage
    coverage = SimpleCov.result.covered_percent
    passed = coverage >= 80.0
    @checks << Check.new('coverage', passed)
  end
  
  def scan_dependencies
    result = system('bundle audit check --update')
    @checks << Check.new('security', result)
  end
  
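  def verify_migrations
    # check_pending! returns nil when the schema is current and raises
    # otherwise (Rails 7.1+ offers check_all_pending! instead)
    ActiveRecord::Migration.check_pending!
    @checks << Check.new('migrations', true)
  rescue ActiveRecord::PendingMigrationError
    @checks << Check.new('migrations', false)
  end
  
  def lint_code_style
    result = system('bundle exec rubocop')
    @checks << Check.new('style', result)
  end
end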

Blue-Green Deployment maintains two production environments that alternate between active and idle. The blue environment serves live traffic while the green environment receives the new deployment. After validating the green environment, traffic switches from blue to green. If problems arise, traffic switches back to blue.

This pattern enables zero-downtime deployments and instant rollback. The idle environment provides a production-equivalent testing ground. However, maintaining two full production environments doubles infrastructure costs, and some systems (like databases) cannot fully duplicate without complex replication.
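The switching logic itself is small. The sketch below models it with a hypothetical `BlueGreenRouter` that tracks which environment is live and accepts any callable as a health check. In a real system the switch would update a load balancer or DNS record rather than an instance variable.

```ruby
# Hypothetical model of blue-green traffic switching.
class BlueGreenRouter
  attr_reader :active

  def initialize(active: :blue)
    @active = active
    @versions = { blue: nil, green: nil }
  end

  # The environment that is not currently serving traffic.
  def idle
    @active == :blue ? :green : :blue
  end

  def version_in(environment)
    @versions.fetch(environment)
  end

  # New releases always go to the idle environment first.
  def deploy_to_idle(version)
    @versions[idle] = version
  end

  # Moves traffic to the idle environment only if it passes the check.
  def switch(health_check:)
    target = idle
    return false unless health_check.call(target, @versions[target])

    @active = target
    true
  end

  # Rollback is the same switch in the other direction, with no check.
  def rollback
    @active = idle
  end
end

router = BlueGreenRouter.new
router.deploy_to_idle('2.4.0')
router.switch(health_check: ->(_env, version) { !version.nil? })
puts router.active
```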

Canary Deployment releases changes to a small subset of users before full rollout. If metrics show problems, the deployment halts and rolls back. If metrics remain healthy, the rollout expands gradually. This pattern detects issues with limited impact and provides early warning of problems that testing missed.

Implementation requires routing logic that directs specific users to canary versions, monitoring that detects anomalies, and automated or manual decisions about rollout progression. The pattern works well for user-facing changes but applies poorly to backend services without user-specific routing.

# Canary deployment controller
class CanaryDeployment
  def initialize(version, initial_percentage: 5)
    @version = version
    @percentage = initial_percentage
    @metrics = MetricsCollector.new(version)
  end
  
  def route_request(user)
    if canary_user?(user)
      @metrics.record_canary_request
      "canary_#{@version}"
    else
      @metrics.record_stable_request
      "stable"
    end
  end
  
  def expand_rollout
    return if @percentage >= 100
    
    if @metrics.healthy?
      @percentage = [@percentage * 2, 100].min
      record_expansion
    else
      rollback
    end
  end
  
  private
  
  def canary_user?(user)
    (user.id % 100) < @percentage
  end
  
  def rollback
    @percentage = 0
    alert_team("Canary rollback: #{@version}")
  end
end

Database Migration Patterns address the challenge of changing database schemas in production systems. Forward-only migrations avoid down migrations that risk data loss. Backward-compatible migrations ensure code works with both old and new schemas during deployment transitions. Multi-step migrations separate schema changes from code changes to prevent breaking running code.

A typical multi-step migration adds a new column while maintaining the old column, deploys code that writes to both columns, backfills data to the new column, deploys code that reads from the new column, and finally removes the old column. Each step deploys independently and maintains system functionality.

Changelog Automation generates release notes from commit messages or pull request descriptions. Tools parse structured commit messages following conventions like Conventional Commits, extract feature descriptions, bug fixes, and breaking changes, and produce formatted changelogs. This pattern ensures documentation stays current without manual effort but requires consistent commit message discipline.
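As a rough illustration, the sketch below groups Conventional Commits style messages into changelog sections. The `ChangelogBuilder` module, the section titles, and the choice to include only `feat`, `fix`, and breaking changes are assumptions. Established tools cover far more of the convention.

```ruby
# Hypothetical changelog generator for Conventional Commits messages.
module ChangelogBuilder
  # type, optional (scope), optional "!" for breaking, then ": subject"
  HEADER = /\A(?<type>[a-z]+)(?:\((?<scope>[^)]+)\))?(?<breaking>!)?: (?<subject>.+)\z/
  SECTIONS = { 'feat' => 'Features', 'fix' => 'Bug Fixes' }.freeze
  ORDER = ['Breaking Changes', 'Features', 'Bug Fixes'].freeze

  def self.build(version, messages)
    grouped = Hash.new { |hash, key| hash[key] = [] }

    messages.each do |message|
      match = HEADER.match(message.lines.first.to_s.strip)
      next unless match # skip messages that do not follow the convention

      section = section_for(match, message)
      next unless section # skip types such as chore or docs

      scope = match[:scope] ? "#{match[:scope]}: " : ''
      grouped[section] << "- #{scope}#{match[:subject]}"
    end

    lines = [version]
    ORDER.each do |section|
      next unless grouped.key?(section)

      lines << '' << section
      lines.concat(grouped[section])
    end
    lines.join("\n")
  end

  def self.section_for(match, message)
    return 'Breaking Changes' if match[:breaking] || message.include?('BREAKING CHANGE:')

    SECTIONS[match[:type]]
  end
end

puts ChangelogBuilder.build('v2.4.0', [
  'feat(auth): add OAuth2 support',
  'fix: handle expired tokens',
  'chore: bump dependencies'
])
```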

Ruby Implementation

Ruby provides several tools and libraries that implement change management automation, from version bumping to deployment coordination.

Version Management typically uses semantic versioning (major.minor.patch) stored in version files or constants. The bump gem automates version incrementation based on change type. Rake tasks integrate version management into development workflows.

# Version management module
module AppVersion
  MAJOR = 2
  MINOR = 4
  PATCH = 7
  
  def self.to_s
    "#{MAJOR}.#{MINOR}.#{PATCH}"
  end
  
  def self.bump(type)
    major, minor, patch =
      case type
      when :major then [MAJOR + 1, 0, 0]
      when :minor then [MAJOR, MINOR + 1, 0]
      when :patch then [MAJOR, MINOR, PATCH + 1]
      else raise ArgumentError, "Unknown bump type: #{type}"
      end
    
    source = File.read('lib/app_version.rb')
    source = source.sub(/MAJOR = \d+/, "MAJOR = #{major}")
                   .sub(/MINOR = \d+/, "MINOR = #{minor}")
                   .sub(/PATCH = \d+/, "PATCH = #{patch}")
    File.write('lib/app_version.rb', source)
    
    # The constants loaded in this process still hold the old values,
    # so return the new version string for callers to use
    "#{major}.#{minor}.#{patch}"
  end
end

# Rake task for version bumping
namespace :version do
  desc 'Bump major version'
  task :major do
    new_version = AppVersion.bump(:major)
    sh "git commit -am 'Bump version to #{new_version}'"
    sh "git tag v#{new_version}"
  end
end

Git Integration through the ruby-git gem enables automated Git operations within Ruby scripts. Deployment scripts can check branch status, create tags, and push changes programmatically.

require 'git'

class DeploymentManager
  def initialize(repo_path)
    @git = Git.open(repo_path)
  end
  
  def prepare_release(version)
    ensure_clean_working_tree
    checkout_main
    pull_latest
    create_release_tag(version)
  end
  
  # Public so operators can roll back from outside the class
  def rollback_to_version(version)
    @git.checkout("v#{version}")
  end
  
  private
  
  def ensure_clean_working_tree
    status = @git.status
    if status.changed.any? || status.added.any? || status.deleted.any?
      raise "Uncommitted changes present"
    end
  end
  
  def checkout_main
    @git.checkout('main')
  end
  
  def pull_latest
    @git.pull('origin', 'main')
  end
  
  def create_release_tag(version)
    @git.add_tag("v#{version}")
    @git.push('origin', "v#{version}")
  end
end

Deployment Scripts coordinate the steps required to deploy changes. Capistrano is a long-established Ruby deployment tool that provides tasks for code checkout, dependency installation, asset compilation, database migration, and service restart.

# Capistrano deployment configuration
set :application, 'my_app'
set :repo_url, 'git@github.com:username/my_app.git'
set :deploy_to, '/var/www/my_app'
set :linked_files, %w{config/database.yml config/secrets.yml}
set :linked_dirs, %w{log tmp/pids tmp/cache tmp/sockets vendor/bundle}

namespace :deploy do
  desc 'Run database migrations'
  task :migrate do
    on roles(:db) do
      within release_path do
        execute :rake, 'db:migrate RAILS_ENV=production'
      end
    end
  end
  
  desc 'Create deployment record'
  task :record_deployment do
    on roles(:app) do
      version = fetch(:current_revision)
      user = ENV['USER']
      timestamp = Time.now.utc.iso8601
      
      execute :echo, 
        "#{version},#{user},#{timestamp} >> #{deploy_to}/DEPLOYMENTS"
    end
  end
  
  after 'deploy:updated', 'deploy:migrate'
  after 'deploy:finished', 'deploy:record_deployment'
end

Change Validation includes pre-deployment checks that verify system readiness. Scripts test database connectivity, check dependency versions, validate configuration files, and ensure adequate disk space before proceeding with deployments.

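class DeploymentValidator
  DeploymentError = Class.new(StandardError)
  
  def validate!
    # check_configuration_files and check_environment_variables follow the
    # same pattern as the checks defined below and are omitted here
    checks = [
      check_database_connection,
      check_dependency_versions,
      check_disk_space,
      check_configuration_files,
      check_environment_variables
    ]
    
    failures = checks.reject(&:passed?)
    
    if failures.any?
      raise DeploymentError, "Validation failed: #{failures.map(&:message).join('; ')}"
    end
  end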
  
  private
  
  def check_database_connection
    ActiveRecord::Base.connection.execute('SELECT 1')
    Check.new('database', true, 'Connection successful')
  rescue => e
    Check.new('database', false, e.message)
  end
  
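  def check_dependency_versions
    # `bundle outdated` exits non-zero when any gem is outdated; its output
    # also contains progress lines, so the exit status is the safer signal
    if system('bundle outdated --strict', out: File::NULL, err: File::NULL)
      Check.new('dependencies', true, 'All dependencies current')
    else
      Check.new('dependencies', false, 'Outdated gems found; run `bundle outdated --strict`')
    end
  end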
  
  def check_disk_space
    usage = `df -h /var/www | tail -1 | awk '{print $5}' | sed 's/%//'`.to_i
    
    if usage < 90
      Check.new('disk_space', true, "#{100 - usage}% available")
    else
      Check.new('disk_space', false, "Only #{100 - usage}% available")
    end
  end
end

Check = Struct.new(:name, :passed?, :message)

Rollback Automation provides quick recovery from failed deployments. Scripts store previous release information, maintain symbolic links to enable instant version switching, and preserve database states for migration rollback.

class RollbackManager
  def initialize(app_path)
    @app_path = app_path
    @releases_path = File.join(app_path, 'releases')
    @current_link = File.join(app_path, 'current')
  end
  
  def rollback
    previous_release = detect_previous_release
    
    unless previous_release
      raise "No previous release available"
    end
    
    puts "Rolling back to #{previous_release}"
    
    rollback_code(previous_release)
    rollback_database
    restart_services
    
    record_rollback(previous_release)
  end
  
  private
  
  def detect_previous_release
    releases = Dir.glob("#{@releases_path}/*").sort
    current = File.readlink(@current_link)
    current_index = releases.index(current)
    
    releases[current_index - 1] if current_index && current_index > 0
  end
  
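  def rollback_code(release)
    # Create the new link beside the old one, then rename over it so that
    # `current` never disappears mid-rollback
    temp_link = "#{@current_link}.tmp"
    File.symlink(release, temp_link)
    File.rename(temp_link, @current_link)
  end
  
  def rollback_database
    version = last_migration_before_deployment
    # db:migrate VERSION=n reverts every migration newer than n, whereas
    # db:migrate:down reverts only the single migration named
    system("rake db:migrate VERSION=#{version}")
  end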
  
  def restart_services
    system('systemctl restart puma')
    system('systemctl restart sidekiq')
  end
end

Tools & Ecosystem

Change management depends on an ecosystem of tools that handle different aspects of the change lifecycle.

Version Control Systems form the foundation. Git dominates with distributed architecture, branching flexibility, and strong community support. Alternatives like Mercurial offer simpler mental models but have smaller ecosystems. Centralized systems like Subversion still appear in enterprises with existing infrastructure.

Git Hosting Platforms add collaboration features to Git repositories. GitHub provides pull requests, code review, issues, and actions for CI/CD. GitLab integrates version control with CI/CD, container registries, and deployment management in a single platform. Bitbucket emphasizes Atlassian tool integration.

Continuous Integration tools automate testing and validation. GitHub Actions defines workflows in YAML within repositories. Jenkins provides plugin-based extensibility and on-premise deployment. CircleCI offers fast builds with strong Docker integration. Travis CI was historically popular with open source projects.

# GitHub Actions workflow for change validation
name: Validate Changes
on: [pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    
    steps:
    - uses: actions/checkout@v4
    
    - name: Set up Ruby
      uses: ruby/setup-ruby@v1
      with:
        ruby-version: '3.2' # quoted so YAML does not parse it as a float
        bundler-cache: true
    
    - name: Run tests
      run: bundle exec rspec
    
    - name: Check code coverage
      run: bundle exec rake coverage:check
    
    - name: Lint Ruby code
      run: bundle exec rubocop
    
    - name: Security audit
      run: bundle exec bundle-audit check --update

Deployment Tools coordinate production releases. Capistrano defines deployment tasks in a Ruby DSL for SSH-based deployments to traditional servers. Kubernetes handles containerized applications with declarative configurations and rolling updates. Ansible uses playbooks to configure systems and deploy applications. Terraform manages infrastructure as code, so infrastructure changes are versioned and reviewed the same way as application changes.

Ruby Gems for Change Management include:

The octokit gem provides a Ruby interface to the GitHub API for automating repository operations, managing pull requests, and retrieving commit information programmatically.

require 'octokit'

client = Octokit::Client.new(access_token: ENV['GITHUB_TOKEN'])

# Get recent commits
commits = client.commits('username/repository', 'main')
commits.first(5).each do |commit|
  puts "#{commit.sha[0..7]} #{commit.commit.message}"
end

# Create release
client.create_release(
  'username/repository',
  'v2.3.1',
  name: 'Version 2.3.1',
  body: 'Bug fixes and performance improvements'
)

The changelog_manager gem generates changelogs from Git history, parsing commit messages and organizing changes by type. The semantic gem handles version number manipulation and comparison. The git gem wraps Git command-line operations in Ruby objects.

Configuration Management tools ensure environment consistency. Ansible playbooks define infrastructure state declaratively. Chef uses a Ruby DSL for configuration recipes. Puppet provides a declarative configuration language. These tools maintain the environment parity that reliable change promotion depends on.

Monitoring and Observability tools detect problems from changes. Datadog collects metrics and traces across services. New Relic monitors application performance. Sentry captures errors with stack traces. Prometheus scrapes metrics endpoints for time-series data. These tools provide the signals that trigger rollbacks or halt rollouts.

Feature Flag Platforms manage gradual rollouts. LaunchDarkly provides a hosted feature flag service with targeting rules and gradual rollouts. Flipper offers open-source feature flags for Ruby applications with multiple storage backends. Split.io adds experimentation frameworks to feature flags.

# Flipper gem for feature flags
require 'flipper'
require 'flipper/adapters/active_record' # provided by the flipper-active_record gem

Flipper.configure do |config|
  config.default do
    adapter = Flipper::Adapters::ActiveRecord.new
    Flipper.new(adapter)
  end
end

# Enable feature for percentage of users
Flipper.enable_percentage_of_actors(:new_dashboard, 25)

# Check in application code
if Flipper.enabled?(:new_dashboard, current_user)
  render 'dashboard/new'
else
  render 'dashboard/legacy'
end

Practical Examples

Real-world scenarios demonstrate how change management principles and tools combine to handle common situations.

Coordinating Multi-Team Feature Release

A large feature requires changes across three teams' services. The frontend team updates the UI. The backend team adds new API endpoints. The data team modifies the analytics pipeline. All three sets of changes must become active at the same time to avoid partial functionality.

The teams use feature flags to deploy code to production before activation. Each team deploys behind flags during their normal deployment windows over several days. When all teams confirm deployment, a coordinator enables the feature flags simultaneously. If problems arise, disabling flags reverts to previous behavior without redeployment.

# Multi-service feature flag coordination
class FeatureLaunch
  def initialize(feature_name)
    @feature = feature_name
    @services = ['frontend', 'backend', 'analytics']
  end
  
  def ready_to_launch?
    @services.all? { |service| service_deployed?(service) }
  end
  
  def launch
    unless ready_to_launch?
      raise "Not all services have deployed #{@feature}"
    end
    
    @services.each do |service|
      enable_flag_for_service(service)
    end
    
    verify_launch
  end
  
  def rollback
    @services.each do |service|
      disable_flag_for_service(service)
    end
  end
  
  private
  
  def service_deployed?(service)
    response = HTTP.get("#{service_url(service)}/health")
    features = JSON.parse(response.body)['available_features']
    features.include?(@feature)
  end
  
  def enable_flag_for_service(service)
    HTTP.post(
      "#{service_url(service)}/admin/features/#{@feature}/enable",
      headers: { 'Authorization': admin_token }
    )
  end
end

Database Schema Migration with Zero Downtime

The application needs to rename a column from user_name to username across a large production database. Direct column rename would break running application instances during the brief deployment window.

The team implements a multi-phase migration. Phase 1 adds the new username column. Phase 2 deploys code that writes to both columns. Phase 3 backfills data from user_name to username. Phase 4 deploys code that reads from username. Phase 5 removes the user_name column. Each phase deploys independently with verification before proceeding.

# Phase 1: Add new column
class AddUsernameColumn < ActiveRecord::Migration[7.0]
  def change
    add_column :users, :username, :string
    add_index :users, :username
  end
end

# Phase 2: Update model to write both columns
class User < ApplicationRecord
  before_save :sync_username
  
  private
  
  def sync_username
    self.username = user_name if user_name_changed?
  end
end

# Phase 3: Backfill data
class BackfillUsername < ActiveRecord::Migration[7.0]
  def up
    User.where(username: nil).find_each do |user|
      user.update_column(:username, user.user_name)
    end
  end
end

# Phase 4: Update model to read from username
class User < ApplicationRecord
  def name
    username # Changed from user_name
  end
end

# Phase 5: Remove old column
class RemoveUserNameColumn < ActiveRecord::Migration[7.0]
  def change
    # Passing the column type keeps this migration reversible
    remove_column :users, :user_name, :string
  end
end

Emergency Hotfix Deployment

A critical bug in production affects user authentication and requires an immediate fix outside the normal release schedule. The organization uses GitFlow with a two-week release cycle and is currently mid-cycle.

The developer creates a hotfix branch from the production tag, implements the minimal fix, writes tests, and requests emergency review. After approval, the fix merges to both main and develop branches. The hotfix deploys to production immediately. The next regular release includes the fix through the develop branch merge.

# Automated hotfix workflow
class HotfixWorkflow
  def initialize(bug_description)
    @bug = bug_description
    @git = Git.open('.')
  end
  
  def create_hotfix
    version = next_patch_version
    branch = "hotfix/#{version}"
    
    production_tag = get_production_tag
    # Branch directly from the production tag rather than from the current HEAD
    @git.checkout(branch, new_branch: true, start_point: production_tag)
    
    {
      branch: branch,
      version: version,
      instructions: <<~INSTRUCTIONS
        1. Implement fix in #{branch}
        2. Run: rake hotfix:test
        3. Run: rake hotfix:deploy version=#{version}
      INSTRUCTIONS
    }
  end
  
  def deploy_hotfix(version)
    validate_hotfix_branch(version)
    
    tag = "v#{version}"
    @git.add_tag(tag)
    
    # Merge to main
    @git.checkout('main')
    @git.merge("hotfix/#{version}")
    @git.push('origin', 'main')
    
    # Merge to develop
    @git.checkout('develop')
    @git.merge("hotfix/#{version}")
    @git.push('origin', 'develop')
    
    # Deploy
    system("cap production deploy TAG=#{tag}")
    
    # Cleanup
    @git.branch("hotfix/#{version}").delete
  end
  
  private
  
  def next_patch_version
    # Tags carry a "v" prefix (e.g. "v2.3.1"); "v2".to_i would return 0
    current = get_production_tag.delete_prefix('v')
    major, minor, patch = current.split('.').map(&:to_i)
    "#{major}.#{minor}.#{patch + 1}"
  end
end

Canary Deployment with Automatic Rollback

A new recommendation algorithm deploys to production. The team wants to verify it improves user engagement before full rollout. They deploy the algorithm to 5% of users initially.

Monitoring tracks error rates, response times, and engagement metrics for canary users versus control users. After six hours with healthy metrics, the rollout expands to 10%, then 25%, then 50%, then 100% over two days. If metrics degrade at any point, automatic rollback disables the canary version.

class CanaryRollout
  # current_stage is passed in because each scheduled job builds a fresh
  # instance; without it, every run would restart at the first stage
  def initialize(feature_name, current_stage: 0)
    @feature = feature_name
    @stages = [5, 10, 25, 50, 100]
    @current_stage = current_stage
    @metrics = MetricsAnalyzer.new(feature_name)
  end
  
  def advance
    if @current_stage >= @stages.length
      complete_rollout
      return
    end
    
    percentage = @stages[@current_stage]
    
    # Before the first stage there is no canary traffic to evaluate
    if @current_stage.zero? || @metrics.healthy?(lookback_hours: 6)
      Flipper.enable_percentage_of_actors(@feature, percentage)
      @current_stage += 1
      schedule_next_advance
    else
      rollback("Unhealthy metrics detected")
    end
  end
  
  def rollback(reason)
    Flipper.disable(@feature)
    
    alert_team(
      feature: @feature,
      reason: reason,
      metrics: @metrics.report
    )
  end
  
  private
  
  def complete_rollout
    Flipper.enable(@feature)
    notify_success
  end
  
  def schedule_next_advance
    delay = case @current_stage
            when 1 then 6.hours
            when 2 then 12.hours
            when 3 then 24.hours
            else 48.hours
            end
    
    # The job (not shown) is assumed to call
    # CanaryRollout.new(feature, current_stage: stage).advance
    CanaryAdvanceJob.set(wait: delay).perform_later(@feature, @current_stage)
  end
end

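class MetricsAnalyzer
  def initialize(feature_name)
    @feature = feature_name
  end
  
  # The lookback window is passed to each check explicitly; as a keyword
  # argument it is not visible inside the private methods otherwise
  def healthy?(lookback_hours:)
    error_rate_acceptable?(lookback_hours) &&
      latency_acceptable?(lookback_hours) &&
      engagement_improved?(lookback_hours)
  end
  
  private
  
  # error_rate, latency_p95, engagement_improved?, and the public report
  # method query the metrics backend and are not shown here
  
  def error_rate_acceptable?(lookback_hours)
    canary_errors = error_rate(:canary, lookback_hours)
    control_errors = error_rate(:control, lookback_hours)
    
    canary_errors <= control_errors * 1.1 # Allow 10% higher
  end
  
  def latency_acceptable?(lookback_hours)
    canary_p95 = latency_p95(:canary, lookback_hours)
    control_p95 = latency_p95(:control, lookback_hours)
    
    canary_p95 <= control_p95 * 1.2 # Allow 20% higher
  end
end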

Error Handling & Edge Cases

Change management processes encounter various failure modes that require detection and recovery strategies.

Merge Conflicts occur when concurrent changes modify the same code sections. Automatic merges fail and require manual resolution. The developer must understand both changes, determine the correct integration, and verify the merged result works correctly.

Prevention strategies include frequent integration, small changes, and clear code ownership. When conflicts occur, tools like git mergetool provide visual interfaces for resolution. Teams should test after resolving conflicts since merged code may introduce bugs even if both original changes were correct.

# Automated merge conflict detection in CI
class MergeConflictChecker
  def check_pull_request(pr_number)
    pr = fetch_pull_request(pr_number)
    base_branch = pr.base_ref
    head_branch = pr.head_ref
    
    begin
      test_merge(base_branch, head_branch)
      { status: 'clean', conflicts: [] }
    rescue MergeConflict => e
      {
        status: 'conflicts',
        conflicts: e.conflicted_files,
        message: 'Resolve conflicts before merging'
      }
    end
  end
  
  private
  
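  def test_merge(base, head)
    # Argument arrays keep branch names away from the shell
    system('git', 'fetch', 'origin', base, head)
    
    # Requires Git 2.38+. The merge runs in memory: exit status 0 means clean,
    # 1 means conflicts. Output is the tree ID, then the conflicted file names.
    command = ['git', 'merge-tree', '--write-tree', '--name-only', '--no-messages',
               "origin/#{base}", "origin/#{head}"]
    output = IO.popen(command, &:read)
    status = $?.exitstatus
    
    return if status.zero?
    raise "git merge-tree failed with status #{status}" unless status == 1
    
    raise MergeConflict.new(output.lines.map(&:chomp).drop(1).reject(&:empty?))
  end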
end

class MergeConflict < StandardError
  attr_reader :conflicted_files
  
  def initialize(files)
    @conflicted_files = files
    super("Merge conflicts in: #{files.join(', ')}")
  end
end

Failed Migrations can leave databases in inconsistent states. A migration might fail partway through, applying some changes but not others. Recovery requires determining what completed, rolling back partial changes, and fixing the migration.

Wrapping migrations in transactions ensures atomicity on databases that support transactional DDL, such as PostgreSQL. On databases without this support, such as MySQL, where DDL statements commit implicitly, migrations should include validation and rollback logic. Always test migrations on production-like data before deployment.

class SafeMigration < ActiveRecord::Migration[7.0]
  def up
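    # Refuse to run outside a DDL transaction. A bare `return` here would
    # skip the changes while Rails still records the migration as applied.
    raise 'SafeMigration requires transactional DDL' unless transaction_open?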
    
    begin
      add_column :orders, :tax_amount, :decimal, precision: 10, scale: 2
      add_column :orders, :tax_rate, :decimal, precision: 5, scale: 4
      
      # Validate migration before committing
      validate_columns_added
    rescue => e
      # Log detailed error for debugging
      Rails.logger.error("Migration failed: #{e.message}")
      Rails.logger.error(e.backtrace.join("\n"))
      
      # Raise to trigger rollback
      raise
    end
  end
  
  def down
    remove_column :orders, :tax_rate
    remove_column :orders, :tax_amount
  end
  
  private
  
  def validate_columns_added
    columns = ActiveRecord::Base.connection.columns(:orders).map(&:name)
    
    unless columns.include?('tax_amount') && columns.include?('tax_rate')
      raise "Columns not properly added"
    end
  end
end

Deployment Failures can occur at various stages. Code checkout might fail due to network issues. Dependency installation might fail due to missing packages. Service restart might fail due to configuration errors. Each failure point requires specific recovery.

Deployment scripts should validate preconditions before starting, maintain detailed logs, and preserve the previous working state. If deployment fails, automatic rollback restores the previous version. If rollback fails, the script should provide clear instructions for manual recovery.
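
The flow described above can be sketched in plain Ruby. This is a minimal illustration, not a production deployer: the class name `GuardedDeployment`, the result symbols, and the use of lambdas for checks, steps, and rollback are all assumptions made so the example runs without real servers.

```ruby
# Hypothetical sketch: preconditions, steps, and rollback are injected as
# lambdas so the control flow can be exercised without real infrastructure.
class GuardedDeployment
  attr_reader :log

  # preconditions: name => lambda returning true or false
  # steps:         name => lambda that raises on failure
  # rollback:      lambda that restores the previous release
  def initialize(preconditions:, steps:, rollback:)
    @preconditions = preconditions
    @steps = steps
    @rollback = rollback
    @log = []
  end

  def run
    # Validate before touching anything, so a failed check needs no rollback
    failed = @preconditions.reject { |_name, check| check.call }.keys
    unless failed.empty?
      @log << "aborted before start: #{failed.join(', ')}"
      return :aborted
    end

    begin
      @steps.each do |name, step|
        step.call
        @log << "ok: #{name}"
      end
      :deployed
    rescue StandardError => e
      @log << "failed: #{e.message}"
      attempt_rollback
    end
  end

  private

  def attempt_rollback
    @rollback.call
    @log << 'rolled back to previous release'
    :rolled_back
  rescue StandardError => e
    # Rollback itself failed: leave a clear trail for manual recovery
    @log << "rollback failed: #{e.message}; manual recovery required"
    :manual_recovery_required
  end
end

deployment = GuardedDeployment.new(
  preconditions: { 'disk space' => -> { true } },
  steps: {
    'checkout' => -> { :ok },
    'restart' => -> { raise 'service failed to start' }
  },
  rollback: -> { :restored }
)
puts deployment.run
puts deployment.log
```

Keeping the precondition checks outside the `begin` block is deliberate: nothing has changed yet at that point, so a failed check aborts without invoking rollback.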

Configuration Drift happens when production configurations diverge from version control. Manual changes in production for troubleshooting or emergency fixes may not get documented. Over time, the running configuration differs from the repository, causing confusion during deployments.

Configuration management tools enforce desired state by continuously monitoring and correcting drift. Infrastructure as code practices version all configuration changes. Change management processes should require that emergency manual changes be committed to version control promptly.

class ConfigurationValidator
  def validate_production
    drifts = []
    
    configs = {
      'database.yml' => load_production_config('database.yml'),
      'redis.yml' => load_production_config('redis.yml'),
      'secrets.yml' => load_production_config('secrets.yml')
    }
    
    configs.each do |filename, production_config|
      repository_config = load_repository_config(filename)
      
      diff = compare_configs(production_config, repository_config)
      
      if diff.any?
        drifts << {
          file: filename,
          differences: diff
        }
      end
    end
    
    if drifts.any?
      report_drift(drifts)
    end
    
    drifts
  end
  
  private
  
  def compare_configs(production, repository)
    differences = []
    
    all_keys = (production.keys + repository.keys).uniq
    
    all_keys.each do |key|
      prod_value = production[key]
      repo_value = repository[key]
      
      if prod_value != repo_value
        differences << {
          key: key,
          production: prod_value,
          repository: repo_value
        }
      end
    end
    
    differences
  end
end

Race Conditions in Deployments occur when multiple deployments run simultaneously. Both might succeed individually but create inconsistent state when interleaved. File overwrites, database migration conflicts, and service restarts can interfere.

Deployment locking prevents concurrent deployments. A deployment acquires a lock before starting and releases it after completion. Subsequent deployments wait for the lock rather than proceeding simultaneously. The lock includes timeouts to prevent hung deployments from blocking indefinitely.
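
One way to sketch such a lock on a single deploy host is an advisory file lock. The class name `DeploymentLock`, its `LockTimeout` error, and the polling interval are assumptions for illustration. The example relies on `File#flock`, which the operating system releases when the holding process exits, so a crashed deployment does not leave a stale lock; the timeout here bounds how long a waiting deployment keeps trying. Deployments launched from several machines would need a shared lock store instead.

```ruby
require 'tmpdir'

# Hypothetical sketch of a single-host deployment lock built on File#flock.
class DeploymentLock
  class LockTimeout < StandardError; end

  def initialize(path, timeout: 30, poll_interval: 0.1)
    @path = path
    @timeout = timeout
    @poll_interval = poll_interval
  end

  # Runs the block while holding the lock; raises LockTimeout if another
  # deployment still holds it after the timeout.
  def synchronize
    File.open(@path, File::RDWR | File::CREAT, 0o644) do |file|
      deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + @timeout

      # LOCK_NB makes flock return false instead of blocking when the lock is held
      until file.flock(File::LOCK_EX | File::LOCK_NB)
        if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
          raise LockTimeout, "another deployment holds #{@path}"
        end
        sleep @poll_interval
      end

      begin
        yield
      ensure
        file.flock(File::LOCK_UN)
      end
    end
  end
end

lock_path = File.join(Dir.mktmpdir, 'deploy.lock')
DeploymentLock.new(lock_path, timeout: 5).synchronize do
  puts 'deploying while holding the lock'
end
```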

Rollback Complications arise when forward changes are not easily reversible. Database migrations that delete data cannot be rolled back without data loss. API changes that external clients depend on cannot be reverted without breaking integrations. Features behind flags that users have already adopted cannot simply be disabled without user impact.

Planning reversibility during initial design avoids these complications. Database migrations preserve data during transitions. API changes maintain backward compatibility. Feature changes degrade gracefully when disabled. For irreversible changes, teams must accept forward-only deployment with careful validation before release.

Reference

Change Management Workflow Comparison

Workflow	Branch Structure	Release Frequency	Merge Complexity	Best For
Trunk-Based	Single main branch	Multiple daily	Low	Fast-moving teams, mature CI/CD
GitFlow	Multiple long-lived branches	Weekly to monthly	High	Scheduled releases, multiple versions
GitHub Flow	Main plus feature branches	Multiple daily	Medium	Continuous deployment, web applications
Release Trains	Feature branches	Fixed schedule	Medium	Coordinated releases, enterprise

Deployment Strategy Comparison

Strategy	Downtime	Rollback Speed	Infrastructure Cost	Complexity
Blue-Green	None	Instant	High (2x)	Medium
Canary	None	Fast	Low	High
Rolling	None	Slow	Low	Low
Recreate	Brief	Fast	Low	Low

Common Git Commands for Change Management

Operation	Command	Purpose
Create feature branch	git checkout -b feature/name	Start isolated work
Update from main	git pull origin main	Sync with team changes
Interactive rebase	git rebase -i main	Clean commit history
Cherry-pick commit	git cherry-pick commit-hash	Apply specific change
Create tag	git tag -a v1.0.0 -m message	Mark release point
View commit history	git log --oneline --graph	Visualize branch structure
Stash changes	git stash push -m description	Temporarily save work
Amend last commit	git commit --amend	Fix recent commit

Migration Safety Checklist

Check	Verification	Risk if Skipped
Backup exists	Database dump available	Data loss
Transaction support	DDL in transaction	Partial application
Downtime acceptable	Scheduled maintenance window	User impact
Rollback tested	Down migration works	Stuck state
Production data tested	Test with prod-like data	Migration failure
Lock timeout set	lock_timeout and statement_timeout configured	Hanging migration
Reversible operations	Data preserved	Cannot roll back

Feature Flag Configuration

Configuration	Values	Purpose
Rollout strategy	all, percentage, whitelist, gradual	Control exposure
Percentage	0-100	Canary rollout size
User targeting	user_id, attributes	Specific user access
Environment	development, staging, production	Environment-specific flags
Expiration	timestamp	Remove stale flags
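
A percentage setting is usually applied by hashing a stable identifier into a bucket, so the same user gets the same answer on every request and stays included as the percentage grows. The sketch below illustrates that idea under stated assumptions: the helper name `in_rollout?` and the choice of CRC32 over `feature:user_id` are illustrative, not the behavior of any particular flag library.

```ruby
require 'zlib'

# Hypothetical helper: maps each (feature, user) pair to a stable bucket
# from 0 to 99 and includes the user when the bucket is below the percentage.
def in_rollout?(feature, user_id, percentage)
  bucket = Zlib.crc32("#{feature}:#{user_id}") % 100
  bucket < percentage
end

puts in_rollout?('new_recommendations', 42, 100) # true: every bucket is below 100
puts in_rollout?('new_recommendations', 42, 0)   # false: no bucket is below 0
```

Including the feature name in the hashed string means different flags select different subsets of users, so the same users are not always the first to see every change.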

Deployment Validation Checklist

Validation	Command	Expected Result
Database connection	rake db:migrate:status	All migrations up
Dependencies current	bundle check	All gems available
Tests passing	bundle exec rspec	Zero failures
Assets compiled	rake assets:precompile	No errors
Configuration valid	rake config:validate	All keys present
Services responding	curl health endpoint	200 status
Disk space adequate	df -h	Less than 90% usage

Semantic Versioning Rules

Version Component	Increment When	Example
Major (X.0.0)	Breaking changes	1.5.3 → 2.0.0
Minor (0.X.0)	New features, backward compatible	1.5.3 → 1.6.0
Patch (0.0.X)	Bug fixes, backward compatible	1.5.3 → 1.5.4
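
The increment rules in the table can be expressed as a small function. This is a sketch with an assumed name, `bump_version`; it handles plain `MAJOR.MINOR.PATCH` strings with an optional `v` prefix and does not attempt pre-release or build metadata.

```ruby
# Hypothetical helper: applies the increment rules to a MAJOR.MINOR.PATCH string.
def bump_version(version, change)
  major, minor, patch = version.delete_prefix('v').split('.').map { |part| Integer(part, 10) }

  case change
  when :major then "#{major + 1}.0.0"                # breaking change resets minor and patch
  when :minor then "#{major}.#{minor + 1}.0"         # new feature resets patch
  when :patch then "#{major}.#{minor}.#{patch + 1}"  # bug fix
  else raise ArgumentError, "unknown change type: #{change.inspect}"
  end
end

puts bump_version('1.5.3', :major) # prints 2.0.0
puts bump_version('1.5.3', :minor) # prints 1.6.0
puts bump_version('1.5.3', :patch) # prints 1.5.4
```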

Rollback Decision Matrix

Metric	Threshold	Action
Error rate increase	Greater than 2x baseline	Immediate rollback
Latency degradation	P95 greater than 1.5x baseline	Monitor, prepare rollback
Traffic drop	Less than 50% expected	Immediate rollback
User reports	Greater than 10 per minute	Investigate, rollback if confirmed
Memory leak	Memory growth greater than 10MB/minute	Schedule rollback
Database errors	Any connection errors	Immediate rollback
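
The matrix can be encoded as data and evaluated against a metrics snapshot. The sketch below makes several assumptions: the metric key names, the idea that ratio metrics arrive already divided by their baseline, and the severity ordering used to pick one action when several rules fire are all illustrative choices.

```ruby
# Each rule: metric key, breach test, action. Thresholds mirror the matrix;
# the key names are assumptions made for this sketch.
ROLLBACK_RULES = [
  [:error_rate_ratio,            ->(v) { v > 2.0 }, :immediate_rollback],
  [:p95_latency_ratio,           ->(v) { v > 1.5 }, :prepare_rollback],
  [:traffic_ratio,               ->(v) { v < 0.5 }, :immediate_rollback],
  [:user_reports_per_minute,     ->(v) { v > 10 },  :investigate],
  [:memory_growth_mb_per_minute, ->(v) { v > 10 },  :schedule_rollback],
  [:db_connection_errors,        ->(v) { v > 0 },   :immediate_rollback]
].freeze

# Assumed ordering from least to most urgent.
ACTION_SEVERITY = %i[prepare_rollback investigate schedule_rollback immediate_rollback].freeze

# Returns the most urgent triggered action plus the metrics that breached.
def rollback_decision(metrics)
  triggered = ROLLBACK_RULES.select do |name, breached, _action|
    metrics.key?(name) && breached.call(metrics[name])
  end
  strongest = triggered.map(&:last).max_by { |action| ACTION_SEVERITY.index(action) }
  { action: strongest || :none, triggered: triggered.map(&:first) }
end

p rollback_decision({ error_rate_ratio: 1.1, p95_latency_ratio: 1.7 })
```

With this snapshot only the latency rule fires, so the result carries the `:prepare_rollback` action.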

Git Branch Naming Conventions

Branch Type	Pattern	Example
Feature	feature/description	feature/user-authentication
Bug fix	fix/description	fix/login-timeout
Hotfix	hotfix/version	hotfix/2.3.1
Release	release/version	release/2.4.0
Experiment	experiment/description	experiment/new-algorithm
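
A convention like this is easiest to keep when a CI step or Git hook checks it. The sketch below is one possible check: the constant `BRANCH_PATTERNS`, the helper `branch_type`, and the restriction of descriptions to lowercase words joined by hyphens are assumptions layered on top of the table.

```ruby
# Hypothetical patterns derived from the table. Descriptions are assumed to be
# lowercase words joined by hyphens; versions are assumed to be MAJOR.MINOR.PATCH.
BRANCH_PATTERNS = {
  feature:    %r{\Afeature/[a-z0-9]+(?:-[a-z0-9]+)*\z},
  fix:        %r{\Afix/[a-z0-9]+(?:-[a-z0-9]+)*\z},
  hotfix:     %r{\Ahotfix/\d+\.\d+\.\d+\z},
  release:    %r{\Arelease/\d+\.\d+\.\d+\z},
  experiment: %r{\Aexperiment/[a-z0-9]+(?:-[a-z0-9]+)*\z}
}.freeze

# Returns the branch type as a symbol, or nil when the name matches no pattern.
def branch_type(name)
  BRANCH_PATTERNS.each { |type, pattern| return type if pattern.match?(name) }
  nil
end

p branch_type('feature/user-authentication')
p branch_type('hotfix/login') # nil: hotfix branches are named after a version
```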

Change Management Metrics

Metric	Calculation	Target
Deployment frequency	Deployments per day	Daily or higher
Lead time	Commit to production time	Less than 1 day
Change failure rate	Failed deployments / total deployments	Less than 15%
Mean time to recovery	Time to restore service	Less than 1 hour
Rollback rate	Rollbacks / deployments	Less than 5%
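
These calculations can be sketched from a list of deployment records. The `Deployment` struct, its field names, and the reporting units (hours for lead time, minutes for recovery) are assumptions for illustration; real data would come from the deployment tool or CI system.

```ruby
# Hypothetical record of one deployment; failed and rolled_back default to nil.
Deployment = Struct.new(:committed_at, :deployed_at, :failed, :rolled_back, :restored_at,
                        keyword_init: true)

def average(values)
  values.empty? ? 0.0 : values.sum / values.size.to_f
end

# Computes the table's metrics for deployments made over period_days days.
def change_metrics(deployments, period_days:)
  total = deployments.size
  failed = deployments.select(&:failed)
  recovered = failed.select(&:restored_at)

  {
    deployments_per_day: total / period_days.to_f,
    lead_time_hours: average(deployments.map { |d| (d.deployed_at - d.committed_at) / 3600.0 }),
    change_failure_rate: total.zero? ? 0.0 : failed.size / total.to_f,
    mttr_minutes: average(recovered.map { |d| (d.restored_at - d.deployed_at) / 60.0 }),
    rollback_rate: total.zero? ? 0.0 : deployments.count(&:rolled_back) / total.to_f
  }
end

history = [
  Deployment.new(committed_at: Time.utc(2024, 1, 1, 9), deployed_at: Time.utc(2024, 1, 1, 15)),
  Deployment.new(committed_at: Time.utc(2024, 1, 2, 9), deployed_at: Time.utc(2024, 1, 2, 11),
                 failed: true, rolled_back: true, restored_at: Time.utc(2024, 1, 2, 11, 30))
]
p change_metrics(history, period_days: 2)
```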
