Overview
Change management encompasses the processes, tools, and methodologies that teams use to control and track modifications to software systems. The discipline addresses how code changes move from development through testing to production, how teams coordinate concurrent modifications, and how systems maintain stability while incorporating new features and fixes.
The foundation of software change management emerged from traditional engineering change control processes but evolved significantly with distributed version control systems. Modern change management integrates version control, continuous integration, deployment automation, and release coordination into cohesive workflows that balance velocity with reliability.
Change management operates at multiple levels within software organizations. At the code level, it tracks individual commits and branches. At the feature level, it coordinates work across multiple developers. At the release level, it manages deployment timing, rollback procedures, and production stability. Each level requires different tools and processes but all connect through shared principles of traceability, reversibility, and controlled progression.
The scope of change management extends beyond version control to include database migrations, infrastructure changes, configuration updates, and documentation modifications. Each type of change presents unique challenges. Code changes may introduce bugs. Database migrations may cause downtime. Infrastructure changes may affect performance. Configuration updates may create security vulnerabilities. A comprehensive change management system addresses all these change types through appropriate controls and validation processes.
# Simple change tracking in a deployment script
class ChangeTracker
  def initialize(version, previous_version: nil)
    @version = version
    @previous_version = previous_version # version to restore on rollback
    @timestamp = Time.now
    @changes = []
  end

  def record_change(component, description)
    @changes << {
      component: component,
      description: description,
      timestamp: Time.now
    }
  end

  def deployment_manifest
    {
      version: @version,
      deployed_at: @timestamp,
      changes: @changes,
      rollback_version: @previous_version
    }
  end
end

tracker = ChangeTracker.new("2.3.1", previous_version: "2.3.0")
tracker.record_change("authentication", "Add OAuth2 support")
tracker.record_change("database", "Add users.oauth_token column")
tracker.deployment_manifest
# => deployment record listing both changes and the rollback version
Key Principles
Change management rests on several fundamental principles that guide how organizations handle software modifications. These principles apply regardless of specific tools or processes.
Traceability requires that every change connects to a documented reason and responsible party. Each commit links to an issue or feature request. Each deployment references specific commits. Each rollback identifies the problematic change. This principle enables teams to answer questions like "Why did we make this change?" and "Who approved this modification?" without archaeological code analysis.
Atomicity demands that changes group into logical, indivisible units. A feature implementation includes all necessary code, tests, documentation, and database migrations. Deploying half a feature or leaving migrations unrun creates inconsistent states. Atomic changes either complete entirely or fail entirely, preventing partially-applied modifications that corrupt system state.
Reversibility ensures that teams can undo changes when problems arise. Every deployment includes a rollback plan. Every database migration includes a down migration. Every configuration change preserves the previous configuration. This principle acknowledges that despite testing, some problems only manifest in production. Quick reversal limits damage and provides time for proper fixes.
Progressive Exposure controls how changes reach users. New features may deploy first to development environments, then staging, then a small percentage of production users, and finally to all users. This gradual exposure detects problems with limited impact. Each stage provides opportunities to identify issues before full deployment.
Isolation separates concurrent changes to prevent interference. Developers work in feature branches rather than directly on main branches. Each change includes its own tests that run independently. Deployment processes handle one change at a time rather than batching unrelated modifications. Isolation enables parallel work without coordination overhead.
Auditability maintains comprehensive records of all changes. Logs capture who made each change, when it occurred, what specifically changed, and why. These records support compliance requirements, security investigations, and debugging. Audit trails must be immutable and complete to serve their purpose.
The relationship between these principles creates tensions that teams must balance. Atomicity suggests larger changes, while progressive exposure suggests smaller ones. Isolation enables parallel work but increases merge complexity. Reversibility requires additional work that seems wasteful when changes succeed. Change management processes navigate these tensions based on organizational risk tolerance and operational constraints.
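# Demonstrating atomic change with transaction
# Note: the transaction rolls back database writes only. If Config.set or
# Monitoring.track_deployment write outside the database, they need their own undo step.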
class FeatureDeployment
def deploy(feature_name)
ActiveRecord::Base.transaction do
enable_feature(feature_name)
run_data_migration(feature_name)
update_configuration(feature_name)
notify_monitoring(feature_name)
end
rescue => e
log_failure(feature_name, e)
raise # Ensures all-or-nothing deployment
end
private
def enable_feature(name)
Feature.create!(name: name, enabled: true)
end
def run_data_migration(name)
DataMigration.execute(name)
end
def update_configuration(name)
Config.set("feature.#{name}.enabled", true)
end
def notify_monitoring(name)
Monitoring.track_deployment(name)
end
end
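The traceability principle can also be checked mechanically. The following is a minimal sketch, assuming a hypothetical convention in which every commit subject references an issue key such as `PROJ-123` or `#123`. The pattern and the `CommitTraceability` name are illustrative and not taken from any particular tool.

```ruby
# Hypothetical convention: a commit subject must mention an issue key
# such as "PROJ-123" or "#123". Adjust the pattern to the tracker in use.
module CommitTraceability
  ISSUE_PATTERN = /\b[A-Z][A-Z0-9]+-\d+\b|#\d+/

  # Returns the issue keys referenced in a commit message.
  def self.issue_refs(message)
    message.scan(ISSUE_PATTERN)
  end

  # Splits commit messages into traceable and untraceable groups.
  def self.audit(messages)
    traceable, untraceable = messages.partition { |m| issue_refs(m).any? }
    { traceable: traceable, untraceable: untraceable }
  end
end

report = CommitTraceability.audit([
  "PROJ-42 Add OAuth2 support",
  "Fix typo",
  "Add users.oauth_token column (#57)"
])
puts report[:untraceable].inspect
```

A check like this could run in CI or in a commit hook, so that untraceable commits are flagged before they reach a protected branch.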
Implementation Approaches
Organizations implement change management through various approaches that differ in structure, tooling, and coordination mechanisms. The choice depends on team size, release frequency, risk tolerance, and regulatory requirements.
Trunk-Based Development maintains a single main branch where all developers commit frequently. Feature flags control which functionality appears in production. This approach minimizes merge complexity because developers integrate changes continuously rather than in large batches. Teams using trunk-based development often release daily or several times per day.
The core workflow involves developers pulling the latest main branch, making small changes, running automated tests, and pushing directly to main. Feature flags wrap incomplete features, allowing code deployment without feature activation. When features complete, teams enable flags rather than merging large branches.
# Feature flag implementation for trunk-based development
class FeatureFlag
def self.enabled?(feature_name, user: nil)
flag = Flag.find_by(name: feature_name)
return false unless flag
case flag.rollout_strategy
when 'all'
flag.enabled
when 'percentage'
user && user_in_rollout_percentage?(user, flag.percentage)
when 'whitelist'
user && flag.whitelisted_users.include?(user.id)
else
false
end
end
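def self.user_in_rollout_percentage?(user, percentage)
(user.id % 100) < percentage
end
# `private` does not apply to `def self.` methods, so hide the helper explicitly
private_class_method :user_in_rollout_percentage?
end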
# Usage in application code
if FeatureFlag.enabled?('new_checkout_flow', user: current_user)
render 'checkout/new_flow'
else
render 'checkout/legacy_flow'
end
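Bucketing by `user.id % 100` places the same low-numbered users in every early rollout. One common alternative, sketched here as an option and not as the approach the code above uses, hashes the feature name together with the user ID so that each flag gets its own stable bucket assignment. `RolloutBucket` is a hypothetical helper name.

```ruby
require 'zlib'

# Hypothetical helper: deterministic, per-feature rollout buckets.
module RolloutBucket
  # Maps (feature, user_id) to a stable bucket in 0..99.
  def self.bucket(feature_name, user_id)
    Zlib.crc32("#{feature_name}:#{user_id}") % 100
  end

  # True when the user's bucket falls below the rollout percentage.
  def self.in_rollout?(feature_name, user_id, percentage)
    bucket(feature_name, user_id) < percentage
  end
end

# The same user always lands in the same bucket for a given feature.
puts RolloutBucket.bucket('new_checkout_flow', 42) ==
     RolloutBucket.bucket('new_checkout_flow', 42)   # true

# 0% excludes everyone and 100% includes everyone.
puts RolloutBucket.in_rollout?('new_checkout_flow', 42, 0)    # false
puts RolloutBucket.in_rollout?('new_checkout_flow', 42, 100)  # true
```

Because a user's bucket never changes for a given feature, raising the percentage only adds users to the rollout and never removes anyone who already has the feature.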
GitFlow structures work around multiple long-lived branches with specific purposes. The main branch holds production code. The develop branch integrates features. Feature branches isolate individual work items. Release branches prepare for production deployment. Hotfix branches address production issues.
This approach provides clear separation between production, integration, and development states. Teams can prepare releases while continuing development work. However, GitFlow creates merge overhead and delays integration, which can lead to conflicts and integration surprises.
GitHub Flow simplifies GitFlow by maintaining only a main branch and short-lived feature branches. Developers create branches for features, open pull requests for review, merge to main after approval, and deploy immediately. This approach balances simplicity with code review while maintaining a deployable main branch.
Release Trains schedule deployments at fixed intervals regardless of feature readiness. Features that complete before the departure time board the train. Incomplete features wait for the next train. This approach creates predictable release schedules that coordinate across teams and stakeholders. However, it can delay feature delivery and create pressure to rush changes before train departure.
# Release train coordination script
class ReleaseTrain
def initialize(departure_time)
@departure_time = departure_time
@features = []
end
def board_feature(feature)
if feature.ready? && Time.now < @departure_time
@features << feature
tag_for_release(feature)
else
schedule_next_train(feature)
end
end
def depart
return unless Time.now >= @departure_time
@features.each do |feature|
deploy_feature(feature)
end
create_release_notes
notify_stakeholders
end
private
def tag_for_release(feature)
feature.update(release_tag: next_release_version)
end
def schedule_next_train(feature)
next_train = @departure_time + 2.weeks
feature.update(scheduled_release: next_train)
end
end
Environment Promotion moves changes through a series of environments that increasingly resemble production. A change might progress through development, integration, staging, pre-production, and production environments. Each environment provides validation opportunities with different characteristics. Early environments enable fast feedback. Later environments provide production-like validation.
The implementation requires maintaining environment parity so that validation in staging accurately predicts production behavior. Configuration management systems ensure consistency while allowing environment-specific settings like database connections or API endpoints.
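The promotion sequence can be sketched as a small pipeline. This is an illustration under stated assumptions: `PromotionPipeline`, the environment list, and the validator callables are hypothetical names, and a real pipeline would deploy and run checks where this one only calls a lambda.

```ruby
# Hypothetical promotion pipeline: a build advances one environment at a
# time and only after the current environment's validation passes.
class PromotionPipeline
  ENVIRONMENTS = %i[development integration staging pre_production production].freeze

  attr_reader :history

  # validators maps an environment name to a callable that returns
  # true when the build is healthy in that environment.
  def initialize(validators)
    @validators = validators
    @history = []
  end

  # Returns the last environment in which the build passed validation.
  def promote(build_id)
    ENVIRONMENTS.each do |env|
      check = @validators.fetch(env) { ->(_build) { true } }
      break unless check.call(build_id)
      @history << env
    end
    @history.last
  end
end

pipeline = PromotionPipeline.new(
  staging: ->(_build) { false } # simulate a failed staging validation
)
last_passed = pipeline.promote('build-101')
puts last_passed.inspect
```

In this run the build passes development and integration, fails in staging, and never reaches the later environments.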
Common Patterns
Several patterns recur across change management implementations, addressing common challenges in coordinating modifications and maintaining system stability.
Pull Request Workflow gates changes behind review and approval before integration. Developers push branches to shared repositories, open pull requests describing changes, address review feedback, and merge after approval. This pattern enforces code review, prevents direct commits to protected branches, and creates discussion threads documenting decisions.
The pattern integrates with automated checks that must pass before merge approval. Continuous integration runs tests, linters verify code style, security scanners detect vulnerabilities, and coverage tools ensure adequate testing. Pull requests block merging until all checks succeed and required reviewers approve.
# Automated pull request validation
class PullRequestValidator
def initialize(pr_number)
@pr = fetch_pull_request(pr_number)
@checks = []
end
def validate
run_test_suite
check_code_coverage
scan_dependencies
verify_migrations
lint_code_style
@checks.all?(&:passed?)
end
private
def run_test_suite
result = system('bundle exec rspec')
@checks << Check.new('tests', result)
end
def check_code_coverage
coverage = SimpleCov.result.covered_percent
passed = coverage >= 80.0
@checks << Check.new('coverage', passed)
end
def scan_dependencies
result = system('bundle audit check --update')
@checks << Check.new('security', result)
end
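def verify_migrations
# check_pending! returns nil when the schema is current and raises otherwise
ActiveRecord::Migration.check_pending!
@checks << Check.new('migrations', true)
rescue ActiveRecord::PendingMigrationError
@checks << Check.new('migrations', false)
end
def lint_code_style
result = system('bundle exec rubocop')
@checks << Check.new('style', result)
end
end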
Blue-Green Deployment maintains two production environments that alternate between active and idle. The blue environment serves live traffic while the green environment receives the new deployment. After validating the green environment, traffic switches from blue to green. If problems arise, traffic switches back to blue.
This pattern enables zero-downtime deployments and instant rollback. The idle environment provides a production-equivalent testing ground. However, maintaining two full production environments doubles infrastructure costs, and some systems (like databases) cannot fully duplicate without complex replication.
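The switch itself is a small piece of state. The sketch below models it in memory, assuming a hypothetical `BlueGreenRouter` class and a health check passed in as a callable. In practice the switch would update a load balancer or DNS record; it would not set an instance variable.

```ruby
# Hypothetical blue-green switch: one slot is live, the other is idle.
class BlueGreenRouter
  attr_reader :live, :versions

  def initialize(live: :blue, versions: {})
    @live = live
    @versions = { blue: nil, green: nil }.merge(versions)
  end

  def idle
    @live == :blue ? :green : :blue
  end

  # Deploys to the idle slot; live traffic is unaffected.
  def deploy(version)
    @versions[idle] = version
  end

  # Switches traffic only when the health check on the idle slot passes.
  # The health check is any callable that accepts the slot name.
  def switch!(health_check)
    raise "#{idle} failed validation" unless health_check.call(idle)
    @live = idle
  end

  # Rollback is the same switch in the opposite direction.
  def rollback!
    @live = idle
  end
end

router = BlueGreenRouter.new(live: :blue, versions: { blue: '2.3.0' })
router.deploy('2.3.1')
router.switch!(->(_slot) { true })
puts router.live                    # green
puts router.versions[router.live]   # 2.3.1
router.rollback!
puts router.live                    # blue
```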
Canary Deployment releases changes to a small subset of users before full rollout. If metrics show problems, the deployment halts and rolls back. If metrics remain healthy, the rollout expands gradually. This pattern detects issues with limited impact and provides early warning of problems that testing missed.
Implementation requires routing logic that directs specific users to canary versions, monitoring that detects anomalies, and automated or manual decisions about rollout progression. The pattern works well for user-facing changes but applies poorly to backend services without user-specific routing.
# Canary deployment controller
class CanaryDeployment
def initialize(version, initial_percentage: 5)
@version = version
@percentage = initial_percentage
@metrics = MetricsCollector.new(version)
end
def route_request(user)
if canary_user?(user)
@metrics.record_canary_request
"canary_#{@version}"
else
@metrics.record_stable_request
"stable"
end
end
def expand_rollout
return if @percentage >= 100
if @metrics.healthy?
@percentage = [@percentage * 2, 100].min
record_expansion
else
rollback
end
end
private
def canary_user?(user)
(user.id % 100) < @percentage
end
def rollback
@percentage = 0
alert_team("Canary rollback: #{@version}")
end
end
Database Migration Patterns address the challenge of changing database schemas in production systems. Forward-only migrations avoid down migrations that risk data loss. Backward-compatible migrations ensure code works with both old and new schemas during deployment transitions. Multi-step migrations separate schema changes from code changes to prevent breaking running code.
A typical multi-step migration adds a new column while maintaining the old column, deploys code that writes to both columns, backfills data to the new column, deploys code that reads from the new column, and finally removes the old column. Each step deploys independently and maintains system functionality.
Changelog Automation generates release notes from commit messages or pull request descriptions. Tools parse structured commit messages following conventions like Conventional Commits, extract feature descriptions, bug fixes, and breaking changes, and produce formatted changelogs. This pattern ensures documentation stays current without manual effort but requires consistent commit message discipline.
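As an illustration of the parsing step, the sketch below groups commit subjects written in the Conventional Commits style (`type(scope)!: description`) into changelog sections. `ChangelogBuilder` and its section titles are hypothetical. The sketch reads only the subject line, so a breaking change declared in a `BREAKING CHANGE:` footer would be missed.

```ruby
# Groups commit subjects that follow the Conventional Commits format
# ("type(scope)!: description") into changelog sections.
module ChangelogBuilder
  SUBJECT = /\A(?<type>\w+)(?:\((?<scope>[^)]+)\))?(?<breaking>!)?: (?<desc>.+)\z/
  SECTIONS = { 'feat' => 'Features', 'fix' => 'Bug Fixes' }.freeze

  def self.build(version, subjects)
    groups = Hash.new { |hash, key| hash[key] = [] }

    subjects.each do |subject|
      match = SUBJECT.match(subject)
      next unless match # skip subjects that do not follow the convention

      entry = match[:scope] ? "#{match[:scope]}: #{match[:desc]}" : match[:desc]
      groups['Breaking Changes'] << entry if match[:breaking]
      section = SECTIONS[match[:type]]
      groups[section] << entry if section
    end

    lines = ["Version #{version}"]
    ['Breaking Changes', *SECTIONS.values].each do |title|
      next unless groups.key?(title)
      lines << "" << "#{title}:"
      groups[title].each { |entry| lines << "- #{entry}" }
    end
    lines.join("\n")
  end
end

changelog = ChangelogBuilder.build('2.4.0', [
  'feat(auth): add OAuth2 support',
  'fix: handle empty cart',
  'chore: bump dependencies',
  'feat(api)!: remove v1 endpoints'
])
puts changelog
```

Commit types without a section, such as `chore` here, are left out of the changelog.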
Ruby Implementation
Ruby provides several tools and libraries that implement change management automation, from version bumping to deployment coordination.
Version Management typically uses semantic versioning (major.minor.patch) stored in version files or constants. The bump gem automates version incrementation based on change type. Rake tasks integrate version management into development workflows.
# Version management module
module AppVersion
MAJOR = 2
MINOR = 4
PATCH = 7
def self.to_s
"#{MAJOR}.#{MINOR}.#{PATCH}"
end
def self.bump(type)
# Compute the new version first; the constants above keep their old
# values until the file is loaded again.
major, minor, patch =
case type
when :major then [MAJOR + 1, 0, 0]
when :minor then [MAJOR, MINOR + 1, 0]
when :patch then [MAJOR, MINOR, PATCH + 1]
else raise ArgumentError, "Unknown bump type: #{type}"
end
version_file = File.read('lib/app_version.rb')
version_file.sub!(/MAJOR = \d+/, "MAJOR = #{major}")
version_file.sub!(/MINOR = \d+/, "MINOR = #{minor}")
version_file.sub!(/PATCH = \d+/, "PATCH = #{patch}")
File.write('lib/app_version.rb', version_file)
# Return the new version so callers do not read the stale constants
"#{major}.#{minor}.#{patch}"
end
end
# Rake task for version bumping
namespace :version do
desc 'Bump major version'
task :major do
new_version = AppVersion.bump(:major)
sh "git commit -am 'Bump version to #{new_version}'"
sh "git tag v#{new_version}"
end
end
Git Integration through the ruby-git gem enables automated Git operations within Ruby scripts. Deployment scripts can check branch status, create tags, and push changes programmatically.
require 'git'
class DeploymentManager
def initialize(repo_path)
@git = Git.open(repo_path)
end
def prepare_release(version)
ensure_clean_working_tree
checkout_main
pull_latest
create_release_tag(version)
end
private
def ensure_clean_working_tree
if @git.status.changed.any?
raise "Uncommitted changes present"
end
end
def checkout_main
@git.checkout('main')
end
def pull_latest
@git.pull('origin', 'main')
end
def create_release_tag(version)
@git.add_tag("v#{version}")
@git.push('origin', "v#{version}")
end
public
# Public so that it can be called directly during an incident
def rollback_to_version(version)
@git.checkout("v#{version}")
end
end
Deployment Scripts coordinate the steps required to deploy changes. Capistrano is a long-established Ruby deployment tool that provides tasks for code checkout, dependency installation, asset compilation, database migration, and service restart.
# Capistrano deployment configuration
set :application, 'my_app'
set :repo_url, 'git@github.com:username/my_app.git'
set :deploy_to, '/var/www/my_app'
set :linked_files, %w{config/database.yml config/secrets.yml}
set :linked_dirs, %w{log tmp/pids tmp/cache tmp/sockets vendor/bundle}
namespace :deploy do
desc 'Run database migrations'
task :migrate do
on roles(:db) do
within release_path do
execute :rake, 'db:migrate RAILS_ENV=production'
end
end
end
desc 'Create deployment record'
task :record_deployment do
on roles(:app) do
version = fetch(:current_revision)
user = ENV['USER']
timestamp = Time.now.utc.iso8601
execute :echo,
"#{version},#{user},#{timestamp} >> #{deploy_to}/DEPLOYMENTS"
end
end
after 'deploy:updated', 'deploy:migrate'
after 'deploy:finished', 'deploy:record_deployment'
end
Change Validation includes pre-deployment checks that verify system readiness. Scripts test database connectivity, check dependency versions, validate configuration files, and ensure adequate disk space before proceeding with deployments.
class DeploymentValidator
def validate!
checks = [
check_database_connection,
check_dependency_versions,
check_disk_space,
check_configuration_files,
check_environment_variables
]
failures = checks.reject(&:passed?)
if failures.any?
raise DeploymentError, "Validation failed: #{failures.map(&:message)}"
end
end
private
def check_database_connection
ActiveRecord::Base.connection.execute('SELECT 1')
Check.new('database', true, 'Connection successful')
rescue => e
Check.new('database', false, e.message)
end
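def check_dependency_versions
output = `bundle outdated --strict`
# `bundle outdated` prints a status line even when nothing is outdated,
# so rely on its exit status and not on empty output.
if $?.success?
Check.new('dependencies', true, 'All dependencies current')
else
Check.new('dependencies', false, "Outdated gems:\n#{output}")
end
end
def check_disk_space
usage = `df -h /var/www | tail -1 | awk '{print $5}' | sed 's/%//'`.to_i
if usage < 90
Check.new('disk_space', true, "#{100 - usage}% available")
else
Check.new('disk_space', false, "Only #{100 - usage}% available")
end
end
# The file and variable names below are examples; adjust them per application.
def check_configuration_files
missing = %w[config/database.yml].reject { |path| File.exist?(path) }
Check.new('configuration', missing.empty?, "Missing files: #{missing.join(', ')}")
end
def check_environment_variables
missing = %w[RAILS_ENV DATABASE_URL].reject { |name| ENV.key?(name) }
Check.new('environment', missing.empty?, "Missing variables: #{missing.join(', ')}")
end
end
DeploymentError = Class.new(StandardError)
Check = Struct.new(:name, :passed?, :message)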
Rollback Automation provides quick recovery from failed deployments. Scripts store previous release information, maintain symbolic links to enable instant version switching, and preserve database states for migration rollback.
class RollbackManager
def initialize(app_path)
@app_path = app_path
@releases_path = File.join(app_path, 'releases')
@current_link = File.join(app_path, 'current')
end
def rollback
previous_release = detect_previous_release
unless previous_release
raise "No previous release available"
end
puts "Rolling back to #{previous_release}"
rollback_code(previous_release)
rollback_database
restart_services
record_rollback(previous_release)
end
private
def detect_previous_release
releases = Dir.glob("#{@releases_path}/*").sort
current = File.readlink(@current_link)
current_index = releases.index(current)
releases[current_index - 1] if current_index && current_index > 0
end
def rollback_code(release)
File.unlink(@current_link)
File.symlink(release, @current_link)
end
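def rollback_database
version = last_migration_before_deployment
# db:migrate with VERSION reverts every migration newer than that version;
# db:migrate:down would revert only the single migration named.
system("rake db:migrate VERSION=#{version}")
end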
def restart_services
system('systemctl restart puma')
system('systemctl restart sidekiq')
end
end
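Removing the `current` link and then recreating it leaves a brief window in which the link does not exist. A common way to close that window, shown here as a sketch and not as part of the class above, is to create the new link under a temporary name and rename it over the old one. `switch_symlink` is a hypothetical helper, and the atomicity relies on POSIX `rename` semantics.

```ruby
require 'fileutils'
require 'tmpdir'

# Repoints `link_path` at `target` without a window in which the link
# is missing: build a temporary link, then rename it over the old one.
def switch_symlink(target, link_path)
  temp_link = "#{link_path}.tmp-#{Process.pid}"
  File.symlink(target, temp_link)
  File.rename(temp_link, link_path) # rename replaces the old link atomically on POSIX
ensure
  # Clean up the temporary link if the rename did not happen.
  File.unlink(temp_link) if temp_link && File.symlink?(temp_link)
end

# Demonstration in a throwaway directory with two fake releases.
Dir.mktmpdir do |app|
  old_release = File.join(app, 'releases', '20240101000000')
  new_release = File.join(app, 'releases', '20240201000000')
  FileUtils.mkdir_p([old_release, new_release])
  current = File.join(app, 'current')

  File.symlink(new_release, current)
  switch_symlink(old_release, current) # roll back to the older release
  puts File.readlink(current) == old_release
end
```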
Tools & Ecosystem
Change management depends on an ecosystem of tools that handle different aspects of the change lifecycle.
Version Control Systems form the foundation. Git dominates with distributed architecture, branching flexibility, and strong community support. Alternatives like Mercurial offer simpler mental models but have smaller ecosystems. Centralized systems like Subversion still appear in enterprises with existing infrastructure.
Git Hosting Platforms add collaboration features to Git repositories. GitHub provides pull requests, code review, issues, and actions for CI/CD. GitLab integrates version control with CI/CD, container registries, and deployment management in a single platform. Bitbucket emphasizes Atlassian tool integration.
Continuous Integration tools automate testing and validation. GitHub Actions defines workflows in YAML within repositories. Jenkins provides plugin-based extensibility and on-premise deployment. CircleCI offers fast builds with strong Docker integration. Travis CI has historically been popular with open source projects.
# GitHub Actions workflow for change validation
name: Validate Changes
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          # Quoted so YAML keeps the version as a string
          ruby-version: '3.2'
          bundler-cache: true
      - name: Run tests
        run: bundle exec rspec
      - name: Check code coverage
        run: bundle exec rake coverage:check
      - name: Lint Ruby code
        run: bundle exec rubocop
      - name: Security audit
        run: bundle exec bundle-audit check --update
Deployment Tools coordinate production releases. Capistrano defines deployment tasks in Ruby DSL for SSH-based deployments to traditional servers. Kubernetes handles containerized applications with declarative configurations and rolling updates. Ansible uses playbooks to configure systems and deploy applications. Terraform manages infrastructure as code, tracking infrastructure changes like application changes.
Ruby Gems for Change Management include:
The octokit gem provides a Ruby interface to the GitHub API for automating repository operations, managing pull requests, and retrieving commit information programmatically.
require 'octokit'
client = Octokit::Client.new(access_token: ENV['GITHUB_TOKEN'])
# Get recent commits
commits = client.commits('username/repository', 'main')
commits.first(5).each do |commit|
puts "#{commit.sha[0..7]} #{commit.commit.message}"
end
# Create release
client.create_release(
'username/repository',
'v2.3.1',
name: 'Version 2.3.1',
body: 'Bug fixes and performance improvements'
)
The changelog_manager gem generates changelogs from Git history, parsing commit messages and organizing changes by type. The semantic gem handles version number manipulation and comparison. The git gem wraps Git command-line operations in Ruby objects.
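Version comparison is easy to get wrong with plain strings. The sketch below uses `Gem::Version`, which ships with Ruby as part of RubyGems, as a stand-in to show the behavior. It does not show the `semantic` gem's own API, and `next_patch` is a hypothetical helper.

```ruby
require 'rubygems' # loaded by default; Gem::Version ships with Ruby

# Gem::Version compares segment by segment, so "2.10.0" sorts after "2.9.3".
tags = %w[2.9.3 2.10.0 2.4.7]
latest = tags.map { |tag| Gem::Version.new(tag) }.max
puts latest                      # 2.10.0

# Plain string comparison gets the same pair wrong.
puts '2.10.0' > '2.9.3'          # false

# Hypothetical helper: next patch version for a hotfix.
def next_patch(version)
  major, minor, patch = Gem::Version.new(version).segments
  "#{major}.#{minor}.#{patch + 1}"
end

puts next_patch('2.4.7')         # 2.4.8
```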
Configuration Management tools ensure environment consistency. Ansible playbooks define infrastructure state declaratively. Chef uses Ruby DSL for configuration recipes. Puppet provides declarative configuration language. These tools maintain environment parity crucial for reliable change promotion.
Monitoring and Observability tools detect problems from changes. Datadog collects metrics and traces across services. New Relic monitors application performance. Sentry captures errors with stack traces. Prometheus scrapes metrics endpoints for time-series data. These tools provide the signals that trigger rollbacks or halt rollouts.
Feature Flag Platforms manage gradual rollouts. LaunchDarkly provides a hosted feature flag service with targeting rules and percentage-based rollouts. Flipper offers open-source feature flags for Ruby applications with multiple storage backends. Split.io adds experimentation frameworks to feature flags.
# Flipper gem for feature flags
require 'flipper'
Flipper.configure do |config|
config.default do
adapter = Flipper::Adapters::ActiveRecord.new
Flipper.new(adapter)
end
end
# Enable feature for percentage of users
Flipper.enable_percentage_of_actors(:new_dashboard, 25)
# Check in application code
if Flipper.enabled?(:new_dashboard, current_user)
render 'dashboard/new'
else
render 'dashboard/legacy'
end
Practical Examples
Real-world scenarios demonstrate how change management principles and tools combine to handle common situations.
Coordinating Multi-Team Feature Release
A large feature requires changes across three teams' services. The frontend team updates the UI. The backend team adds new API endpoints. The data team modifies the analytics pipeline. All changes must deploy simultaneously to avoid partial functionality.
The teams use feature flags to deploy code to production before activation. Each team deploys behind flags during their normal deployment windows over several days. When all teams confirm deployment, a coordinator enables the feature flags simultaneously. If problems arise, disabling flags reverts to previous behavior without redeployment.
# Multi-service feature flag coordination
class FeatureLaunch
def initialize(feature_name)
@feature = feature_name
@services = ['frontend', 'backend', 'analytics']
end
def ready_to_launch?
@services.all? { |service| service_deployed?(service) }
end
def launch
unless ready_to_launch?
raise "Not all services have deployed #{@feature}"
end
@services.each do |service|
enable_flag_for_service(service)
end
verify_launch
end
def rollback
@services.each do |service|
disable_flag_for_service(service)
end
end
private
def service_deployed?(service)
response = HTTP.get("#{service_url(service)}/health")
features = JSON.parse(response.body)['available_features']
features.include?(@feature)
end
def enable_flag_for_service(service)
HTTP.post(
"#{service_url(service)}/admin/features/#{@feature}/enable",
headers: { 'Authorization': admin_token }
)
end
end
Database Schema Migration with Zero Downtime
The application needs to rename a column from user_name to username across a large production database. Direct column rename would break running application instances during the brief deployment window.
The team implements a multi-phase migration. Phase 1 adds the new username column. Phase 2 deploys code that writes to both columns. Phase 3 backfills data from user_name to username. Phase 4 deploys code that reads from username. Phase 5 removes the user_name column. Each phase deploys independently with verification before proceeding.
# Phase 1: Add new column
class AddUsernameColumn < ActiveRecord::Migration[7.0]
def change
add_column :users, :username, :string
add_index :users, :username
end
end
# Phase 2: Update model to write both columns
class User < ApplicationRecord
before_save :sync_username
private
def sync_username
self.username = user_name if user_name_changed?
end
end
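# Phase 3: Backfill data in batches so no single statement touches every row
class BackfillUsername < ActiveRecord::Migration[7.0]
# Run outside one long transaction so each batch commits on its own
disable_ddl_transaction!
def up
User.where(username: nil).in_batches(of: 1_000) do |batch|
batch.update_all('username = user_name')
end
end
end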
# Phase 4: Update model to read from username
class User < ApplicationRecord
def name
username # Changed from user_name
end
end
# Phase 5: Remove old column (passing the type keeps the migration reversible)
class RemoveUserNameColumn < ActiveRecord::Migration[7.0]
def change
remove_column :users, :user_name, :string
end
end
Emergency Hotfix Deployment
A critical bug in production affects user authentication. The bug requires immediate fix outside the normal release schedule. The organization uses GitFlow with a two-week release cycle currently mid-cycle.
The developer creates a hotfix branch from the production tag, implements the minimal fix, writes tests, and requests emergency review. After approval, the fix merges to both main and develop branches. The hotfix deploys to production immediately. The next regular release includes the fix through the develop branch merge.
# Automated hotfix workflow
class HotfixWorkflow
def initialize(bug_description)
@bug = bug_description
@git = Git.open('.')
end
def create_hotfix
version = next_patch_version
branch = "hotfix/#{version}"
production_tag = get_production_tag
# Start the hotfix branch from the production tag, not from the current HEAD
@git.checkout(production_tag)
@git.branch(branch).checkout
{
branch: branch,
version: version,
instructions: <<~INSTRUCTIONS
1. Implement fix in #{branch}
2. Run: rake hotfix:test
3. Run: rake hotfix:deploy version=#{version}
INSTRUCTIONS
}
end
def deploy_hotfix(version)
validate_hotfix_branch(version)
tag = "v#{version}"
@git.add_tag(tag)
# Merge to main
@git.checkout('main')
@git.merge("hotfix/#{version}")
@git.push('origin', 'main')
# Merge to develop
@git.checkout('develop')
@git.merge("hotfix/#{version}")
@git.push('origin', 'develop')
# Deploy
system("cap production deploy TAG=#{tag}")
# Cleanup
@git.branch("hotfix/#{version}").delete
end
private
def next_patch_version
# Release tags look like "v2.3.1"; strip the prefix before parsing
current = get_production_tag.delete_prefix('v')
major, minor, patch = current.split('.').map(&:to_i)
"#{major}.#{minor}.#{patch + 1}"
end
end
Canary Deployment with Automatic Rollback
A new recommendation algorithm deploys to production. The team wants to verify it improves user engagement before full rollout. They deploy the algorithm to 5% of users initially.
Monitoring tracks error rates, response times, and engagement metrics for canary users versus control users. After six hours with healthy metrics, the rollout expands to 10%, then 25%, then 50%, then 100% over two days. If metrics degrade at any point, automatic rollback disables the canary version.
class CanaryRollout
def initialize(feature_name)
@feature = feature_name
@stages = [5, 10, 25, 50, 100]
@current_stage = 0
@metrics = MetricsAnalyzer.new(feature_name)
end
def advance
if @current_stage >= @stages.length
complete_rollout
return
end
percentage = @stages[@current_stage]
if @metrics.healthy?(lookback_hours: 6)
Flipper.enable_percentage_of_actors(@feature, percentage)
@current_stage += 1
schedule_next_advance
else
rollback("Unhealthy metrics detected")
end
end
def rollback(reason)
Flipper.disable(@feature)
alert_team(
feature: @feature,
reason: reason,
metrics: @metrics.report
)
end
private
def complete_rollout
Flipper.enable(@feature)
notify_success
end
def schedule_next_advance
delay = case @current_stage
when 1 then 6.hours
when 2 then 12.hours
when 3 then 24.hours
else 48.hours
end
CanaryAdvanceJob.set(wait: delay).perform_later(@feature)
end
end
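class MetricsAnalyzer
def initialize(feature_name)
@feature = feature_name
end
# error_rate, latency_p95, engagement_improved?, and report are assumed to be
# backed by the team's metrics store and are not shown here.
def healthy?(lookback_hours:)
error_rate_acceptable?(lookback_hours) &&
latency_acceptable?(lookback_hours) &&
engagement_improved?(lookback_hours)
end
private
def error_rate_acceptable?(lookback_hours)
canary_errors = error_rate(:canary, lookback_hours)
control_errors = error_rate(:control, lookback_hours)
canary_errors <= control_errors * 1.1 # Allow 10% higher
end
def latency_acceptable?(lookback_hours)
canary_p95 = latency_p95(:canary, lookback_hours)
control_p95 = latency_p95(:control, lookback_hours)
canary_p95 <= control_p95 * 1.2 # Allow 20% higher
end
end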
Error Handling & Edge Cases
Change management processes encounter various failure modes that require detection and recovery strategies.
Merge Conflicts occur when concurrent changes modify the same code sections. Automatic merges fail and require manual resolution. The developer must understand both changes, determine the correct integration, and verify the merged result works correctly.
Prevention strategies include frequent integration, small changes, and clear code ownership. When conflicts occur, tools like git mergetool provide visual interfaces for resolution. Teams should test after resolving conflicts since merged code may introduce bugs even if both original changes were correct.
# Automated merge conflict detection in CI
class MergeConflictChecker
def check_pull_request(pr_number)
pr = fetch_pull_request(pr_number)
base_branch = pr.base_ref
head_branch = pr.head_ref
begin
test_merge(base_branch, head_branch)
{ status: 'clean', conflicts: [] }
rescue MergeConflict => e
{
status: 'conflicts',
conflicts: e.conflicted_files,
message: 'Resolve conflicts before merging'
}
end
end
private
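def test_merge(base, head)
system("git fetch origin #{base} #{head}")
# Requires Git 2.38 or later. `git merge-tree --write-tree` merges in memory,
# exits 0 when the merge is clean and 1 on conflicts, and with --name-only
# lists the conflicted files after the first output line (the tree ID).
output = `git merge-tree --write-tree --name-only --no-messages origin/#{base} origin/#{head}`
return if $?.success?
raise "git merge-tree failed" unless $?.exitstatus == 1
conflicts = output.lines.drop(1).map(&:chomp).reject(&:empty?)
raise MergeConflict.new(conflicts)
end
end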
class MergeConflict < StandardError
attr_reader :conflicted_files
def initialize(files)
@conflicted_files = files
super("Merge conflicts in: #{files.join(', ')}")
end
end
Failed Migrations can leave databases in inconsistent states. A migration might fail partway through, applying some changes but not others. Recovery requires determining what completed, rolling back partial changes, and fixing the migration.
Wrapping migrations in transactions ensures atomicity on databases that support transactional DDL, such as PostgreSQL. On databases without this support, such as MySQL, where most DDL statements commit implicitly, migrations should include validation and rollback logic. Always test migrations on production-like data before deployment.
class SafeMigration < ActiveRecord::Migration[7.0]
def up
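# Rails wraps migrations in a transaction when the database supports transactional DDL.
# Fail loudly if none is open: a bare return would skip the changes but still record the migration as run.
raise "Migration requires a DDL transaction" unless transaction_open?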
begin
add_column :orders, :tax_amount, :decimal, precision: 10, scale: 2
add_column :orders, :tax_rate, :decimal, precision: 5, scale: 4
# Validate migration before committing
validate_columns_added
rescue => e
# Log detailed error for debugging
Rails.logger.error("Migration failed: #{e.message}")
Rails.logger.error(e.backtrace.join("\n"))
# Raise to trigger rollback
raise
end
end
def down
remove_column :orders, :tax_rate
remove_column :orders, :tax_amount
end
private
def validate_columns_added
columns = ActiveRecord::Base.connection.columns(:orders).map(&:name)
unless columns.include?('tax_amount') && columns.include?('tax_rate')
raise "Columns not properly added"
end
end
end
Deployment Failures can occur at various stages. Code checkout might fail due to network issues. Dependency installation might fail due to missing packages. Service restart might fail due to configuration errors. Each failure point requires specific recovery.
Deployment scripts should validate preconditions before starting, maintain detailed logs, and preserve the previous working state. If deployment fails, automatic rollback restores the previous version. If rollback fails, the script should provide clear instructions for manual recovery.
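The sequence described above (check preconditions, log each step, roll back on failure, escalate if rollback fails) can be sketched as follows. This is a minimal illustration, not a production deployment tool: the `GuardedDeployment` class, the `ManualRecoveryRequired` error, and the callables passed in are all hypothetical placeholders for real deployment commands.

```ruby
require 'logger'

# Raised when rollback itself fails and a human has to step in.
class ManualRecoveryRequired < StandardError; end

# Hypothetical sketch: runs deployment steps in order and rolls back on failure.
class GuardedDeployment
  def initialize(preconditions:, steps:, rollback:, logger: Logger.new($stdout))
    @preconditions = preconditions # { name => callable returning true/false }
    @steps = steps                 # { name => callable that raises on failure }
    @rollback = rollback           # callable that restores the previous release
    @logger = logger
  end

  def run
    # Refuse to start if any precondition fails; nothing has been changed yet.
    failed = @preconditions.reject { |_name, check| check.call }.keys
    unless failed.empty?
      @logger.error("Preconditions failed: #{failed.join(', ')}")
      return :aborted
    end

    begin
      @steps.each do |name, step|
        @logger.info("Running step: #{name}")
        step.call
      end
      :deployed
    rescue StandardError => e
      @logger.error("Deployment failed: #{e.message}")
      attempt_rollback
    end
  end

  private

  def attempt_rollback
    @rollback.call
    @logger.info('Rollback complete; previous version restored')
    :rolled_back
  rescue StandardError => e
    raise ManualRecoveryRequired,
          "Rollback failed (#{e.message}); restore the previous release manually"
  end
end
```

Separating the precondition check from the step loop means a failed check never triggers a rollback, since no state has changed at that point.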
Configuration Drift happens when production configurations diverge from version control. Manual changes in production for troubleshooting or emergency fixes may not get documented. Over time, the running configuration differs from the repository, causing confusion during deployments.
Configuration management tools enforce desired state by continuously monitoring and correcting drift. Infrastructure as code practices version all configuration changes. Change management processes should require that emergency manual changes be committed to version control promptly.
class ConfigurationValidator
def validate_production
drifts = []
configs = {
'database.yml' => load_production_config('database.yml'),
'redis.yml' => load_production_config('redis.yml'),
'secrets.yml' => load_production_config('secrets.yml')
}
configs.each do |filename, production_config|
repository_config = load_repository_config(filename)
diff = compare_configs(production_config, repository_config)
if diff.any?
drifts << {
file: filename,
differences: diff
}
end
end
if drifts.any?
report_drift(drifts)
end
drifts
end
private
def compare_configs(production, repository)
differences = []
all_keys = (production.keys + repository.keys).uniq
all_keys.each do |key|
prod_value = production[key]
repo_value = repository[key]
if prod_value != repo_value
differences << {
key: key,
production: prod_value,
repository: repo_value
}
end
end
differences
end
end
Race Conditions in Deployments occur when multiple deployments run simultaneously. Both might succeed individually but create inconsistent state when interleaved. File overwrites, database migration conflicts, and service restarts can interfere.
Deployment locking prevents concurrent deployments. A deployment acquires a lock before starting and releases it after completion. Subsequent deployments wait for the lock rather than proceeding simultaneously. The lock includes timeouts to prevent hung deployments from blocking indefinitely.
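A minimal sketch of such a lock for deployments that run on a single host, built on Ruby's `File#flock`. The `DeploymentLock` class, the lock file path, and the timeout values are illustrative assumptions. Note that the timeout here bounds how long a waiting deployment blocks; the operating system releases the lock when the holding process exits, but a hung process that never exits would still need separate handling.

```ruby
require 'tmpdir'

class DeploymentLockTimeout < StandardError; end

# Hypothetical single-host deployment lock based on an advisory file lock.
class DeploymentLock
  def initialize(path: File.join(Dir.tmpdir, 'deploy.lock'), timeout: 600, poll_interval: 1)
    @path = path
    @timeout = timeout             # seconds to wait for the lock
    @poll_interval = poll_interval # seconds between attempts
  end

  # Yields while holding the lock; raises DeploymentLockTimeout if it
  # cannot be acquired before the timeout elapses.
  def synchronize
    File.open(@path, File::RDWR | File::CREAT, 0o644) do |file|
      deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + @timeout
      # flock returns false when LOCK_NB is set and the lock is held elsewhere
      until file.flock(File::LOCK_EX | File::LOCK_NB)
        if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
          raise DeploymentLockTimeout, "Another deployment holds #{@path}"
        end
        sleep @poll_interval
      end

      # Record the owner to help whoever investigates a stuck lock
      file.truncate(0)
      file.write("pid=#{Process.pid}")
      file.flush
      yield
    end # closing the file releases the lock
  end
end
```

A deployment script would wrap its work in `DeploymentLock.new.synchronize { ... }`. Deployments spread across several hosts would need a shared lock instead (for example, a database row or a key in a coordination service), which this sketch does not cover.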
Rollback Complications arise when forward changes are not easily reversible. Database migrations that delete data cannot be rolled back without data loss. API changes that external clients depend on cannot be reverted without breaking integrations. Features that users have already adopted cannot simply be disabled without user impact.
Planning reversibility during initial design avoids these complications. Database migrations preserve data during transitions. API changes maintain backward compatibility. Feature changes degrade gracefully when disabled. For irreversible changes, teams must accept forward-only deployment with careful validation before release.
Reference
Change Management Workflow Comparison
| Workflow | Branch Structure | Release Frequency | Merge Complexity | Best For |
|---|---|---|---|---|
| Trunk-Based | Single main branch | Multiple daily | Low | Fast-moving teams, mature CI/CD |
| GitFlow | Multiple long-lived branches | Weekly to monthly | High | Scheduled releases, multiple versions |
| GitHub Flow | Main plus feature branches | Multiple daily | Medium | Continuous deployment, web applications |
| Release Trains | Feature branches | Fixed schedule | Medium | Coordinated releases, enterprise |
Deployment Strategy Comparison
| Strategy | Downtime | Rollback Speed | Infrastructure Cost | Complexity |
|---|---|---|---|---|
| Blue-Green | None | Instant | High (2x) | Medium |
| Canary | None | Fast | Low | High |
| Rolling | None | Slow | Low | Low |
| Recreate | Brief | Fast | Low | Low |
Common Git Commands for Change Management
| Operation | Command | Purpose |
|---|---|---|
| Create feature branch | git checkout -b feature/name | Start isolated work |
| Update from main | git pull origin main | Sync with team changes |
| Interactive rebase | git rebase -i main | Clean commit history |
| Cherry-pick commit | git cherry-pick commit-hash | Apply specific change |
| Create tag | git tag -a v1.0.0 -m message | Mark release point |
| View commit history | git log --oneline --graph | Visualize branch structure |
| Stash changes | git stash push -m description | Temporarily save work |
| Amend last commit | git commit --amend | Fix recent commit |
Migration Safety Checklist
| Check | Verification | Risk if Skipped |
|---|---|---|
| Backup exists | Database dump available | Data loss |
| Transaction support | DDL in transaction | Partial application |
| Downtime acceptable | Scheduled maintenance window | User impact |
| Rollback tested | Down migration works | Stuck state |
| Production data tested | Test with prod-like data | Migration failure |
| Lock timeout set | lock_timeout and statement_timeout configured (PostgreSQL) | Hanging migration |
| Reversible operations | Data preserved | Cannot roll back |
Feature Flag Configuration
| Configuration | Values | Purpose |
|---|---|---|
| Rollout strategy | all, percentage, whitelist, gradual | Control exposure |
| Percentage | 0-100 | Canary rollout size |
| User targeting | user_id, attributes | Specific user access |
| Environment | development, staging, production | Environment-specific flags |
| Expiration | timestamp | Remove stale flags |
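As a rough illustration of how the percentage setting can control exposure, the sketch below assigns each user a stable bucket from 0 to 99 and enables the flag when the bucket falls below the configured percentage. The `flag_enabled?` helper is hypothetical, and CRC32 is just one convenient hash for the purpose.

```ruby
require 'zlib'

# Hypothetical helper: deterministic percentage bucketing.
# Hashing "feature:user_id" keeps a user's bucket stable across requests
# and gives each feature its own independent assignment of users.
def flag_enabled?(feature, user_id, percentage)
  return false if percentage <= 0
  return true if percentage >= 100

  bucket = Zlib.crc32("#{feature}:#{user_id}") % 100 # 0..99
  bucket < percentage
end
```

Because the bucket never changes for a given feature and user, raising the percentage only adds users; nobody who already has the feature loses it as the rollout grows.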
Deployment Validation Checklist
| Validation | Command | Expected Result |
|---|---|---|
| Migrations applied | rake db:migrate:status | All migrations up |
| Dependencies current | bundle check | All gems available |
| Tests passing | bundle exec rspec | Zero failures |
| Assets compiled | rake assets:precompile | No errors |
| Configuration valid | rake config:validate | All keys present |
| Services responding | curl health endpoint | 200 status |
| Disk space adequate | df -h | Less than 90% usage |
Semantic Versioning Rules
| Version Component | Increment When | Example |
|---|---|---|
| Major (X.y.z) | Breaking changes (minor and patch reset to 0) | 1.5.3 → 2.0.0 |
| Minor (x.Y.z) | New features, backward compatible (patch resets to 0) | 1.5.3 → 1.6.0 |
| Patch (x.y.Z) | Bug fixes, backward compatible | 1.5.3 → 1.5.4 |
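The increment rules in the table can be expressed as a small function. This is a minimal sketch: `bump_version` and its change-type symbols are hypothetical names, and it handles plain MAJOR.MINOR.PATCH strings only, without pre-release or build metadata.

```ruby
# Hypothetical helper applying the semantic versioning increment rules.
def bump_version(version, change)
  major, minor, patch = version.split('.').map { |part| Integer(part, 10) }

  case change
  when :breaking then "#{major + 1}.0.0"                # lower components reset
  when :feature  then "#{major}.#{minor + 1}.0"         # patch resets
  when :fix      then "#{major}.#{minor}.#{patch + 1}"
  else raise ArgumentError, "Unknown change type: #{change.inspect}"
  end
end

bump_version('1.5.3', :breaking) # => "2.0.0"
bump_version('1.5.3', :feature)  # => "1.6.0"
bump_version('1.5.3', :fix)      # => "1.5.4"
```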
Rollback Decision Matrix
| Metric | Threshold | Action |
|---|---|---|
| Error rate increase | Greater than 2x baseline | Immediate rollback |
| Latency degradation | P95 greater than 1.5x baseline | Monitor, prepare rollback |
| Traffic drop | Less than 50% expected | Immediate rollback |
| User reports | Greater than 10 per minute | Investigate, rollback if confirmed |
| Memory leak | Memory growth greater than 10MB/minute | Schedule rollback |
| Database errors | Any connection errors | Immediate rollback |
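One way to apply the matrix automatically is to encode each row as a rule and return the action of the first rule that fires. In the sketch below, the metric names and the shape of the `metrics` hash are hypothetical, and the ordering of the non-immediate actions is an assumption; the table itself does not rank them.

```ruby
# Hypothetical encoding of the decision matrix as ordered rules.
# Rules that demand immediate rollback are checked first.
ROLLBACK_RULES = [
  [:immediate_rollback, ->(m) { m[:error_rate] > 2 * m[:baseline_error_rate] }],
  [:immediate_rollback, ->(m) { m[:traffic] < 0.5 * m[:expected_traffic] }],
  [:immediate_rollback, ->(m) { m[:db_connection_errors] > 0 }],
  [:investigate,        ->(m) { m[:user_reports_per_minute] > 10 }],
  [:schedule_rollback,  ->(m) { m[:memory_growth_mb_per_minute] > 10 }],
  [:monitor,            ->(m) { m[:latency_p95] > 1.5 * m[:baseline_latency_p95] }]
].freeze

# Returns the action for the first matching rule, or :none when all metrics are healthy.
def rollback_action(metrics)
  rule = ROLLBACK_RULES.find { |_action, triggered| triggered.call(metrics) }
  rule ? rule.first : :none
end
```

Keeping the thresholds in one data structure makes them easy to review and adjust alongside the table.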
Git Branch Naming Conventions
| Branch Type | Pattern | Example |
|---|---|---|
| Feature | feature/description | feature/user-authentication |
| Bug fix | fix/description | fix/login-timeout |
| Hotfix | hotfix/version | hotfix/2.3.1 |
| Release | release/version | release/2.4.0 |
| Experiment | experiment/description | experiment/new-algorithm |
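These conventions can be enforced in a pre-push hook or CI step. The sketch below is one possible check; `branch_type` is a hypothetical helper, and the lowercase kebab-case rule for descriptions is an assumption, since the table only fixes the prefixes.

```ruby
# Hypothetical patterns for the branch types listed above.
BRANCH_PATTERNS = {
  feature:    %r{\Afeature/[a-z0-9]+(?:-[a-z0-9]+)*\z},
  fix:        %r{\Afix/[a-z0-9]+(?:-[a-z0-9]+)*\z},
  hotfix:     %r{\Ahotfix/\d+\.\d+\.\d+\z},
  release:    %r{\Arelease/\d+\.\d+\.\d+\z},
  experiment: %r{\Aexperiment/[a-z0-9]+(?:-[a-z0-9]+)*\z}
}.freeze

# Returns the branch type as a symbol, or nil when the name matches no convention.
def branch_type(name)
  match = BRANCH_PATTERNS.find { |_type, pattern| pattern.match?(name) }
  match&.first
end

branch_type('feature/user-authentication') # => :feature
branch_type('hotfix/2.3.1')                # => :hotfix
branch_type('my-branch')                   # => nil
```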
Change Management Metrics
| Metric | Calculation | Target |
|---|---|---|
| Deployment frequency | Deployments per day | Daily or higher |
| Lead time | Commit to production time | Less than 1 day |
| Change failure rate | Failed deployments / total | Less than 15% |
| Mean time to recovery | Time to restore service | Less than 1 hour |
| Rollback rate | Rollbacks / deployments | Less than 5% |
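Several of these metrics can be derived directly from deployment records. The sketch below computes three of them; the `Deployment` struct and its fields are hypothetical and do not reflect any particular tool's schema.

```ruby
# Hypothetical deployment record; times are Time objects, flags are booleans.
Deployment = Struct.new(:committed_at, :deployed_at, :failed, :rolled_back, keyword_init: true)

# Returns change failure rate, rollback rate, and mean lead time in hours.
def change_metrics(deployments)
  total = deployments.size
  return {} if total.zero?

  # Subtracting two Time objects yields the difference in seconds
  lead_times = deployments.map { |d| d.deployed_at - d.committed_at }

  {
    change_failure_rate: deployments.count(&:failed).fdiv(total),
    rollback_rate: deployments.count(&:rolled_back).fdiv(total),
    mean_lead_time_hours: lead_times.sum / total / 3600.0
  }
end
```

Rates are returned as fractions between 0 and 1, so a `change_failure_rate` of 0.15 corresponds to the 15% target in the table.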