CrackedRuby

Backup and Recovery Strategies

Overview

Backup and recovery strategies form the operational foundation for data protection in software systems. A backup creates a copy of data at a specific point in time, while recovery restores that data when the original becomes corrupted, deleted, or inaccessible. The strategy defines how often backups occur, what data gets backed up, where backups are stored, and how quickly systems can recover.

Organizations lose data through hardware failures, software bugs, human error, malicious attacks, and natural disasters. Without backups, data loss results in business disruption, regulatory penalties, and potential business failure. The cost of data loss often exceeds the cost of maintaining backups by orders of magnitude.

Backup strategies balance multiple objectives: minimizing data loss (measured by Recovery Point Objective or RPO), minimizing downtime (measured by Recovery Time Objective or RTO), managing storage costs, and maintaining backup integrity. Different systems require different strategies based on their criticality, data volume, and change frequency.

A database handling financial transactions might need continuous replication with sub-second RPO and RTO, while archived log files might tolerate daily backups with hours of recovery time. The strategy must account for data size, network bandwidth, backup windows, and regulatory retention requirements.

# Basic backup configuration example
backup_config = {
  source: '/var/app/data',
  destination: 's3://backups/app-data',
  frequency: '0 2 * * *',  # 2 AM daily
  retention: 30,  # days
  compression: true,
  encryption: true
}

Recovery scenarios include complete system restoration after catastrophic failure, point-in-time recovery to undo changes, selective file restoration, and recovery testing to verify backup integrity. Each scenario requires different capabilities from the backup system.

Key Principles

Recovery Point Objective (RPO) defines the maximum acceptable data loss measured in time. An RPO of 1 hour means the system can tolerate losing up to 1 hour of data. RPO determines backup frequency: a 1-hour RPO requires backups at least hourly. Systems with zero RPO need continuous replication or synchronous writes to multiple locations.

Recovery Time Objective (RTO) defines the maximum acceptable downtime. An RTO of 4 hours means the system must be operational within 4 hours of failure. RTO influences backup format, storage location, and recovery automation. Shorter RTOs require faster storage, more automation, and potentially hot standby systems.

Backup types differ in what data they capture and how long they take:

Full backups copy all data regardless of whether it changed. They provide complete system state and simplify recovery but require the most storage and time. A full backup might take 8 hours for a 10TB database.

Incremental backups copy only data that changed since the last backup of any type. They minimize backup time and storage but complicate recovery because restoration requires the last full backup plus all incremental backups since then. Recovering from incremental backups takes longer due to processing multiple backup sets.

Differential backups copy data that changed since the last full backup. They balance storage efficiency and recovery speed: recovery requires only the last full backup plus the most recent differential backup.

The 3-2-1 rule provides a foundational strategy: maintain 3 copies of data (1 primary + 2 backups), store backups on 2 different media types, keep 1 backup copy offsite. This protects against multiple failure scenarios including hardware failure, site disasters, and ransomware attacks.
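The rule reduces to a quick sketch; the copy structure and the satisfies_3_2_1? helper are illustrative, not a standard API:

```ruby
# Hypothetical check that a set of copies satisfies the 3-2-1 rule:
# at least 3 copies, on at least 2 media types, with at least 1 offsite.
def satisfies_3_2_1?(copies)
  media_types = copies.map { |c| c[:media] }.uniq
  offsite_count = copies.count { |c| c[:offsite] }
  copies.size >= 3 && media_types.size >= 2 && offsite_count >= 1
end

copies = [
  { media: :disk,  offsite: false },  # primary data
  { media: :disk,  offsite: false },  # local backup copy
  { media: :cloud, offsite: true }    # offsite backup copy
]
satisfies_3_2_1?(copies)  # => true
```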

Backup integrity verification confirms backups are usable. Verification includes checksum validation during backup, restoration testing to confirm data can be recovered, and corruption detection through periodic validation. An untested backup is not a backup.

Retention policies define how long backups are kept. Regulatory requirements often mandate specific retention periods: financial records might need 7 years, healthcare data might need 10 years. Retention balances storage costs against recovery needs and compliance requirements. Many strategies use a tiered approach: daily backups for 30 days, weekly backups for 6 months, monthly backups for 7 years.

# Retention policy implementation
class RetentionPolicy
  def initialize
    @daily_retention = 30
    @weekly_retention = 180
    @monthly_retention = 2555  # ~7 years
  end
  
  def should_keep?(backup)
    age_days = (Date.today - backup.date).to_i
    
    return true if age_days < @daily_retention
    return true if backup.weekly? && age_days < @weekly_retention
    return true if backup.monthly? && age_days < @monthly_retention
    
    false
  end
end

Consistency requirements ensure backups represent valid system state. Application-level consistency captures data at a transactionally consistent point. Crash consistency captures filesystem state as it exists at backup time without application coordination. Database backups often require quiescing writes or using snapshot mechanisms that maintain transaction consistency.

Backup windows define acceptable times for backup operations that might impact performance. A backup window of 2 AM to 6 AM provides 4 hours for backup operations without affecting business hours. Systems with continuous operation require backup methods that don't require downtime, such as continuous replication or snapshot-based backups.

Implementation Approaches

Full backup strategy copies all data on each backup operation. This approach simplifies recovery because restoration requires only a single backup set. Full backups work well for small datasets, systems with infrequent changes, or scenarios where storage costs are minimal.

Implementation creates a complete copy at each interval:

class FullBackupStrategy
  def initialize(source, destination)
    @source = source
    @destination = destination
  end
  
  def backup
    timestamp = Time.now.strftime('%Y%m%d_%H%M%S')
    backup_path = "#{@destination}/full_#{timestamp}"
    
    # Copy entire source to timestamped backup location
    FileUtils.cp_r(@source, backup_path)
    
    create_manifest(backup_path)
    verify_backup(backup_path)
    
    backup_path
  end
  
  def restore(backup_path, restore_location)
    FileUtils.cp_r(backup_path, restore_location)
    verify_restore(restore_location)
  end
end

Incremental backup strategy copies only files that changed since the last backup of any type. After an initial full backup, subsequent backups capture only changes. This minimizes backup time and storage but increases recovery complexity.

Recovery requires the base full backup plus all incremental backups in sequence. A corrupted incremental backup in the chain breaks recovery for all subsequent backups.

class IncrementalBackupStrategy
  def initialize(source, destination, state_file)
    @source = source
    @destination = destination
    @state_file = state_file
    @last_backup_time = load_last_backup_time
  end
  
  def backup
    timestamp = Time.now.to_i
    backup_path = "#{@destination}/incr_#{timestamp}"
    
    changed_files = find_changed_files(@source, @last_backup_time)
    copy_files(changed_files, backup_path)
    
    save_backup_time(timestamp)
    create_manifest(backup_path, changed_files)
    
    backup_path
  end
  
  def find_changed_files(dir, since_time)
    Dir.glob("#{dir}/**/*").select do |file|
      File.file?(file) && File.mtime(file).to_i > since_time
    end
  end
  
  def restore(full_backup, incremental_backups, restore_location)
    # Restore full backup first
    FileUtils.cp_r(full_backup, restore_location)
    
    # Apply incremental backups in order
    incremental_backups.sort.each do |incr_backup|
      apply_incremental(incr_backup, restore_location)
    end
  end
end

Differential backup strategy copies all files that changed since the last full backup. Each differential backup grows larger over time but recovery requires only the full backup plus the most recent differential.

Differential backups balance storage efficiency and recovery simplicity. They take longer than incremental backups but provide faster recovery and more resilience to backup corruption.
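A minimal sketch of the approach, mirroring the incremental example below (class and helper names are illustrative, and the differential copy is flattened for brevity):

```ruby
require 'fileutils'

class DifferentialBackupStrategy
  def initialize(source, destination)
    @source = source
    @destination = destination
    @full_backup_time = nil
  end

  def full_backup
    @full_backup_time = Time.now.to_i
    backup_path = "#{@destination}/full_#{@full_backup_time}"
    FileUtils.cp_r(@source, backup_path)
    backup_path
  end

  # Unlike an incremental backup, the reference point never advances:
  # every differential compares against the last full backup.
  def differential_backup
    raise 'run a full backup first' unless @full_backup_time
    backup_path = "#{@destination}/diff_#{Time.now.to_i}"
    changed = Dir.glob("#{@source}/**/*").select do |file|
      File.file?(file) && File.mtime(file).to_i > @full_backup_time
    end
    FileUtils.mkdir_p(backup_path)
    changed.each { |file| FileUtils.cp(file, backup_path) }  # flat copy for brevity
    backup_path
  end

  # Recovery needs only the full backup plus the single latest differential
  def restore(full_backup, latest_differential, restore_location)
    FileUtils.cp_r(full_backup, restore_location)
    FileUtils.cp_r(Dir.glob("#{latest_differential}/*"), restore_location)
  end
end
```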

Continuous data protection (CDP) captures every change as it occurs, providing the finest possible RPO. CDP systems maintain journals of all writes and can restore to any point in time. This approach requires significant storage and processing but provides maximum data protection.

Database replication, filesystem journaling, and application-level change tracking implement CDP. The system maintains a base snapshot plus a log of all changes, allowing reconstruction of state at any moment.
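The base-snapshot-plus-journal idea reduces to a few lines; this ChangeJournal class is an illustrative toy, not a real CDP system:

```ruby
class ChangeJournal
  def initialize(base_state)
    @base = base_state.dup
    @journal = []  # ordered [timestamp, key, value] entries
  end

  def write(key, value, at: Time.now.to_f)
    @journal << [at, key, value]
  end

  # Replay journaled writes up to the requested moment to reconstruct state
  def state_at(time)
    state = @base.dup
    @journal.each { |ts, key, value| state[key] = value if ts <= time }
    state
  end
end

journal = ChangeJournal.new('balance' => 100)
journal.write('balance', 150, at: 1.0)
journal.write('balance', 90,  at: 2.0)
journal.state_at(1.5)  # => { 'balance' => 150 }
```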

Snapshot-based backups capture filesystem or storage state at a specific instant without copying all data immediately. Copy-on-write snapshots preserve the original data blocks when changes occur, allowing the snapshot to represent state at snapshot time.

Storage systems like ZFS, Btrfs, and LVM provide snapshot capabilities. Application-consistent snapshots coordinate with applications to flush buffers and pause writes during snapshot creation.

# Snapshot-based backup using LVM
class LVMSnapshotBackup
  def create_snapshot(volume, snapshot_name)
    size = "10G"  # Space reserved for copy-on-write changes while the snapshot exists
    
    system('lvcreate', '-L', size, '-s', '-n', snapshot_name, volume)
    
    # Assumes the logical volume lives in a volume group named 'vg'
    mount_point = "/mnt/snapshots/#{snapshot_name}"
    FileUtils.mkdir_p(mount_point)
    system('mount', "/dev/vg/#{snapshot_name}", mount_point)
    
    mount_point
  end
  
  def backup_snapshot(mount_point, destination)
    timestamp = Time.now.strftime('%Y%m%d_%H%M%S')
    archive = "#{destination}/snapshot_#{timestamp}.tar.gz"
    
    # Argument-array form avoids shell interpolation issues with spaces
    system('tar', 'czf', archive, '-C', mount_point, '.')
    
    archive
  end
  
  def cleanup_snapshot(snapshot_name, mount_point)
    system('umount', mount_point)
    system('lvremove', '-f', "/dev/vg/#{snapshot_name}")
    FileUtils.rm_rf(mount_point)
  end
end

Cloud-based backup strategies store data in cloud object storage like S3, providing durability, geographic distribution, and scalability. Cloud backups offer offsite storage without managing physical infrastructure. Multi-region replication protects against regional failures.

Cloud strategies often use lifecycle policies to transition older backups to cheaper storage tiers automatically. Recent backups stay in hot storage for fast recovery while older backups move to cold storage for cost efficiency.

Replication-based strategies maintain synchronized copies of data across multiple systems. Synchronous replication confirms writes to all replicas before acknowledging the write, providing zero data loss but higher latency. Asynchronous replication acknowledges writes immediately and replicates in the background, reducing latency but allowing potential data loss during failures.
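A toy in-memory sketch contrasts the two modes; ReplicatedStore is illustrative only, since real replication involves networks, acknowledgment protocols, and failure handling:

```ruby
class ReplicatedStore
  def initialize(replicas)
    @primary = {}
    @replicas = replicas
    @pending = Queue.new  # writes awaiting asynchronous replication
  end

  # Synchronous: acknowledge only after every replica has the write
  def write_sync(key, value)
    @primary[key] = value
    @replicas.each { |replica| replica[key] = value }
    :acknowledged
  end

  # Asynchronous: acknowledge immediately; a crash before drain runs
  # loses everything still sitting in the queue
  def write_async(key, value)
    @primary[key] = value
    @pending << [key, value]
    :acknowledged
  end

  def drain  # background replication step
    until @pending.empty?
      key, value = @pending.pop
      @replicas.each { |replica| replica[key] = value }
    end
  end
end

replica = {}
store = ReplicatedStore.new([replica])
store.write_async('order', 42)
replica['order']  # nil until drain runs: the data-loss window
store.drain
```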

Ruby Implementation

Ruby applications implement backups through filesystem operations, external command execution, database-specific tools, and cloud service APIs. The implementation handles file copying, compression, encryption, and integrity verification.

Basic file backup implementation:

require 'fileutils'
require 'zlib'
require 'digest'
require 'json'

class FileBackup
  def initialize(source_dir, backup_dir)
    @source_dir = source_dir
    @backup_dir = backup_dir
    FileUtils.mkdir_p(@backup_dir)
  end
  
  def create_backup(backup_name = nil)
    backup_name ||= "backup_#{Time.now.strftime('%Y%m%d_%H%M%S')}"
    backup_path = File.join(@backup_dir, backup_name)
    
    # Create compressed archive
    tar_file = "#{backup_path}.tar.gz"
    
    Dir.chdir(File.dirname(@source_dir)) do
      source_name = File.basename(@source_dir)
      # Argument-array form avoids shell interpolation issues with spaces
      system('tar', 'czf', tar_file, source_name) or raise 'Archive creation failed'
    end
    
    # Generate checksum
    checksum = calculate_checksum(tar_file)
    
    # Create metadata
    metadata = {
      backup_name: backup_name,
      timestamp: Time.now.iso8601,
      source: @source_dir,
      size: File.size(tar_file),
      checksum: checksum
    }
    
    File.write("#{tar_file}.json", JSON.pretty_generate(metadata))
    
    metadata
  end
  
  def restore_backup(backup_name, restore_dir)
    tar_file = File.join(@backup_dir, "#{backup_name}.tar.gz")
    metadata_file = "#{tar_file}.json"
    
    # Verify checksum
    unless verify_checksum(tar_file, metadata_file)
      raise "Backup checksum verification failed"
    end
    
    # Extract archive
    FileUtils.mkdir_p(restore_dir)
    system('tar', 'xzf', tar_file, '-C', restore_dir) or raise 'Archive extraction failed'
    
    true
  end
  
  def list_backups
    Dir.glob(File.join(@backup_dir, "*.tar.gz.json")).map do |metadata_file|
      JSON.parse(File.read(metadata_file))
    end.sort_by { |m| m['timestamp'] }.reverse
  end
  
  private
  
  def calculate_checksum(file)
    Digest::SHA256.file(file).hexdigest
  end
  
  def verify_checksum(tar_file, metadata_file)
    metadata = JSON.parse(File.read(metadata_file))
    expected = metadata['checksum']
    actual = calculate_checksum(tar_file)
    expected == actual
  end
end

Database backup implementation uses database-specific tools and Ruby's command execution:

class PostgreSQLBackup
  def initialize(config)
    @host = config[:host]
    @database = config[:database]
    @username = config[:username]
    @backup_dir = config[:backup_dir]
    FileUtils.mkdir_p(@backup_dir)
  end
  
  def create_backup
    timestamp = Time.now.strftime('%Y%m%d_%H%M%S')
    backup_file = File.join(@backup_dir, "#{@database}_#{timestamp}.sql")
    compressed_file = "#{backup_file}.gz"
    
    # Create database dump
    env = { 'PGPASSWORD' => ENV['PGPASSWORD'] }
    cmd = [
      'pg_dump',
      '-h', @host,
      '-U', @username,
      '-F', 'p',  # Plain SQL format
      '-f', backup_file,
      @database
    ]
    
    unless system(env, *cmd)
      raise "Database backup failed"
    end
    
    # Compress backup
    compress_file(backup_file, compressed_file)
    File.delete(backup_file)
    
    # Generate metadata
    metadata = {
      database: @database,
      timestamp: timestamp,
      size: File.size(compressed_file),
      checksum: Digest::SHA256.file(compressed_file).hexdigest
    }
    
    File.write("#{compressed_file}.json", JSON.pretty_generate(metadata))
    
    compressed_file
  end
  
  def restore_backup(backup_file, target_database = nil)
    target_database ||= @database
    
    # Decompress backup
    sql_file = backup_file.sub(/\.gz$/, '')
    decompress_file(backup_file, sql_file)
    
    # Restore database
    env = { 'PGPASSWORD' => ENV['PGPASSWORD'] }
    cmd = [
      'psql',
      '-h', @host,
      '-U', @username,
      '-d', target_database,
      '-f', sql_file
    ]
    
    success = system(env, *cmd)
    File.delete(sql_file)
    
    raise "Database restore failed" unless success
    true
  end
  
  private
  
  def compress_file(input, output)
    Zlib::GzipWriter.open(output) do |gz|
      File.open(input, 'rb') do |file|
        while chunk = file.read(1024 * 1024)
          gz.write(chunk)
        end
      end
    end
  end
  
  def decompress_file(input, output)
    Zlib::GzipReader.open(input) do |gz|
      File.open(output, 'wb') do |file|
        while chunk = gz.read(1024 * 1024)
          file.write(chunk)
        end
      end
    end
  end
end

AWS S3 backup implementation using the AWS SDK:

require 'aws-sdk-s3'

class S3Backup
  def initialize(bucket_name, region = 'us-east-1')
    @bucket_name = bucket_name
    @s3 = Aws::S3::Client.new(region: region)
  end
  
  def upload_backup(local_file, s3_key = nil)
    s3_key ||= File.basename(local_file)
    
    # put_object sends a single PUT (5 GB object limit); for larger backups,
    # Aws::S3::Object#upload_file switches to multipart automatically
    File.open(local_file, 'rb') do |file|
      @s3.put_object(
        bucket: @bucket_name,
        key: s3_key,
        body: file,
        server_side_encryption: 'AES256',
        storage_class: 'STANDARD_IA'  # Infrequent access
      )
    end
    
    # Verify upload
    head = @s3.head_object(bucket: @bucket_name, key: s3_key)
    
    {
      key: s3_key,
      size: head.content_length,
      etag: head.etag,
      last_modified: head.last_modified
    }
  end
  
  def download_backup(s3_key, local_file)
    FileUtils.mkdir_p(File.dirname(local_file))
    
    File.open(local_file, 'wb') do |file|
      @s3.get_object(
        bucket: @bucket_name,
        key: s3_key
      ) do |chunk|
        file.write(chunk)
      end
    end
    
    local_file
  end
  
  def list_backups(prefix = '')
    response = @s3.list_objects_v2(
      bucket: @bucket_name,
      prefix: prefix
    )
    
    response.contents.map do |obj|
      {
        key: obj.key,
        size: obj.size,
        last_modified: obj.last_modified,
        storage_class: obj.storage_class
      }
    end
  end
  
  def apply_lifecycle_policy
    @s3.put_bucket_lifecycle_configuration(
      bucket: @bucket_name,
      lifecycle_configuration: {
        rules: [
          {
            id: 'archive-old-backups',
            status: 'Enabled',
            transitions: [
              {
                days: 30,
                storage_class: 'GLACIER'
              },
              {
                days: 90,
                storage_class: 'DEEP_ARCHIVE'
              }
            ],
            expiration: {
              days: 2555  # ~7 years
            }
          }
        ]
      }
    )
  end
end

Automated backup scheduler runs backups on defined intervals:

require 'rufus-scheduler'

class BackupScheduler
  def initialize(backup_strategy)
    @backup_strategy = backup_strategy
    @scheduler = Rufus::Scheduler.new
  end
  
  def schedule_daily(time = '02:00')
    hour, minute = time.split(':')
    @scheduler.cron "#{minute.to_i} #{hour.to_i} * * *" do
      perform_backup('daily')
    end
  end
  
  def schedule_weekly(day = 0, time = '03:00')  # day: 0 = Sunday
    hour, minute = time.split(':')
    @scheduler.cron "#{minute.to_i} #{hour.to_i} * * #{day}" do
      perform_backup('weekly')
    end
  end
  
  def schedule_monthly(day = 1, time = '04:00')
    hour, minute = time.split(':')
    @scheduler.cron "#{minute.to_i} #{hour.to_i} #{day} * *" do
      perform_backup('monthly')
    end
  end
  
  def start
    @scheduler.join
  end
  
  private
  
  def perform_backup(frequency)
    start_time = Time.now
    
    begin
      result = @backup_strategy.create_backup
      
      duration = Time.now - start_time
      
      log_backup_success(frequency, result, duration)
      notify_success(frequency, result)
      
    rescue => e
      log_backup_failure(frequency, e)
      notify_failure(frequency, e)
    end
  end
  
  def log_backup_success(frequency, result, duration)
    puts "[#{Time.now}] #{frequency} backup completed: #{result[:size]} bytes in #{duration}s"
  end
  
  def log_backup_failure(frequency, error)
    puts "[#{Time.now}] #{frequency} backup failed: #{error.message}"
  end
  
  def notify_success(frequency, result)
    # Hook for alerting integration (email, chat, incident tooling)
  end
  
  def notify_failure(frequency, error)
    # Always alert on failure; silent backup failures surface only at recovery time
  end
end

Tools & Ecosystem

Backup software provides functionality beyond basic file copying. These tools handle scheduling, retention, compression, encryption, deduplication, and verification.

Restic creates encrypted, deduplicated backups to local storage or cloud providers. Restic stores data in content-addressable format, where identical chunks are stored once regardless of which files contain them. This reduces storage requirements dramatically for systems with repeated data.

class ResticBackup
  def initialize(repository, password)
    @repository = repository
    @password = password
    ENV['RESTIC_PASSWORD'] = password
  end
  
  def backup(paths, tags = [])
    tag_args = tags.flat_map { |tag| ['--tag', tag] }
    
    cmd = [
      'restic',
      '-r', @repository,
      'backup',
      *tag_args,
      *paths
    ]
    
    # Array form avoids the shell; err: [:child, :out] merges stderr into output
    output = IO.popen(cmd, err: [:child, :out], &:read)
    
    unless $?.success?
      raise "Restic backup failed: #{output}"
    end
    
    parse_restic_output(output)
  end
  
  def restore(snapshot_id, target)
    cmd = [
      'restic',
      '-r', @repository,
      'restore',
      snapshot_id,
      '--target', target
    ]
    
    system(*cmd)
  end
  
  def list_snapshots
    output = IO.popen(['restic', '-r', @repository, 'snapshots', '--json'], &:read)
    JSON.parse(output)
  end
  
  def forget_old_backups
    cmd = [
      'restic',
      '-r', @repository,
      'forget',
      '--keep-daily', '30',
      '--keep-weekly', '24',
      '--keep-monthly', '84',
      '--prune'
    ]
    
    system(*cmd)
  end
end

Borg Backup provides deduplicated, compressed, and authenticated backups. Borg excels at backing up to remote servers via SSH and provides efficient incremental backups through content-defined chunking.
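A hedged wrapper sketch: borg create, borg extract, and borg prune are real subcommands, but verify the flags against your installed Borg version. Building the argument arrays separately keeps command construction testable without invoking borg:

```ruby
class BorgBackup
  def initialize(repository)
    @repository = repository
  end

  def create_args(archive_name, paths)
    ['borg', 'create', '--compression', 'lz4',
     "#{@repository}::#{archive_name}", *paths]
  end

  def extract_args(archive_name)
    ['borg', 'extract', "#{@repository}::#{archive_name}"]
  end

  def prune_args
    ['borg', 'prune', '--keep-daily', '30', '--keep-weekly', '24', @repository]
  end

  def run(args)
    system(*args) or raise "borg command failed: #{args.join(' ')}"
  end
end

borg = BorgBackup.new('user@backup-host:repo')
borg.create_args('app-data', ['/var/app/data'])
# => ["borg", "create", "--compression", "lz4",
#     "user@backup-host:repo::app-data", "/var/app/data"]
```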

Ruby gems for backups:

The backup gem provides a DSL for defining backup strategies. It supports multiple storage backends, databases, and notification methods.

# Using the backup gem
require 'backup'

Backup::Model.new(:database_backup, 'Production Database') do
  database PostgreSQL do |db|
    db.name = 'production_db'
    db.username = 'backup_user'
    db.password = ENV['DB_PASSWORD']
    db.host = 'localhost'
  end
  
  store_with S3 do |s3|
    s3.access_key_id = ENV['AWS_ACCESS_KEY']
    s3.secret_access_key = ENV['AWS_SECRET_KEY']
    s3.bucket = 'app-backups'
    s3.region = 'us-east-1'
    s3.path = 'database'
    s3.keep = 30
  end
  
  compress_with Gzip
  
  notify_by Mail do |mail|
    mail.on_success = false
    mail.on_failure = true
    mail.from = 'backups@example.com'
    mail.to = 'ops@example.com'
  end
end

The aws-sdk-s3 gem provides full S3 API access for cloud backups. The google-cloud-storage gem offers similar functionality for Google Cloud Storage.

Database-specific tools:

PostgreSQL uses pg_dump for logical backups and pg_basebackup for physical backups. MySQL uses mysqldump for logical backups and supports binary log replication. MongoDB uses mongodump and mongorestore.

Monitoring and alerting tools track backup success and send notifications on failure. Services like PagerDuty, Datadog, and custom monitoring scripts ensure backup operations complete successfully.

# Backup monitoring with health checks (assumes the http gem is installed)
require 'http'

class BackupMonitor
  def initialize(health_check_url)
    @health_check_url = health_check_url
  end
  
  def ping_start
    HTTP.get("#{@health_check_url}/start")
  end
  
  def ping_success
    HTTP.get(@health_check_url)
  end
  
  def ping_failure(error)
    # Post to a distinct failure endpoint so the service does not record success
    HTTP.post("#{@health_check_url}/fail", json: {
      status: 'failure',
      error: error.message
    })
  end
end

Security Implications

Encryption at rest protects backup data from unauthorized access when stored. Backups often contain sensitive customer data, credentials, and proprietary information that require protection equal to or greater than production data.

Encrypt backups using strong algorithms like AES-256 before transmitting to storage. The encryption key must be protected separately from the backup data. Key management services like AWS KMS or HashiCorp Vault store encryption keys securely.

require 'openssl'
require 'base64'

class EncryptedBackup
  def initialize(encryption_key)
    @encryption_key = encryption_key
  end
  
  def encrypt_file(input_file, output_file)
    cipher = OpenSSL::Cipher.new('AES-256-CBC')
    cipher.encrypt
    cipher.key = derive_key(@encryption_key)
    
    iv = cipher.random_iv
    
    File.open(output_file, 'wb') do |out|
      # Write IV first (not secret)
      out.write(iv)
      
      File.open(input_file, 'rb') do |input|
        while chunk = input.read(1024 * 1024)
          encrypted = cipher.update(chunk)
          out.write(encrypted)
        end
        out.write(cipher.final)
      end
    end
  end
  
  def decrypt_file(input_file, output_file)
    decipher = OpenSSL::Cipher.new('AES-256-CBC')
    decipher.decrypt
    decipher.key = derive_key(@encryption_key)
    
    File.open(output_file, 'wb') do |out|
      File.open(input_file, 'rb') do |input|
        # Read IV from file
        iv = input.read(16)
        decipher.iv = iv
        
        while chunk = input.read(1024 * 1024)
          decrypted = decipher.update(chunk)
          out.write(decrypted)
        end
        out.write(decipher.final)
      end
    end
  end
  
  private
  
  def derive_key(password)
    OpenSSL::PKCS5.pbkdf2_hmac(
      password,
      'backup-salt',  # Should be unique per backup
      10000,
      32,
      OpenSSL::Digest.new('SHA256')  # pbkdf2_hmac requires an explicit digest
    )
  end
end

Encryption in transit protects data while transferring to backup storage. Use TLS for network transfers and verify certificates to prevent man-in-the-middle attacks. Cloud storage APIs typically provide TLS by default, but verify configuration.

Access control limits who can create, access, and delete backups. Principle of least privilege applies: backup systems need read access to production data but humans should not access backups without specific authorization.

Implement role-based access control (RBAC) for backup systems. Separate credentials for backup creation, backup restoration, and backup deletion. An attacker compromising production systems should not automatically gain access to backups.

Backup immutability prevents modification or deletion of backups for a specified period. Immutable backups protect against ransomware that tries to encrypt or delete backups before demanding ransom. S3 Object Lock and similar features prevent deletion even with full administrative credentials.

# S3 backup with object lock for immutability
class ImmutableBackup
  def initialize(bucket_name)
    @bucket_name = bucket_name
    @s3 = Aws::S3::Client.new
  end
  
  def enable_object_lock
    @s3.put_object_lock_configuration(
      bucket: @bucket_name,
      object_lock_configuration: {
        object_lock_enabled: 'Enabled',
        rule: {
          default_retention: {
            mode: 'GOVERNANCE',  # or 'COMPLIANCE'
            days: 30
          }
        }
      }
    )
  end
  
  def upload_immutable_backup(file, key)
    File.open(file, 'rb') do |f|
      @s3.put_object(
        bucket: @bucket_name,
        key: key,
        body: f,
        object_lock_mode: 'GOVERNANCE',
        object_lock_retain_until_date: Time.now + (30 * 86400)
      )
    end
  end
end

Credential management requires careful handling. Backup systems need credentials for databases, storage systems, and cloud services. Never hardcode credentials in backup scripts. Use environment variables, secrets management systems, or instance profiles.
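A minimal sketch of that rule: read the credential from the environment and fail loudly when it is missing (fetch_credential is an illustrative helper, not a library API):

```ruby
def fetch_credential(name)
  ENV.fetch(name) do
    raise KeyError, "missing credential: set the #{name} environment variable"
  end
end

# Example: db_password = fetch_credential('PGPASSWORD')
```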

Audit logging tracks backup operations for security monitoring and compliance. Log backup creation, restoration attempts, access to backup data, and backup deletions. Include timestamp, user, operation type, and result in audit logs.
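One JSON line per event is a simple, grep-friendly format for such logs; the field names here are illustrative:

```ruby
require 'json'
require 'time'  # for Time#iso8601

def audit_log_entry(user:, operation:, target:, result:)
  {
    timestamp: Time.now.utc.iso8601,
    user: user,
    operation: operation,  # e.g. 'backup.create', 'backup.restore', 'backup.delete'
    target: target,
    result: result
  }.to_json
end

# puts audit_log_entry(user: 'backup-service', operation: 'backup.create',
#                      target: 's3://backups/app-data', result: 'success')
```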

Data retention and disposal must comply with regulations. Some regulations require specific retention periods, others mandate secure deletion after retention expires. Implement automated expiration policies and cryptographic erasure (destroying encryption keys) for secure deletion.

Common Pitfalls

Untested backups represent the most critical failure. Organizations discover backup problems during recovery attempts when data loss has already occurred. Regular restore testing verifies backups are valid and recovery procedures work.

Schedule quarterly or monthly restore tests to random servers or test environments. Measure actual RTO during tests and compare against objectives. Document any discrepancies and update procedures.

class BackupValidator
  def validate_backup(backup_file)
    errors = []
    
    # Check file exists
    unless File.exist?(backup_file)
      errors << "Backup file not found"
      return errors
    end
    
    # Check file size
    if File.size(backup_file) == 0
      errors << "Backup file is empty"
    end
    
    # Verify checksum
    metadata_file = "#{backup_file}.json"
    if File.exist?(metadata_file)
      unless verify_integrity(backup_file, metadata_file)
        errors << "Checksum verification failed"
      end
    else
      errors << "Metadata file missing"
    end
    
    # Test restoration to temporary location
    temp_dir = "/tmp/backup_test_#{Time.now.to_i}"
    begin
      restore_backup(backup_file, temp_dir)
      errors << "Restore test failed" unless verify_restore(temp_dir)
    rescue => e
      errors << "Restore exception: #{e.message}"
    ensure
      FileUtils.rm_rf(temp_dir) if Dir.exist?(temp_dir)
    end
    
    errors
  end
end

Insufficient backup frequency leads to excessive data loss. A database with hourly changes backed up daily has up to 24 hours of data loss risk. Measure actual change rates and set backup frequency accordingly.

Storing backups on the same storage as the source fails to protect against storage failures. A backup on the same disk as production data provides no protection when that disk fails. The 3-2-1 rule requires at least one offsite copy on physically separate storage.

Ignoring backup monitoring allows backup failures to continue undetected. Silent failures accumulate until recovery is attempted and no valid backups exist. Implement active monitoring with alerts for backup failures.

Inadequate retention periods delete backups before they might be needed. Corruption discovered weeks after it occurred requires backups older than daily retention. Balance storage costs against recovery scenarios including delayed corruption detection.

Backup corruption during creation produces invalid backups that appear successful. Network interruptions, disk errors, or software bugs corrupt backup data. Verify backups immediately after creation using checksums and test restorations.

Forgetting incremental backup dependencies leads to incomplete recovery. Deleting an incremental backup in a chain breaks recovery for all subsequent backups. Retention policies must consider backup dependencies.

Performance impact on production occurs when backups consume excessive CPU, memory, or I/O. Schedule intensive backup operations during low-traffic periods or use techniques like snapshots that minimize impact.

Incomplete application consistency produces backups with inconsistent state. Backing up a database while writes occur can capture data mid-transaction. Use application-specific backup tools or quiesce writes during backup.

Single point of failure in backup infrastructure eliminates protection when backup servers fail. A single backup server processing all backups becomes a bottleneck and single point of failure. Distribute backup operations and maintain redundancy.

Encryption key loss makes encrypted backups unrecoverable. Organizations implementing encryption without proper key management permanently lose access to backups when keys are lost. Store encryption keys separately from backups with appropriate redundancy.

Reference

Backup Strategy Comparison

Strategy     | Storage Required | Backup Speed | Recovery Speed | Recovery Complexity
Full         | Highest          | Slowest      | Fastest        | Simple
Incremental  | Lowest           | Fastest      | Slowest        | Complex
Differential | Medium           | Medium       | Medium         | Moderate
Snapshot     | Low              | Very fast    | Fast           | Simple
Continuous   | High             | Continuous   | Very fast      | Moderate

RTO and RPO Guidelines

Criticality Level | Typical RPO | Typical RTO | Backup Frequency  | Storage Type
Critical (Tier 1) | Minutes     | Minutes     | Continuous/Hourly | Replicated
High (Tier 2)     | Hours       | Hours       | Hourly            | Hot storage
Medium (Tier 3)   | 24 hours    | 8 hours     | Daily             | Warm storage
Low (Tier 4)      | Days        | 24+ hours   | Weekly            | Cold storage

Retention Policy Examples

Backup Type      | Frequency  | Keep Duration | Use Case
Transaction logs | Continuous | 7 days        | Point-in-time recovery
Daily full       | Daily      | 30 days       | Recent recovery
Weekly full      | Weekly     | 12 weeks      | Medium-term recovery
Monthly full     | Monthly    | 7 years       | Compliance, long-term
Yearly archive   | Yearly     | Indefinite    | Historical reference

Backup Tools and Gems

Tool/Gem        | Type     | Primary Use   | Key Feature
restic          | System   | File backup   | Deduplication, encryption
borg            | System   | File backup   | Compression, SSH support
pg_dump         | Database | PostgreSQL    | Logical backup
mysqldump       | Database | MySQL         | Logical backup
mongodump       | Database | MongoDB       | BSON export
backup gem      | Ruby     | Orchestration | Multi-backend DSL
aws-sdk-s3      | Ruby     | Cloud storage | S3 integration
rufus-scheduler | Ruby     | Scheduling    | Cron-like scheduling

Backup Verification Checklist

Check                 | Method                     | Frequency    | Critical
File integrity        | Checksum validation        | Every backup | Yes
Backup completion     | Job status monitoring      | Every backup | Yes
Storage availability  | Health checks              | Daily        | Yes
Restore testing       | Full restore to test env   | Monthly      | Yes
Performance metrics   | Duration and size tracking | Every backup | No
Encryption validation | Key accessibility test     | Weekly       | Yes
Retention compliance  | Age-based verification     | Weekly       | Yes
Documentation review  | Procedure validation       | Quarterly    | No

S3 Storage Classes

Storage Class | Retrieval Time | Cost    | Use Case
STANDARD      | Immediate      | Highest | Recent backups
STANDARD_IA   | Immediate      | Medium  | 30-day backups
GLACIER       | Minutes-hours  | Low     | Long-term backups
DEEP_ARCHIVE  | 12 hours       | Lowest  | Compliance archives

Recovery Process Steps

Phase        | Actions                                              | Validation
Assessment   | Identify scope of loss, determine recovery point     | Confirm what needs restoration
Preparation  | Locate backup, verify integrity, provision resources | Checksum validation, space check
Restoration  | Execute restore procedure, monitor progress          | Progress monitoring
Verification | Compare restored data, test functionality            | Data validation, application tests
Production   | Switch to restored system, monitor closely           | Performance monitoring

Encryption Standards

Algorithm         | Key Size | Use Case                  | Performance
AES-256-CBC       | 256 bit  | File encryption           | Fast
AES-256-GCM       | 256 bit  | File encryption with auth | Fast
ChaCha20-Poly1305 | 256 bit  | Stream encryption         | Very fast
RSA-4096          | 4096 bit | Key encryption            | Slow

Backup Command Examples

# PostgreSQL backup
pg_dump -h localhost -U username -F c -f backup.dump database_name

# PostgreSQL restore
pg_restore -h localhost -U username -d database_name backup.dump

# MySQL backup
mysqldump -u username -p database_name > backup.sql

# MySQL restore
mysql -u username -p database_name < backup.sql

# Restic initialize
restic init -r /backup/repo

# Restic backup
restic -r /backup/repo backup /data

# Restic restore
restic -r /backup/repo restore latest --target /restore

# S3 sync
aws s3 sync /local/path s3://bucket/path --sse AES256