Overview
Cloud storage types represent distinct architectural approaches to storing and accessing data in distributed systems. Each type optimizes for different access patterns, consistency models, and performance characteristics. The three primary types—object storage, block storage, and file storage—emerged from different computing needs and provide fundamentally different interfaces and guarantees.
Object storage treats data as discrete objects with metadata and unique identifiers, accessed through HTTP APIs. Block storage provides raw storage volumes that operating systems treat as attached disks. File storage exposes hierarchical file systems accessible through network protocols. Each type makes specific trade-offs between performance, scalability, consistency, and cost.
The choice between storage types affects application architecture, data access patterns, performance characteristics, and operational complexity. Object storage excels at storing unstructured data at massive scale with high durability. Block storage provides low-latency access for databases and applications requiring block-level operations. File storage supports shared access patterns where multiple clients need concurrent file system access.
# Object storage access pattern
require 'aws-sdk-s3'
s3 = Aws::S3::Client.new(region: 'us-east-1')
s3.put_object(bucket: 'data-lake', key: 'reports/2024/q1.json', body: data)
# => Stores object with metadata, accessed via HTTP API
# Block storage access pattern (mounted volume)
File.write('/mnt/ebs-volume/database/table.db', binary_data)
# => Direct file system operations on block device
# File storage access pattern
require 'net/sftp'
Net::SFTP.start('file-server.example.com', 'user') do |sftp|
  sftp.upload!('/local/path/file.txt', '/shared/documents/file.txt')
end
# => Network file system access
Key Principles
Cloud storage types differ in their fundamental abstraction models, access interfaces, and consistency guarantees. Object storage presents a flat namespace where objects are addressed by unique keys within buckets. The storage system manages object placement, replication, and retrieval without exposing underlying physical layout. Objects are immutable—updates create new versions rather than modifying existing data. This immutability enables aggressive caching, simplified replication, and eventually consistent semantics.
Block storage exposes fixed-size blocks accessed through block-level protocols like iSCSI or NVMe. The storage system presents a linear address space that the operating system partitions into file systems. Block storage provides strong consistency because it targets single-writer scenarios where one instance mounts the volume. The storage layer handles data placement, replication for durability, and snapshot management without client involvement.
File storage provides hierarchical namespaces with directory structures and POSIX semantics. Multiple clients access the same file system concurrently through network protocols like NFS or SMB. The storage system manages locking, cache coherence, and consistency across clients. File storage balances shared access with reasonable performance through distributed locking and metadata caching.
Storage durability differs across types. Object storage achieves extreme durability (11 nines) through erasure coding or cross-region replication. The system automatically maintains redundancy without client action. Block storage provides durability through snapshots and replication within availability zones. File storage durability depends on the underlying replication strategy and may require explicit backup procedures.
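The effect of redundancy on durability can be illustrated with simple probability arithmetic; the 1% per-copy annual failure rate below is an assumed figure for illustration, not a provider's number:

```ruby
# Chance that every independent copy is lost in a year, given an
# assumed per-copy annual failure probability
def annual_loss_probability(per_copy_failure, copies)
  per_copy_failure**copies
end

# Express the resulting durability as a number of nines
def durability_nines(loss_probability)
  -Math.log10(loss_probability).round
end

annual_loss_probability(0.01, 1)                   # one copy: 1% loss risk
annual_loss_probability(0.01, 3)                   # three copies: ~1 in a million
durability_nines(annual_loss_probability(0.01, 3)) # => 6
```

Real systems reach their durability targets with erasure coding rather than whole-copy replication, which achieves comparable redundancy at lower storage overhead.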
Access latency varies significantly. Block storage delivers sub-millisecond latency for local operations because the volume appears as a directly attached disk. File storage introduces network latency plus locking overhead for concurrent access, typically single-digit milliseconds. Object storage has higher latency due to HTTP overhead and distributed architecture, usually tens to hundreds of milliseconds per operation.
Cost models reflect the underlying architecture. Object storage charges per gigabyte stored and per request, with lower storage costs but higher operation costs. Block storage charges for provisioned capacity and IOPS, regardless of actual usage. File storage charges for consumed capacity plus throughput, with higher costs than object storage but lower than high-performance block storage.
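A back-of-envelope comparison makes the trade-off concrete; the prices below are illustrative placeholders, not current list prices:

```ruby
# Illustrative prices only; check current provider price lists
OBJECT_GB_MONTH   = 0.023 # $ per GB-month stored
OBJECT_PER_1K_PUT = 0.005 # $ per 1,000 PUT requests
BLOCK_GB_MONTH    = 0.08  # $ per provisioned GB-month

def object_storage_cost(gb_stored, put_requests)
  gb_stored * OBJECT_GB_MONTH + (put_requests / 1000.0) * OBJECT_PER_1K_PUT
end

def block_storage_cost(gb_provisioned)
  gb_provisioned * BLOCK_GB_MONTH # billed whether used or not
end

object_storage_cost(1024, 100_000) # ~ $24/month for a 1 TB archive
block_storage_cost(1024)           # ~ $82/month for the same provisioned capacity
```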
Implementation Approaches
Selecting the appropriate storage type requires analyzing data access patterns, consistency requirements, performance needs, and cost constraints. The decision impacts application design, deployment architecture, and operational procedures.
Object storage suits write-once, read-many workloads where data grows indefinitely. Applications that generate logs, store media files, archive historical data, or build data lakes benefit from object storage scalability and durability. The flat namespace simplifies data organization at scale—applications store millions of objects without managing complex directory structures. Object storage integrates with analytics platforms that process data in place without moving it to other systems.
Static website hosting, content distribution, and backup systems align with object storage characteristics. The HTTP API integrates with CDNs for global content delivery. Versioning features enable point-in-time recovery without complex backup schedules. Lifecycle policies automatically transition data between storage classes or delete expired objects.
Block storage targets applications requiring low-latency random access to persistent data. Databases demand consistent performance for concurrent reads and writes across their data files. The file system running on block storage provides familiar POSIX semantics while the underlying storage handles replication and snapshots. Applications that need guaranteed IOPS or specific throughput characteristics provision block storage with defined performance parameters.
Boot volumes for virtual machines use block storage because operating systems expect block device semantics. The instance attaches the volume during boot and mounts it as the root file system. Snapshot features enable backup and disaster recovery by capturing volume state at specific points in time.
File storage addresses scenarios where multiple compute instances need simultaneous access to shared data. Content management systems store uploaded files on shared storage so any web server can retrieve them. Machine learning training jobs read datasets from shared file systems while multiple GPU instances process different portions. Development teams share code repositories and build artifacts through network file systems.
High-performance computing workloads may use file storage for shared scratch space where jobs write intermediate results. The distributed file system aggregates performance across multiple storage servers to achieve high throughput. However, consistency overhead limits performance compared to local block storage for single-client access.
Hybrid approaches combine storage types based on data lifecycle and access patterns. Applications write new data to block storage for fast processing, then archive completed work to object storage for long-term retention. Analytics pipelines extract data from object storage, load it into databases on block storage for queries, then write results back to object storage. This tiered architecture balances performance and cost by using the right storage for each workload phase.
# Hybrid storage pattern - transactional data on block, archives on object
require 'aws-sdk-s3'
class DataProcessor
  def initialize
    @db_path = '/mnt/block-volume/transactions.db'
    @s3 = Aws::S3::Client.new
    @archive_bucket = 'transaction-archives'
  end
  def process_day(date)
    # Fast queries against block storage database
    # (read_from_db, process_transactions and compress are
    # application-specific helpers)
    transactions = read_from_db(date)
    process_transactions(transactions)
    # Archive to object storage
    archive_key = "transactions/#{date.strftime('%Y/%m/%d')}.json.gz"
    @s3.put_object(
      bucket: @archive_bucket,
      key: archive_key,
      body: compress(transactions.to_json),
      storage_class: 'GLACIER_IR'
    )
  end
end
Ruby Implementation
Ruby provides SDKs for major cloud providers that abstract storage type differences behind idiomatic interfaces. The AWS SDK, Google Cloud SDK, and Azure SDK follow similar patterns while exposing provider-specific features.
Object storage in Ruby uses HTTP-based APIs with key-value semantics. The AWS S3 SDK demonstrates typical object storage operations:
require 'aws-sdk-s3'
# Configure client with credentials and region
s3 = Aws::S3::Client.new(
  region: 'us-west-2',
  credentials: Aws::Credentials.new(
    ENV['AWS_ACCESS_KEY_ID'],
    ENV['AWS_SECRET_ACCESS_KEY']
  )
)
# Upload object with metadata
s3.put_object(
  bucket: 'application-data',
  key: 'users/12345/profile.json',
  body: JSON.generate(user_data),
  metadata: {
    'user-id' => '12345',
    'updated-at' => Time.now.iso8601
  },
  content_type: 'application/json',
  server_side_encryption: 'AES256'
)
# Retrieve object
response = s3.get_object(bucket: 'application-data', key: 'users/12345/profile.json')
profile = JSON.parse(response.body.read)
# => {"name"=>"User", "email"=>"user@example.com"}
# List objects with prefix (the response is pageable; each iteration
# yields one page of results)
s3.list_objects_v2(bucket: 'application-data', prefix: 'users/').each do |response|
  response.contents.each do |object|
    puts "#{object.key}: #{object.size} bytes"
  end
end
Google Cloud Storage follows similar patterns with provider-specific features:
require 'google/cloud/storage'
storage = Google::Cloud::Storage.new(
  project_id: 'my-project',
  credentials: 'service-account-key.json'
)
bucket = storage.bucket 'application-data'
# Upload with custom metadata
file = bucket.create_file(
  'local-file.pdf',
  'documents/report.pdf',
  metadata: { 'department' => 'engineering' },
  cache_control: 'public, max-age=3600'
)
# Generate signed URL for temporary access
url = file.signed_url(method: 'GET', expires: 300)
# => "https://storage.googleapis.com/application-data/documents/report.pdf?..."
Block storage in Ruby applications appears as mounted file systems. Ruby code performs standard file I/O operations without special APIs:
# Block storage mounted at /mnt/data-volume
class DatabaseManager
  def initialize(volume_path)
    @volume_path = volume_path
    @data_dir = File.join(volume_path, 'database')
    Dir.mkdir(@data_dir) unless Dir.exist?(@data_dir)
  end
  def write_transaction(txn_id, data)
    file_path = File.join(@data_dir, "#{txn_id}.dat")
    File.open(file_path, 'wb') do |f|
      f.write(Marshal.dump(data))
      f.fsync # Force write to disk
    end
  end
  def read_transaction(txn_id)
    file_path = File.join(@data_dir, "#{txn_id}.dat")
    Marshal.load(File.binread(file_path))
  rescue Errno::ENOENT
    nil
  end
end
# Volume management through cloud provider SDK
require 'aws-sdk-ec2'
ec2 = Aws::EC2::Client.new(region: 'us-east-1')
# Create block storage volume
volume = ec2.create_volume(
  availability_zone: 'us-east-1a',
  size: 100, # GB
  volume_type: 'gp3',
  iops: 3000,
  throughput: 125 # MB/s
)
# Attach to instance
ec2.attach_volume(
  device: '/dev/sdf',
  instance_id: 'i-1234567890abcdef0',
  volume_id: volume.volume_id
)
# Create snapshot for backup
snapshot = ec2.create_snapshot(
  volume_id: volume.volume_id,
  description: "Backup #{Time.now.iso8601}"
)
File storage access in Ruby uses network file system protocols. Applications mount NFS or SMB shares and use standard file operations:
# Assuming EFS mounted at /mnt/efs
require 'fileutils'
class SharedFileManager
  def initialize(mount_point)
    @mount_point = mount_point
  end
  def write_shared_file(path, content)
    full_path = File.join(@mount_point, path)
    FileUtils.mkdir_p(File.dirname(full_path))
    File.open(full_path, 'w') do |f|
      f.flock(File::LOCK_EX) # Exclusive lock for writing
      f.write(content)
    end
  end
  def read_shared_file(path)
    full_path = File.join(@mount_point, path)
    File.open(full_path, 'r') do |f|
      f.flock(File::LOCK_SH) # Shared lock for reading
      f.read
    end
  end
end
# Azure Files SDK for programmatic access
require 'azure/storage/file'
client = Azure::Storage::File::FileService.create(
  storage_account_name: ENV['AZURE_STORAGE_ACCOUNT'],
  storage_access_key: ENV['AZURE_STORAGE_KEY']
)
# Create share and directory
client.create_share('documents')
client.create_directory('documents', 'reports')
# Upload file (read as binary so the PDF bytes are not mangled)
content = File.binread('local-report.pdf')
client.create_file('documents', 'reports', 'report.pdf', content.length)
client.put_file_range('documents', 'reports', 'report.pdf', 0, content.length - 1, content)
Multipart uploads handle large files efficiently in object storage:
require 'aws-sdk-s3'
class LargeFileUploader
  CHUNK_SIZE = 5 * 1024 * 1024 # 5 MB minimum part size
  def initialize(s3_client, bucket)
    @s3 = s3_client
    @bucket = bucket
  end
  def upload_large_file(file_path, key)
    # Initiate multipart upload
    upload = @s3.create_multipart_upload(
      bucket: @bucket,
      key: key,
      server_side_encryption: 'AES256'
    )
    parts = []
    part_number = 1
    File.open(file_path, 'rb') do |file|
      while (chunk = file.read(CHUNK_SIZE))
        response = @s3.upload_part(
          bucket: @bucket,
          key: key,
          upload_id: upload.upload_id,
          part_number: part_number,
          body: chunk
        )
        parts << { etag: response.etag, part_number: part_number }
        part_number += 1
      end
    end
    # Complete upload
    @s3.complete_multipart_upload(
      bucket: @bucket,
      key: key,
      upload_id: upload.upload_id,
      multipart_upload: { parts: parts }
    )
  rescue StandardError
    # Abort on failure to avoid charges for incomplete upload parts
    @s3.abort_multipart_upload(bucket: @bucket, key: key, upload_id: upload.upload_id) if upload
    raise
  end
end
Security Implications
Storage security encompasses access control, encryption, network isolation, and audit logging. Each storage type presents different attack surfaces and security mechanisms.
Object storage security centers on identity-based access control and encryption. Bucket policies define who can read, write, or delete objects. Applications authenticate using access keys or instance roles that grant specific permissions. Overly permissive bucket policies expose data to unauthorized access—policies should grant minimum required permissions.
# Secure S3 bucket configuration
require 'aws-sdk-s3'
s3 = Aws::S3::Client.new(region: 'us-east-1')
# Enable encryption at rest
s3.put_bucket_encryption(
  bucket: 'secure-data',
  server_side_encryption_configuration: {
    rules: [{
      apply_server_side_encryption_by_default: {
        sse_algorithm: 'aws:kms',
        kms_master_key_id: 'arn:aws:kms:us-east-1:123456789012:key/12345678-1234'
      }
    }]
  }
)
# Block public access
s3.put_public_access_block(
  bucket: 'secure-data',
  public_access_block_configuration: {
    block_public_acls: true,
    ignore_public_acls: true,
    block_public_policy: true,
    restrict_public_buckets: true
  }
)
# Require encrypted uploads
policy = {
  Version: '2012-10-17',
  Statement: [{
    Sid: 'DenyUnencryptedObjectUploads',
    Effect: 'Deny',
    Principal: '*',
    Action: 's3:PutObject',
    Resource: 'arn:aws:s3:::secure-data/*',
    Condition: {
      StringNotEquals: {
        's3:x-amz-server-side-encryption': 'aws:kms'
      }
    }
  }]
}
s3.put_bucket_policy(bucket: 'secure-data', policy: JSON.generate(policy))
Encryption protects data at rest and in transit. Server-side encryption encrypts objects before writing to disk. The storage service manages encryption keys or integrates with key management services. Client-side encryption gives applications full control over keys but adds complexity.
# Client-side encryption for sensitive data
# (AES-CBC shown for brevity; an authenticated mode such as AES-GCM
# is preferable in new designs)
require 'openssl'
require 'base64'
class EncryptedStorage
  def initialize(s3_client, bucket, encryption_key)
    @s3 = s3_client
    @bucket = bucket
    @key = encryption_key
  end
  def put_encrypted(key, data)
    cipher = OpenSSL::Cipher.new('AES-256-CBC')
    cipher.encrypt
    cipher.key = @key
    iv = cipher.random_iv
    encrypted = cipher.update(data) + cipher.final
    @s3.put_object(
      bucket: @bucket,
      key: key,
      body: encrypted,
      metadata: {
        'encryption-iv' => Base64.strict_encode64(iv),
        'encryption-algorithm' => 'AES-256-CBC'
      }
    )
  end
  def get_encrypted(key)
    response = @s3.get_object(bucket: @bucket, key: key)
    encrypted = response.body.read
    iv = Base64.strict_decode64(response.metadata['encryption-iv'])
    decipher = OpenSSL::Cipher.new('AES-256-CBC')
    decipher.decrypt
    decipher.key = @key
    decipher.iv = iv
    decipher.update(encrypted) + decipher.final
  end
end
Block storage security focuses on volume encryption and access control. Encrypted volumes protect data if physical disks are compromised. The encryption key integrates with cloud key management services. Only instances with proper IAM roles can attach and mount encrypted volumes.
Network isolation restricts storage access to specific networks. Virtual private cloud (VPC) endpoints keep traffic within the cloud provider network. Security groups and network ACLs control which instances can access storage services. Private endpoints prevent data exfiltration through public internet.
File storage security requires authentication and authorization for network access. NFSv4 supports Kerberos authentication instead of IP-based access control. SMB shares integrate with Active Directory for user-based permissions. Export policies define which clients can mount file systems and with what permissions.
Access logging tracks storage operations for security monitoring and compliance. Object storage logs record every request with caller identity, timestamp, and operation. Analyzing logs detects anomalous access patterns like unusual geographic locations or excessive deletion operations. Centralized logging systems aggregate storage logs with application logs for correlation.
# Configure S3 access logging
s3.put_bucket_logging(
  bucket: 'application-data',
  bucket_logging_status: {
    logging_enabled: {
      target_bucket: 'access-logs',
      target_prefix: 's3/application-data/'
    }
  }
)
# Monitor for suspicious patterns
require 'aws-sdk-s3'
require 'json'
class AccessMonitor
  def initialize(log_bucket, log_prefix)
    @s3 = Aws::S3::Client.new
    @bucket = log_bucket
    @prefix = log_prefix
  end
  def analyze_recent_access(hours: 24)
    cutoff = Time.now - (hours * 3600)
    suspicious = []
    @s3.list_objects_v2(bucket: @bucket, prefix: @prefix).each do |response|
      response.contents.each do |object|
        next if object.last_modified < cutoff
        log = @s3.get_object(bucket: @bucket, key: object.key).body.read
        log.each_line do |line|
          entry = parse_log_entry(line)
          suspicious << entry if suspicious?(entry)
        end
      end
    end
    suspicious
  end
  private
  # parse_log_entry (not shown) splits a line of the S3 server access
  # log format into a hash of fields
  def suspicious?(entry)
    entry[:http_status] == '403' || # Unauthorized access attempts
      entry[:operation] == 'REST.DELETE.BUCKET' || # Bucket deletion
      entry[:bytes_sent] > 1_000_000_000 # Large data transfer
  end
end
Performance Considerations
Storage performance varies significantly across types based on latency, throughput, IOPS, and scalability characteristics. Applications must match storage performance to workload requirements.
Object storage throughput scales horizontally by distributing requests across storage servers. Single object uploads have higher latency than block storage operations due to HTTP overhead and distributed architecture. However, parallel uploads achieve high aggregate throughput. Applications uploading many objects concurrently saturate network bandwidth before reaching storage limits.
# Parallel uploads for maximum throughput
require 'concurrent-ruby'
class ParallelUploader
  def initialize(s3_client, bucket, thread_pool_size: 10)
    @s3 = s3_client
    @bucket = bucket
    @pool = Concurrent::FixedThreadPool.new(thread_pool_size)
  end
  def upload_directory(local_path, prefix)
    futures = []
    Dir.glob("#{local_path}/**/*").each do |file_path|
      next if File.directory?(file_path)
      relative_path = file_path.sub("#{local_path}/", '')
      key = "#{prefix}/#{relative_path}"
      future = Concurrent::Future.execute(executor: @pool) do
        File.open(file_path, 'rb') do |file|
          @s3.put_object(bucket: @bucket, key: key, body: file)
        end
        { path: relative_path, success: true }
      rescue StandardError => e
        { path: relative_path, success: false, error: e.message }
      end
      futures << future
    end
    futures.map(&:value)
  end
end
Block storage delivers consistent low-latency performance because volumes attach directly to instances. Provisioned IOPS volumes guarantee specific performance levels regardless of workload patterns. General-purpose SSD volumes provide baseline performance with burst capability for occasional spikes. Throughput-optimized HDD suits sequential workloads but has higher latency for random access.
Performance tuning requires matching volume type to access patterns. Databases with random read-write workloads need provisioned IOPS SSD. Data warehouses with sequential scans perform better with throughput-optimized volumes at lower cost. File systems must align block sizes with expected I/O sizes—larger blocks reduce overhead for sequential access but waste space for small files.
# Create volume with specific performance characteristics
require 'aws-sdk-ec2'
ec2 = Aws::EC2::Client.new
# High IOPS for database
db_volume = ec2.create_volume(
  availability_zone: 'us-east-1a',
  size: 500,
  volume_type: 'io2',
  iops: 32000, # io2 supports up to 256,000 IOPS on Block Express
  multi_attach_enabled: false
)
# High throughput for analytics
analytics_volume = ec2.create_volume(
  availability_zone: 'us-east-1a',
  size: 2000,
  # st1 throughput scales with volume size, bursting up to 500 MB/s
  volume_type: 'st1' # Throughput-optimized HDD
)
File storage performance depends on distributed file system architecture. The system splits files into chunks stored across multiple servers. Parallel access from multiple clients achieves higher throughput than single-client access. However, locking overhead reduces performance compared to block storage for single-client scenarios.
Caching strategies significantly impact perceived performance. Object storage applications cache frequently accessed objects locally to avoid repeated downloads. Time-to-live (TTL) values balance staleness tolerance with cache hit rates. Content delivery networks cache objects globally, reducing latency for geographically distributed users.
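A minimal in-process TTL cache illustrates the idea; this is a sketch, and production systems typically layer Redis or a CDN in front of object storage:

```ruby
# Minimal in-process cache with a per-entry time-to-live
class TtlCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @entries = {}
  end

  # Return the cached value if fresh; otherwise run the block (e.g. an
  # object storage download), store the result, and return it
  def fetch(key)
    entry = @entries[key]
    return entry.value if entry && Time.now < entry.expires_at
    value = yield
    @entries[key] = Entry.new(value, Time.now + @ttl)
    value
  end
end

cache = TtlCache.new(300) # five-minute TTL
# First call invokes the block; repeats within the TTL hit the cache
body = cache.fetch('reports/q1.json') { 'downloaded-bytes' }
```

A longer TTL raises the hit rate at the cost of serving staler objects, which is the trade-off the paragraph above describes.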
Block storage caching happens at multiple layers. The operating system page cache holds recently accessed blocks in memory. Databases maintain their own caches for query results and frequently accessed pages. Storage systems may cache at the controller level. Each cache layer reduces latency for repeated access but increases complexity for consistency.
Tools & Ecosystem
Ruby gems abstract cloud provider storage APIs and provide higher-level interfaces for common operations. The major cloud SDKs form the foundation of Ruby cloud storage integration.
The AWS SDK for Ruby supports S3 object storage, EBS block storage, and EFS file storage:
# Gemfile
gem 'aws-sdk-s3' # Object storage
gem 'aws-sdk-ec2' # Block storage volumes
gem 'aws-sdk-efs' # File storage
# High-level resource interface
require 'aws-sdk-s3'
s3 = Aws::S3::Resource.new(region: 'us-west-2')
bucket = s3.bucket('my-bucket')
# Simplified operations: delete log objects older than 30 days
bucket.objects(prefix: 'logs/').each do |obj|
  puts "#{obj.key}: #{obj.last_modified}"
  obj.delete if obj.last_modified < (Time.now - 86400 * 30)
end
Google Cloud Storage Ruby gem provides object storage access:
gem 'google-cloud-storage'
require 'google/cloud/storage'
storage = Google::Cloud::Storage.new
bucket = storage.bucket 'my-bucket'
# Upload a large local file; options are passed as keyword arguments
bucket.create_file '/local/large-file.zip', 'archive.zip',
                   cache_control: 'private, max-age=0'
Azure Storage SDK supports blobs, disks, and files:
gem 'azure-storage-blob'
gem 'azure-storage-file'
require 'azure/storage/blob'
client = Azure::Storage::Blob::BlobService.create(
  storage_account_name: ENV['AZURE_STORAGE_ACCOUNT'],
  storage_access_key: ENV['AZURE_STORAGE_KEY']
)
client.create_block_blob('container', 'blob-name', content)
CarrierWave integrates file uploads with object storage backends:
gem 'carrierwave'
gem 'carrierwave-aws'
class DocumentUploader < CarrierWave::Uploader::Base
  storage :aws
  def store_dir
    "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
  end
  def extension_allowlist
    %w[pdf doc docx]
  end
end
# Configure storage backend
CarrierWave.configure do |config|
  config.storage = :aws
  config.aws_bucket = 'application-uploads'
  config.aws_acl = 'private'
  config.aws_credentials = {
    access_key_id: ENV['AWS_ACCESS_KEY_ID'],
    secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
    region: 'us-east-1'
  }
end
Shrine provides alternative file attachment library with storage abstraction:
gem 'shrine'
gem 'aws-sdk-s3'
require 'shrine'
require 'shrine/storage/s3'
Shrine.storages = {
  cache: Shrine::Storage::S3.new(bucket: 'uploads-cache'),
  store: Shrine::Storage::S3.new(bucket: 'uploads-store')
}
class DocumentUploader < Shrine
  plugin :derivatives
  plugin :validation_helpers
  Attacher.validate do
    validate_max_size 10 * 1024 * 1024 # 10 MB
    validate_mime_type %w[application/pdf]
  end
end
Fog provides multi-cloud storage abstraction layer:
gem 'fog-aws'
require 'fog/aws'
storage = Fog::Storage.new(
  provider: 'AWS',
  aws_access_key_id: ENV['AWS_ACCESS_KEY_ID'],
  aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
)
directory = storage.directories.get('my-bucket')
file = directory.files.create(
  key: 'data.json',
  body: JSON.generate(data),
  content_type: 'application/json'
)
MinIO provides S3-compatible storage for on-premises deployments. Because its API is S3-compatible, the standard AWS SDK works against it with a custom endpoint:
gem 'aws-sdk-s3'
require 'aws-sdk-s3'
client = Aws::S3::Client.new(
  endpoint: 'https://minio.example.com',
  access_key_id: 'minioadmin',
  secret_access_key: 'minioadmin',
  region: 'us-east-1', # Required by the SDK; MinIO accepts any value
  force_path_style: true # MinIO serves buckets under the path, not a subdomain
)
client.put_object(bucket: 'my-bucket', key: 'object-name', body: file_data)
Real-World Applications
Production systems combine storage types based on data lifecycle, access patterns, and cost optimization. Applications evolve storage architecture as requirements change and data volumes grow.
Web application architectures store user uploads in object storage while maintaining session data on block storage databases. User profile images upload to S3 with CDN distribution for fast global access. Application servers run databases on EBS volumes with automated snapshots for disaster recovery. Logs stream to object storage for long-term retention and analysis.
# Multi-tier web application storage
require 'aws-sdk-s3'
require 'pg'
class WebApplication
  def initialize
    @s3 = Aws::S3::Client.new
    @cdn_domain = 'cdn.example.com'
    # Postgres data files live on the mounted block volume
    @db = PG.connect(dbname: 'app')
  end
  def handle_profile_upload(user_id, image_file)
    # Upload to S3
    key = "profiles/#{user_id}/avatar.jpg"
    @s3.put_object(
      bucket: 'user-assets',
      key: key,
      body: image_file,
      acl: 'public-read',
      cache_control: 'public, max-age=31536000'
    )
    # Store reference in database
    cdn_url = "https://#{@cdn_domain}/#{key}"
    @db.exec_params(
      'UPDATE users SET avatar_url = $1 WHERE id = $2',
      [cdn_url, user_id]
    )
    cdn_url
  end
  def log_event(event_data)
    # Buffer logs locally
    File.open('/mnt/data-volume/logs/current.log', 'a') do |f|
      f.puts JSON.generate(event_data)
    end
    # Archive to S3 daily
    archive_logs if should_archive?
  end
  private
  def archive_logs
    timestamp = Time.now.strftime('%Y%m%d')
    key = "logs/#{timestamp}.log.gz"
    content = File.read('/mnt/data-volume/logs/current.log')
    compressed = compress_gzip(content) # App-specific gzip helper
    @s3.put_object(
      bucket: 'application-logs',
      key: key,
      body: compressed,
      storage_class: 'INTELLIGENT_TIERING'
    )
    File.truncate('/mnt/data-volume/logs/current.log', 0)
  end
end
Data analytics platforms stage raw data in object storage, process it with compute clusters reading from block storage, and write results back to object storage. Extract-transform-load (ETL) jobs read source data from S3, load it into databases on EBS for transformation, then export aggregated results to S3 for reporting. This pattern separates durable storage (object) from computation workspace (block).
Machine learning training workflows store training datasets in object storage for cost-effective long-term storage. Training jobs download data to local SSD volumes for fast access during iteration. Checkpoints write to object storage for fault tolerance—if a training instance fails, a new instance resumes from the last checkpoint. Final models publish to object storage for inference service deployment.
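The checkpoint-and-resume cycle can be sketched locally; the copy to object storage is indicated by a comment rather than a live S3 call:

```ruby
require 'tmpdir'

# Periodically persist training state so a replacement instance can
# resume from the last completed step rather than from scratch
class CheckpointedJob
  def initialize(checkpoint_dir)
    @checkpoint_path = File.join(checkpoint_dir, 'checkpoint.bin')
  end

  def save(step, state)
    # Write locally first; a real job would then copy the file to
    # object storage for durability across instance failures
    File.binwrite(@checkpoint_path, Marshal.dump(step: step, state: state))
  end

  def resume
    return { step: 0, state: nil } unless File.exist?(@checkpoint_path)
    Marshal.load(File.binread(@checkpoint_path))
  end
end
```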
Media processing pipelines upload raw video to object storage, trigger processing jobs that download to block storage, perform transcoding on local disk, and upload results to object storage. The processing instance uses high-IOPS SSD for read-write performance during encoding. Output files store in object storage with lifecycle policies that transition to cheaper storage classes after 90 days.
Backup and disaster recovery strategies leverage object storage durability with cross-region replication. Database snapshots copy from block storage to object storage daily. Snapshot retention policies keep recent backups in standard storage and transition old backups to archive storage. Cross-region replication ensures recovery capability if the primary region fails.
# Disaster recovery with cross-region replication
class BackupManager
  def initialize
    @primary_s3 = Aws::S3::Client.new(region: 'us-east-1')
    @replica_s3 = Aws::S3::Client.new(region: 'us-west-2')
    @ec2 = Aws::EC2::Client.new(region: 'us-east-1')
  end
  def backup_database_volume(volume_id)
    # Create snapshot
    snapshot = @ec2.create_snapshot(
      volume_id: volume_id,
      description: "Backup #{Time.now.iso8601}"
    )
    # Wait for snapshot completion
    @ec2.wait_until(:snapshot_completed, snapshot_ids: [snapshot.snapshot_id])
    # Copy to object storage (export_snapshot_to_s3 is an
    # application-specific export step)
    snapshot_data = export_snapshot_to_s3(snapshot.snapshot_id)
    # Replicate to secondary region
    @replica_s3.copy_object(
      bucket: 'backups-replica',
      copy_source: "backups-primary/#{snapshot_data[:key]}",
      key: snapshot_data[:key],
      storage_class: 'GLACIER_IR'
    )
    snapshot.snapshot_id
  end
  def restore_from_backup(backup_key, target_region: 'us-east-1')
    # Download from object storage
    response = @replica_s3.get_object(bucket: 'backups-replica', key: backup_key)
    # Create volume from backup data
    # (restore_volume_from_data depends on the backup format)
    volume = restore_volume_from_data(response.body.read, target_region)
    volume.volume_id
  end
end
Reference
Storage Type Comparison
| Type | Access Interface | Consistency | Latency | Scalability | Primary Use Cases |
|---|---|---|---|---|---|
| Object | HTTP REST API | Eventual (strong read-after-write) | 50-500ms | Unlimited horizontal | Static content, archives, data lakes |
| Block | Block protocols (iSCSI, NVMe) | Strong | <1ms | Vertical (volume size/IOPS) | Databases, OS volumes, applications |
| File | NFS, SMB, POSIX | Strong with distributed locks | 1-10ms | Horizontal (multiple servers) | Shared access, content management |
AWS Storage Services
| Service | Type | Performance | Durability | Use Cases |
|---|---|---|---|---|
| S3 | Object | Variable by class | 99.999999999% | General object storage |
| S3 Glacier | Object (archive) | Minutes to hours retrieval | 99.999999999% | Long-term archives |
| EBS | Block | Up to 256,000 IOPS | 99.8-99.9% | Database volumes |
| EFS | File | Bursting to 10+ GB/s | 99.999999999% | Shared file systems |
| FSx | File | Protocol-specific | 99.9% | Windows/Lustre file systems |
Google Cloud Storage Services
| Service | Type | Performance | Durability | Use Cases |
|---|---|---|---|---|
| Cloud Storage | Object | Multi-regional | 99.999999999% | Object storage, CDN origin |
| Persistent Disk | Block | Up to 100,000 IOPS | 99.9999% | VM boot/data disks |
| Filestore | File | Up to 16 GB/s | 99.9% | Shared NFS storage |
Azure Storage Services
| Service | Type | Performance | Durability | Use Cases |
|---|---|---|---|---|
| Blob Storage | Object | Tiered performance | 99.999999999% (LRS) | Unstructured data |
| Disk Storage | Block | Up to 160,000 IOPS | 99.999% | VM managed disks |
| Files | File | Up to 100,000 IOPS | 99.9% | SMB file shares |
Object Storage Operations (S3 API)
| Operation | Method | Purpose | Idempotent |
|---|---|---|---|
| put_object | PUT | Upload object | Yes |
| get_object | GET | Download object | Yes |
| delete_object | DELETE | Remove object | Yes |
| list_objects_v2 | GET | List bucket contents | Yes |
| copy_object | PUT | Copy object within/across buckets | Yes |
| head_object | HEAD | Get object metadata | Yes |
Block Storage Volume Types
| Type | Max IOPS | Max Throughput | Latency | Cost Model |
|---|---|---|---|---|
| General Purpose SSD (gp3) | 16,000 | 1,000 MB/s | Single-digit ms | GB + IOPS + throughput |
| Provisioned IOPS SSD (io2) | 256,000 | 4,000 MB/s | Sub-millisecond | GB + IOPS |
| Throughput Optimized HDD (st1) | 500 | 500 MB/s | Low ms | GB only |
| Cold HDD (sc1) | 250 | 250 MB/s | Variable | GB only (lowest) |
Storage Class Selection Criteria
| Criteria | Object Storage | Block Storage | File Storage |
|---|---|---|---|
| Access pattern | Infrequent, distributed | Frequent, local | Shared, concurrent |
| Data mutability | Immutable or versioned | Mutable in place | Mutable in place |
| Consistency needs | Eventual acceptable | Strong required | Strong required |
| Scale | Unlimited | Limited by volume size | Limited by file system |
| Cost priority | Storage cost | Performance cost | Balance both |