Overview
Cloud storage types represent distinct architectural approaches to storing and accessing data in distributed systems. Each type optimizes for different access patterns, consistency models, and performance characteristics. The three primary types—object storage, block storage, and file storage—emerged from different computing needs and provide fundamentally different interfaces and guarantees.
Object storage treats data as discrete objects with metadata and unique identifiers, accessed through HTTP APIs. Block storage provides raw storage volumes that operating systems treat as attached disks. File storage exposes hierarchical file systems accessible through network protocols. Each type makes specific trade-offs between performance, scalability, consistency, and cost.
The choice between storage types affects application architecture, data access patterns, performance characteristics, and operational complexity. Object storage excels at storing unstructured data at massive scale with high durability. Block storage provides low-latency access for databases and applications requiring block-level operations. File storage supports shared access patterns where multiple clients need concurrent file system access.
# Object storage access pattern
require 'aws-sdk-s3'
s3 = Aws::S3::Client.new(region: 'us-east-1')
s3.put_object(bucket: 'data-lake', key: 'reports/2024/q1.json', body: data)
# => Stores object with metadata, accessed via HTTP API
# Block storage access pattern (mounted volume)
File.write('/mnt/ebs-volume/database/table.db', binary_data)
# => Direct file system operations on block device
# File storage access pattern
require 'net/sftp'
Net::SFTP.start('file-server.example.com', 'user') do |sftp|
  sftp.upload!('/local/path/file.txt', '/shared/documents/file.txt')
end
# => Network file system access
Key Principles
Cloud storage types differ in their fundamental abstraction models, access interfaces, and consistency guarantees. Object storage presents a flat namespace where objects are addressed by unique keys within buckets. The storage system manages object placement, replication, and retrieval without exposing underlying physical layout. Objects are immutable—updates create new versions rather than modifying existing data. This immutability enables aggressive caching, simplified replication, and eventually consistent semantics.
Block storage exposes fixed-size blocks accessed through block-level protocols like iSCSI or NVMe. The storage system presents a linear address space that the operating system partitions into file systems. Block storage provides strong consistency because it targets single-writer scenarios where one instance mounts the volume. The storage layer handles data placement, replication for durability, and snapshot management without client involvement.
File storage provides hierarchical namespaces with directory structures and POSIX semantics. Multiple clients access the same file system concurrently through network protocols like NFS or SMB. The storage system manages locking, cache coherence, and consistency across clients. File storage balances shared access with reasonable performance through distributed locking and metadata caching.
Storage durability differs across types. Object storage achieves extreme durability (11 nines) through erasure coding or cross-region replication. The system automatically maintains redundancy without client action. Block storage provides durability through snapshots and replication within availability zones. File storage durability depends on the underlying replication strategy and may require explicit backup procedures.
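The effect of redundancy on durability can be illustrated with simple probability arithmetic; the 1% per-copy annual failure rate below is an assumed figure for illustration, not a provider's number:

```ruby
# Chance that every independent copy is lost in a year, given an
# assumed per-copy annual failure probability
def annual_loss_probability(per_copy_failure, copies)
  per_copy_failure**copies
end

# Express the resulting durability as a number of nines
def durability_nines(loss_probability)
  -Math.log10(loss_probability).round
end

annual_loss_probability(0.01, 1)                   # one copy: 1% loss risk
annual_loss_probability(0.01, 3)                   # three copies: ~1 in a million
durability_nines(annual_loss_probability(0.01, 3)) # => 6
```

Real systems reach their durability targets with erasure coding rather than whole-copy replication, which achieves comparable redundancy at lower storage overhead.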
Access latency varies significantly. Block storage delivers sub-millisecond latency for local operations because the volume appears as a directly attached disk. File storage introduces network latency plus locking overhead for concurrent access, typically single-digit milliseconds. Object storage has higher latency due to HTTP overhead and distributed architecture, usually tens to hundreds of milliseconds per operation.
Cost models reflect the underlying architecture. Object storage charges per gigabyte stored and per request, with lower storage costs but higher operation costs. Block storage charges for provisioned capacity and IOPS, regardless of actual usage. File storage charges for consumed capacity plus throughput, with higher costs than object storage but lower than high-performance block storage.
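A back-of-envelope comparison makes the trade-off concrete; the prices below are illustrative placeholders, not current list prices:

```ruby
# Illustrative prices only; check current provider price lists
OBJECT_GB_MONTH   = 0.023 # $ per GB-month stored
OBJECT_PER_1K_PUT = 0.005 # $ per 1,000 PUT requests
BLOCK_GB_MONTH    = 0.08  # $ per provisioned GB-month

def object_storage_cost(gb_stored, put_requests)
  gb_stored * OBJECT_GB_MONTH + (put_requests / 1000.0) * OBJECT_PER_1K_PUT
end

def block_storage_cost(gb_provisioned)
  gb_provisioned * BLOCK_GB_MONTH # billed whether used or not
end

object_storage_cost(1024, 100_000) # ~ $24/month for a 1 TB archive
block_storage_cost(1024)           # ~ $82/month for the same provisioned capacity
```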
Implementation Approaches
Selecting the appropriate storage type requires analyzing data access patterns, consistency requirements, performance needs, and cost constraints. The decision impacts application design, deployment architecture, and operational procedures.
Object storage suits write-once, read-many workloads where data grows indefinitely. Applications that generate logs, store media files, archive historical data, or build data lakes benefit from object storage scalability and durability. The flat namespace simplifies data organization at scale—applications store millions of objects without managing complex directory structures. Object storage integrates with analytics platforms that process data in place without moving it to other systems.
Static website hosting, content distribution, and backup systems align with object storage characteristics. The HTTP API integrates with CDNs for global content delivery. Versioning features enable point-in-time recovery without complex backup schedules. Lifecycle policies automatically transition data between storage classes or delete expired objects.
Block storage targets applications requiring low-latency random access to persistent data. Databases demand consistent performance for concurrent reads and writes across their data files. The file system running on block storage provides familiar POSIX semantics while the underlying storage handles replication and snapshots. Applications that need guaranteed IOPS or specific throughput characteristics provision block storage with defined performance parameters.
Boot volumes for virtual machines use block storage because operating systems expect block device semantics. The instance attaches the volume during boot and mounts it as the root file system. Snapshot features enable backup and disaster recovery by capturing volume state at specific points in time.
File storage addresses scenarios where multiple compute instances need simultaneous access to shared data. Content management systems store uploaded files on shared storage so any web server can retrieve them. Machine learning training jobs read datasets from shared file systems while multiple GPU instances process different portions. Development teams share code repositories and build artifacts through network file systems.
High-performance computing workloads may use file storage for shared scratch space where jobs write intermediate results. The distributed file system aggregates performance across multiple storage servers to achieve high throughput. However, consistency overhead limits performance compared to local block storage for single-client access.
Hybrid approaches combine storage types based on data lifecycle and access patterns. Applications write new data to block storage for fast processing, then archive completed work to object storage for long-term retention. Analytics pipelines extract data from object storage, load it into databases on block storage for queries, then write results back to object storage. This tiered architecture balances performance and cost by using the right storage for each workload phase.
# Hybrid storage pattern - transactional data on block, archives on object
require 'aws-sdk-s3'
class DataProcessor
  def initialize
    @db_path = '/mnt/block-volume/transactions.db'
    @s3 = Aws::S3::Client.new
    @archive_bucket = 'transaction-archives'
  end
  def process_day(date)
    # Fast queries against block storage database
    # (read_from_db, process_transactions and compress are
    # application-specific helpers)
    transactions = read_from_db(date)
    process_transactions(transactions)
    # Archive to object storage
    archive_key = "transactions/#{date.strftime('%Y/%m/%d')}.json.gz"
    @s3.put_object(
      bucket: @archive_bucket,
      key: archive_key,
      body: compress(transactions.to_json),
      storage_class: 'GLACIER_IR'
    )
  end
end
Ruby Implementation
Ruby provides SDKs for major cloud providers that abstract storage type differences behind idiomatic interfaces. The AWS SDK, Google Cloud SDK, and Azure SDK follow similar patterns while exposing provider-specific features.
Object storage in Ruby uses HTTP-based APIs with key-value semantics. The AWS S3 SDK demonstrates typical object storage operations:
require 'aws-sdk-s3'
# Configure client with credentials and region
s3 = Aws::S3::Client.new(
  region: 'us-west-2',
  credentials: Aws::Credentials.new(
    ENV['AWS_ACCESS_KEY_ID'],
    ENV['AWS_SECRET_ACCESS_KEY']
  )
)
# Upload object with metadata
s3.put_object(
  bucket: 'application-data',
  key: 'users/12345/profile.json',
  body: JSON.generate(user_data),
  metadata: {
    'user-id' => '12345',
    'updated-at' => Time.now.iso8601
  },
  content_type: 'application/json',
  server_side_encryption: 'AES256'
)
# Retrieve object
response = s3.get_object(bucket: 'application-data', key: 'users/12345/profile.json')
profile = JSON.parse(response.body.read)
# => {"name"=>"User", "email"=>"user@example.com"}
# List objects with prefix (the response is pageable; each iteration
# yields one page of results)
s3.list_objects_v2(bucket: 'application-data', prefix: 'users/').each do |response|
  response.contents.each do |object|
    puts "#{object.key}: #{object.size} bytes"
  end
end
Google Cloud Storage follows similar patterns with provider-specific features:
require 'google/cloud/storage'
storage = Google::Cloud::Storage.new(
  project_id: 'my-project',
  credentials: 'service-account-key.json'
)
bucket = storage.bucket 'application-data'
# Upload with custom metadata
file = bucket.create_file(
  'local-file.pdf',
  'documents/report.pdf',
  metadata: { 'department' => 'engineering' },
  cache_control: 'public, max-age=3600'
)
# Generate signed URL for temporary access
url = file.signed_url(method: 'GET', expires: 300)
# => "https://storage.googleapis.com/application-data/documents/report.pdf?..."
Block storage in Ruby applications appears as mounted file systems. Ruby code performs standard file I/O operations without special APIs:
# Block storage mounted at /mnt/data-volume
class DatabaseManager
  def initialize(volume_path)
    @volume_path = volume_path
    @data_dir = File.join(volume_path, 'database')
    Dir.mkdir(@data_dir) unless Dir.exist?(@data_dir)
  end
  def write_transaction(txn_id, data)
    file_path = File.join(@data_dir, "#{txn_id}.dat")
    File.open(file_path, 'wb') do |f|
      f.write(Marshal.dump(data))
      f.fsync # Force write to disk
    end
  end
  def read_transaction(txn_id)
    file_path = File.join(@data_dir, "#{txn_id}.dat")
    Marshal.load(File.binread(file_path))
  rescue Errno::ENOENT
    nil
  end
end
# Volume management through cloud provider SDK
require 'aws-sdk-ec2'
ec2 = Aws::EC2::Client.new(region: 'us-east-1')
# Create block storage volume
volume = ec2.create_volume(
  availability_zone: 'us-east-1a',
  size: 100, # GB
  volume_type: 'gp3',
  iops: 3000,
  throughput: 125 # MB/s
)
# Attach to instance
ec2.attach_volume(
  device: '/dev/sdf',
  instance_id: 'i-1234567890abcdef0',
  volume_id: volume.volume_id
)
# Create snapshot for backup
snapshot = ec2.create_snapshot(
  volume_id: volume.volume_id,
  description: "Backup #{Time.now.iso8601}"
)
File storage access in Ruby uses network file system protocols. Applications mount NFS or SMB shares and use standard file operations:
# Assuming EFS mounted at /mnt/efs
require 'fileutils'
class SharedFileManager
  def initialize(mount_point)
    @mount_point = mount_point
  end
  def write_shared_file(path, content)
    full_path = File.join(@mount_point, path)
    FileUtils.mkdir_p(File.dirname(full_path))
    File.open(full_path, 'w') do |f|
      f.flock(File::LOCK_EX) # Exclusive lock for writing
      f.write(content)
    end
  end
  def read_shared_file(path)
    full_path = File.join(@mount_point, path)
    File.open(full_path, 'r') do |f|
      f.flock(File::LOCK_SH) # Shared lock for reading
      f.read
    end
  end
end
# Azure Files SDK for programmatic access
require 'azure/storage/file'
client = Azure::Storage::File::FileService.create(
  storage_account_name: ENV['AZURE_STORAGE_ACCOUNT'],
  storage_access_key: ENV['AZURE_STORAGE_KEY']
)
# Create share and directory
client.create_share('documents')
client.create_directory('documents', 'reports')
# Upload file (read as binary so the PDF bytes are not mangled)
content = File.binread('local-report.pdf')
client.create_file('documents', 'reports', 'report.pdf', content.length)
client.put_file_range('documents', 'reports', 'report.pdf', 0, content.length - 1, content)
Multipart uploads handle large files efficiently in object storage:
require 'aws-sdk-s3'
class LargeFileUploader
  CHUNK_SIZE = 5 * 1024 * 1024 # 5 MB minimum part size
  def initialize(s3_client, bucket)
    @s3 = s3_client
    @bucket = bucket
  end
  def upload_large_file(file_path, key)
    # Initiate multipart upload
    upload = @s3.create_multipart_upload(
      bucket: @bucket,
      key: key,
      server_side_encryption: 'AES256'
    )
    parts = []
    part_number = 1
    File.open(file_path, 'rb') do |file|
      while (chunk = file.read(CHUNK_SIZE))
        response = @s3.upload_part(
          bucket: @bucket,
          key: key,
          upload_id: upload.upload_id,
          part_number: part_number,
          body: chunk
        )
        parts << { etag: response.etag, part_number: part_number }
        part_number += 1
      end
    end
    # Complete upload
    @s3.complete_multipart_upload(
      bucket: @bucket,
      key: key,
      upload_id: upload.upload_id,
      multipart_upload: { parts: parts }
    )
  rescue StandardError
    # Abort on failure to avoid charges for incomplete upload parts
    @s3.abort_multipart_upload(bucket: @bucket, key: key, upload_id: upload.upload_id) if upload
    raise
  end
end
Security Implications
Storage security encompasses access control, encryption, network isolation, and audit logging. Each storage type presents different attack surfaces and security mechanisms.
Object storage security centers on identity-based access control and encryption. Bucket policies define who can read, write, or delete objects. Applications authenticate using access keys or instance roles that grant specific permissions. Overly permissive bucket policies expose data to unauthorized access—policies should grant minimum required permissions.
# Secure S3 bucket configuration
require 'aws-sdk-s3'
s3 = Aws::S3::Client.new(region: 'us-east-1')
# Enable encryption at rest
s3.put_bucket_encryption(
  bucket: 'secure-data',
  server_side_encryption_configuration: {
    rules: [{
      apply_server_side_encryption_by_default: {
        sse_algorithm: 'aws:kms',
        kms_master_key_id: 'arn:aws:kms:us-east-1:123456789012:key/12345678-1234'
      }
    }]
  }
)
# Block public access
s3.put_public_access_block(
  bucket: 'secure-data',
  public_access_block_configuration: {
    block_public_acls: true,
    ignore_public_acls: true,
    block_public_policy: true,
    restrict_public_buckets: true
  }
)
# Require encrypted uploads
policy = {
  Version: '2012-10-17',
  Statement: [{
    Sid: 'DenyUnencryptedObjectUploads',
    Effect: 'Deny',
    Principal: '*',
    Action: 's3:PutObject',
    Resource: 'arn:aws:s3:::secure-data/*',
    Condition: {
      StringNotEquals: {
        's3:x-amz-server-side-encryption': 'aws:kms'
      }
    }
  }]
}
s3.put_bucket_policy(bucket: 'secure-data', policy: JSON.generate(policy))
Encryption protects data at rest and in transit. Server-side encryption encrypts objects before writing to disk. The storage service manages encryption keys or integrates with key management services. Client-side encryption gives applications full control over keys but adds complexity.
# Client-side encryption for sensitive data
# (AES-CBC shown for brevity; an authenticated mode such as AES-GCM
# is preferable in new designs)
require 'openssl'
require 'base64'
class EncryptedStorage
  def initialize(s3_client, bucket, encryption_key)
    @s3 = s3_client
    @bucket = bucket
    @key = encryption_key
  end
  def put_encrypted(key, data)
    cipher = OpenSSL::Cipher.new('AES-256-CBC')
    cipher.encrypt
    cipher.key = @key
    iv = cipher.random_iv
    encrypted = cipher.update(data) + cipher.final
    @s3.put_object(
      bucket: @bucket,
      key: key,
      body: encrypted,
      metadata: {
        'encryption-iv' => Base64.strict_encode64(iv),
        'encryption-algorithm' => 'AES-256-CBC'
      }
    )
  end
  def get_encrypted(key)
    response = @s3.get_object(bucket: @bucket, key: key)
    encrypted = response.body.read
    iv = Base64.strict_decode64(response.metadata['encryption-iv'])
    decipher = OpenSSL::Cipher.new('AES-256-CBC')
    decipher.decrypt
    decipher.key = @key
    decipher.iv = iv
    decipher.update(encrypted) + decipher.final
  end
end
Block storage security focuses on volume encryption and access control. Encrypted volumes protect data if physical disks are compromised. The encryption key integrates with cloud key management services. Only instances with proper IAM roles can attach and mount encrypted volumes.
Network isolation restricts storage access to specific networks. Virtual private cloud (VPC) endpoints keep traffic within the cloud provider network. Security groups and network ACLs control which instances can access storage services. Private endpoints prevent data exfiltration through public internet.
File storage security requires authentication and authorization for network access. NFSv4 supports Kerberos authentication instead of IP-based access control. SMB shares integrate with Active Directory for user-based permissions. Export policies define which clients can mount file systems and with what permissions.
Access logging tracks storage operations for security monitoring and compliance. Object storage logs record every request with caller identity, timestamp, and operation. Analyzing logs detects anomalous access patterns like unusual geographic locations or excessive deletion operations. Centralized logging systems aggregate storage logs with application logs for correlation.
# Configure S3 access logging
s3.put_bucket_logging(
  bucket: 'application-data',
  bucket_logging_status: {
    logging_enabled: {
      target_bucket: 'access-logs',
      target_prefix: 's3/application-data/'
    }
  }
)
# Monitor for suspicious patterns
require 'aws-sdk-s3'
require 'json'
class AccessMonitor
  def initialize(log_bucket, log_prefix)
    @s3 = Aws::S3::Client.new
    @bucket = log_bucket
    @prefix = log_prefix
  end
  def analyze_recent_access(hours: 24)
    cutoff = Time.now - (hours * 3600)
    suspicious = []
    @s3.list_objects_v2(bucket: @bucket, prefix: @prefix).each do |response|
      response.contents.each do |object|
        next if object.last_modified < cutoff
        log = @s3.get_object(bucket: @bucket, key: object.key).body.read
        log.each_line do |line|
          entry = parse_log_entry(line)
          suspicious << entry if suspicious?(entry)
        end
      end
    end
    suspicious
  end
  private
  # parse_log_entry (not shown) splits a line of the S3 server access
  # log format into a hash of fields
  def suspicious?(entry)
    entry[:http_status] == '403' || # Unauthorized access attempts
      entry[:operation] == 'REST.DELETE.BUCKET' || # Bucket deletion
      entry[:bytes_sent] > 1_000_000_000 # Large data transfer
  end
end
Performance Considerations
Storage performance varies significantly across types based on latency, throughput, IOPS, and scalability characteristics. Applications must match storage performance to workload requirements.
Object storage throughput scales horizontally by distributing requests across storage servers. Single object uploads have higher latency than block storage operations due to HTTP overhead and distributed architecture. However, parallel uploads achieve high aggregate throughput. Applications uploading many objects concurrently saturate network bandwidth before reaching storage limits.
# Parallel uploads for maximum throughput
require 'concurrent-ruby'
class ParallelUploader
  def initialize(s3_client, bucket, thread_pool_size: 10)
    @s3 = s3_client
    @bucket = bucket
    @pool = Concurrent::FixedThreadPool.new(thread_pool_size)
  end
  def upload_directory(local_path, prefix)
    futures = []
    Dir.glob("#{local_path}/**/*").each do |file_path|
      next if File.directory?(file_path)
      relative_path = file_path.sub("#{local_path}/", '')
      key = "#{prefix}/#{relative_path}"
      future = Concurrent::Future.execute(executor: @pool) do
        File.open(file_path, 'rb') do |file|
          @s3.put_object(bucket: @bucket, key: key, body: file)
        end
        { path: relative_path, success: true }
      rescue StandardError => e
        { path: relative_path, success: false, error: e.message }
      end
      futures << future
    end
    futures.map(&:value)
  end
end
Block storage delivers consistent low-latency performance because volumes attach directly to instances. Provisioned IOPS volumes guarantee specific performance levels regardless of workload patterns. General-purpose SSD volumes provide baseline performance with burst capability for occasional spikes. Throughput-optimized HDD suits sequential workloads but has higher latency for random access.
Performance tuning requires matching volume type to access patterns. Databases with random read-write workloads need provisioned IOPS SSD. Data warehouses with sequential scans perform better with throughput-optimized volumes at lower cost. File systems must align block sizes with expected I/O sizes—larger blocks reduce overhead for sequential access but waste space for small files.
# Create volume with specific performance characteristics
require 'aws-sdk-ec2'
ec2 = Aws::EC2::Client.new
# High IOPS for database
db_volume = ec2.create_volume(
  availability_zone: 'us-east-1a',
  size: 500,
  volume_type: 'io2',
  iops: 32000, # io2 supports up to 256,000 IOPS on Block Express
  multi_attach_enabled: false
)
# High throughput for analytics
analytics_volume = ec2.create_volume(
  availability_zone: 'us-east-1a',
  size: 2000,
  # st1 throughput scales with volume size, bursting up to 500 MB/s
  volume_type: 'st1' # Throughput-optimized HDD
)
File storage performance depends on distributed file system architecture. The system splits files into chunks stored across multiple servers. Parallel access from multiple clients achieves higher throughput than single-client access. However, locking overhead reduces performance compared to block storage for single-client scenarios.
Caching strategies significantly impact perceived performance. Object storage applications cache frequently accessed objects locally to avoid repeated downloads. Time-to-live (TTL) values balance staleness tolerance with cache hit rates. Content delivery networks cache objects globally, reducing latency for geographically distributed users.
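A minimal in-process TTL cache illustrates the idea; this is a sketch, and production systems typically layer Redis or a CDN in front of object storage:

```ruby
# Minimal in-process cache with a per-entry time-to-live
class TtlCache
  Entry = Struct.new(:value, :expires_at)

  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @entries = {}
  end

  # Return the cached value if fresh; otherwise run the block (e.g. an
  # object storage download), store the result, and return it
  def fetch(key)
    entry = @entries[key]
    return entry.value if entry && Time.now < entry.expires_at
    value = yield
    @entries[key] = Entry.new(value, Time.now + @ttl)
    value
  end
end

cache = TtlCache.new(300) # five-minute TTL
# First call invokes the block; repeats within the TTL hit the cache
body = cache.fetch('reports/q1.json') { 'downloaded-bytes' }
```

A longer TTL raises the hit rate at the cost of serving staler objects, which is the trade-off the paragraph above describes.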
Block storage caching happens at multiple layers. The operating system page cache holds recently accessed blocks in memory. Databases maintain their own caches for query results and frequently accessed pages. Storage systems may cache at the controller level. Each cache layer reduces latency for repeated access but increases complexity for consistency.
Tools & Ecosystem
Ruby gems abstract cloud provider storage APIs and provide higher-level interfaces for common operations. The major cloud SDKs form the foundation of Ruby cloud storage integration.
The AWS SDK for Ruby supports S3 object storage, EBS block storage, and EFS file storage:
# Gemfile
gem 'aws-sdk-s3' # Object storage
gem 'aws-sdk-ec2' # Block storage volumes
gem 'aws-sdk-efs' # File storage
# High-level resource interface
require 'aws-sdk-s3'
s3 = Aws::S3::Resource.new(region: 'us-west-2')
bucket = s3.bucket('my-bucket')
# Simplified operations: delete log objects older than 30 days
bucket.objects(prefix: 'logs/').each do |obj|
  puts "#{obj.key}: #{obj.last_modified}"
  obj.delete if obj.last_modified < (Time.now - 86400 * 30)
end
Google Cloud Storage Ruby gem provides object storage access:
gem 'google-cloud-storage'
require 'google/cloud/storage'
storage = Google::Cloud::Storage.new
bucket = storage.bucket 'my-bucket'
# Upload a large local file; options are passed as keyword arguments
bucket.create_file '/local/large-file.zip', 'archive.zip',
                   cache_control: 'private, max-age=0'
Azure Storage SDK supports blobs, disks, and files:
gem 'azure-storage-blob'
gem 'azure-storage-file'
require 'azure/storage/blob'
client = Azure::Storage::Blob::BlobService.create(
  storage_account_name: ENV['AZURE_STORAGE_ACCOUNT'],
  storage_access_key: ENV['AZURE_STORAGE_KEY']
)
client.create_block_blob('container', 'blob-name', content)
CarrierWave integrates file uploads with object storage backends:
gem 'carrierwave'
gem 'carrierwave-aws'
class DocumentUploader < CarrierWave::Uploader::Base
  storage :aws
  def store_dir
    "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
  end
  def extension_allowlist
    %w[pdf doc docx]
  end
end
# Configure storage backend
CarrierWave.configure do |config|
  config.storage = :aws
  config.aws_bucket = 'application-uploads'
  config.aws_acl = 'private'
  config.aws_credentials = {
    access_key_id: ENV['AWS_ACCESS_KEY_ID'],
    secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
    region: 'us-east-1'
  }
end
Shrine provides alternative file attachment library with storage abstraction:
gem 'shrine'
gem 'aws-sdk-s3'
require 'shrine'
require 'shrine/storage/s3'
Shrine.storages = {
  cache: Shrine::Storage::S3.new(bucket: 'uploads-cache'),
  store: Shrine::Storage::S3.new(bucket: 'uploads-store')
}
class DocumentUploader < Shrine
  plugin :derivatives
  plugin :validation_helpers
  Attacher.validate do
    validate_max_size 10 * 1024 * 1024 # 10 MB
    validate_mime_type %w[application/pdf]
  end
end
Fog provides multi-cloud storage abstraction layer:
gem 'fog-aws'
require 'fog/aws'
storage = Fog::Storage.new(
  provider: 'AWS',
  aws_access_key_id: ENV['AWS_ACCESS_KEY_ID'],
  aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
)
directory = storage.directories.get('my-bucket')
file = directory.files.create(
  key: 'data.json',
  body: JSON.generate(data),
  content_type: 'application/json'
)
MinIO provides S3-compatible storage for on-premises deployments. Because its API is S3-compatible, the standard AWS SDK works against it with a custom endpoint:
gem 'aws-sdk-s3'
require 'aws-sdk-s3'
client = Aws::S3::Client.new(
  endpoint: 'https://minio.example.com',
  access_key_id: 'minioadmin',
  secret_access_key: 'minioadmin',
  region: 'us-east-1', # Required by the SDK; MinIO accepts any value
  force_path_style: true # MinIO serves buckets under the path, not a subdomain
)
client.put_object(bucket: 'my-bucket', key: 'object-name', body: file_data)
Real-World Applications
Production systems combine storage types based on data lifecycle, access patterns, and cost optimization. Applications evolve storage architecture as requirements change and data volumes grow.
Web application architectures store user uploads in object storage while maintaining session data on block storage databases. User profile images upload to S3 with CDN distribution for fast global access. Application servers run databases on EBS volumes with automated snapshots for disaster recovery. Logs stream to object storage for long-term retention and analysis.
# Multi-tier web application storage
require 'aws-sdk-s3'
require 'pg'
class WebApplication
  def initialize
    @s3 = Aws::S3::Client.new
    @cdn_domain = 'cdn.example.com'
    # Postgres data files live on the mounted block volume
    @db = PG.connect(dbname: 'app')
  end
  def handle_profile_upload(user_id, image_file)
    # Upload to S3
    key = "profiles/#{user_id}/avatar.jpg"
    @s3.put_object(
      bucket: 'user-assets',
      key: key,
      body: image_file,
      acl: 'public-read',
      cache_control: 'public, max-age=31536000'
    )
    # Store reference in database
    cdn_url = "https://#{@cdn_domain}/#{key}"
    @db.exec_params(
      'UPDATE users SET avatar_url = $1 WHERE id = $2',
      [cdn_url, user_id]
    )
    cdn_url
  end
  def log_event(event_data)
    # Buffer logs locally
    File.open('/mnt/data-volume/logs/current.log', 'a') do |f|
      f.puts JSON.generate(event_data)
    end
    # Archive to S3 daily
    archive_logs if should_archive?
  end
  private
  def archive_logs
    timestamp = Time.now.strftime('%Y%m%d')
    key = "logs/#{timestamp}.log.gz"
    content = File.read('/mnt/data-volume/logs/current.log')
    compressed = compress_gzip(content) # App-specific gzip helper
    @s3.put_object(
      bucket: 'application-logs',
      key: key,
      body: compressed,
      storage_class: 'INTELLIGENT_TIERING'
    )
    File.truncate('/mnt/data-volume/logs/current.log', 0)
  end
end
Data analytics platforms stage raw data in object storage, process it with compute clusters reading from block storage, and write results back to object storage. Extract-transform-load (ETL) jobs read source data from S3, load it into databases on EBS for transformation, then export aggregated results to S3 for reporting. This pattern separates durable storage (object) from computation workspace (block).
Machine learning training workflows store training datasets in object storage for cost-effective long-term storage. Training jobs download data to local SSD volumes for fast access during iteration. Checkpoints write to object storage for fault tolerance—if a training instance fails, a new instance resumes from the last checkpoint. Final models publish to object storage for inference service deployment.
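The checkpoint-and-resume cycle can be sketched locally; the copy to object storage is indicated by a comment rather than a live S3 call:

```ruby
require 'tmpdir'

# Periodically persist training state so a replacement instance can
# resume from the last completed step rather than from scratch
class CheckpointedJob
  def initialize(checkpoint_dir)
    @checkpoint_path = File.join(checkpoint_dir, 'checkpoint.bin')
  end

  def save(step, state)
    # Write locally first; a real job would then copy the file to
    # object storage for durability across instance failures
    File.binwrite(@checkpoint_path, Marshal.dump(step: step, state: state))
  end

  def resume
    return { step: 0, state: nil } unless File.exist?(@checkpoint_path)
    Marshal.load(File.binread(@checkpoint_path))
  end
end
```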
Media processing pipelines upload raw video to object storage, trigger processing jobs that download to block storage, perform transcoding on local disk, and upload results to object storage. The processing instance uses high-IOPS SSD for read-write performance during encoding. Output files store in object storage with lifecycle policies that transition to cheaper storage classes after 90 days.
Backup and disaster recovery strategies leverage object storage durability with cross-region replication. Database snapshots copy from block storage to object storage daily. Snapshot retention policies keep recent backups in standard storage and transition old backups to archive storage. Cross-region replication ensures recovery capability if the primary region fails.
# Disaster recovery with cross-region replication
class BackupManager
  def initialize
    @primary_s3 = Aws::S3::Client.new(region: 'us-east-1')
    @replica_s3 = Aws::S3::Client.new(region: 'us-west-2')
    @ec2 = Aws::EC2::Client.new(region: 'us-east-1')
  end
  def backup_database_volume(volume_id)
    # Create snapshot
    snapshot = @ec2.create_snapshot(
      volume_id: volume_id,
      description: "Backup #{Time.now.iso8601}"
    )
    # Wait for snapshot completion
    @ec2.wait_until(:snapshot_completed, snapshot_ids: [snapshot.snapshot_id])
    # Copy to object storage (export_snapshot_to_s3 is an
    # application-specific export step)
    snapshot_data = export_snapshot_to_s3(snapshot.snapshot_id)
    # Replicate to secondary region
    @replica_s3.copy_object(
      bucket: 'backups-replica',
      copy_source: "backups-primary/#{snapshot_data[:key]}",
      key: snapshot_data[:key],
      storage_class: 'GLACIER_IR'
    )
    snapshot.snapshot_id
  end
  def restore_from_backup(backup_key, target_region: 'us-east-1')
    # Download from object storage
    response = @replica_s3.get_object(bucket: 'backups-replica', key: backup_key)
    # Create volume from backup data
    # (restore_volume_from_data depends on the backup format)
    volume = restore_volume_from_data(response.body.read, target_region)
    volume.volume_id
  end
end
Reference
Storage Type Comparison
| Type | Access Interface | Consistency | Latency | Scalability | Primary Use Cases |
|---|---|---|---|---|---|
| Object | HTTP REST API | Eventual (strong read-after-write) | 50-500ms | Unlimited horizontal | Static content, archives, data lakes |
| Block | Block protocols (iSCSI, NVMe) | Strong | <1ms | Vertical (volume size/IOPS) | Databases, OS volumes, applications |
| File | NFS, SMB, POSIX | Strong with distributed locks | 1-10ms | Horizontal (multiple servers) | Shared access, content management |
AWS Storage Services
| Service | Type | Performance | Durability | Use Cases |
|---|---|---|---|---|
| S3 | Object | Variable by class | 99.999999999% | General object storage |
| S3 Glacier | Object (archive) | Minutes to hours retrieval | 99.999999999% | Long-term archives |
| EBS | Block | Up to 256,000 IOPS | 99.8-99.9% | Database volumes |
| EFS | File | Bursting to 10+ GB/s | 99.999999999% | Shared file systems |
| FSx | File | Protocol-specific | 99.9% | Windows/Lustre file systems |
Google Cloud Storage Services
| Service | Type | Performance | Durability | Use Cases |
|---|---|---|---|---|
| Cloud Storage | Object | Multi-regional | 99.999999999% | Object storage, CDN origin |
| Persistent Disk | Block | Up to 100,000 IOPS | 99.9999% | VM boot/data disks |
| Filestore | File | Up to 16 GB/s | 99.9% | Shared NFS storage |
Azure Storage Services
| Service | Type | Performance | Durability | Use Cases |
|---|---|---|---|---|
| Blob Storage | Object | Tiered performance | 99.999999999% (LRS) | Unstructured data |
| Disk Storage | Block | Up to 160,000 IOPS | 99.999% | VM managed disks |
| Files | File | Up to 100,000 IOPS | 99.9% | SMB file shares |
Object Storage Operations (S3 API)
| Operation | Method | Purpose | Idempotent |
|---|---|---|---|
| put_object | PUT | Upload object | Yes |
| get_object | GET | Download object | Yes |
| delete_object | DELETE | Remove object | Yes |
| list_objects_v2 | GET | List bucket contents | Yes |
| copy_object | PUT | Copy object within/across buckets | Yes |
| head_object | HEAD | Get object metadata | Yes |
Block Storage Volume Types
| Type | Max IOPS | Max Throughput | Latency | Cost Model |
|---|---|---|---|---|
| General Purpose SSD (gp3) | 16,000 | 1,000 MB/s | Single-digit ms | GB + IOPS + throughput |
| Provisioned IOPS SSD (io2) | 256,000 | 4,000 MB/s | Sub-millisecond | GB + IOPS |
| Throughput Optimized HDD (st1) | 500 | 500 MB/s | Low ms | GB only |
| Cold HDD (sc1) | 250 | 250 MB/s | Variable | GB only (lowest) |
Storage Class Selection Criteria
| Criteria | Object Storage | Block Storage | File Storage |
|---|---|---|---|
| Access pattern | Infrequent, distributed | Frequent, local | Shared, concurrent |
| Data mutability | Immutable or versioned | Mutable in place | Mutable in place |
| Consistency needs | Eventual acceptable | Strong required | Strong required |
| Scale | Unlimited | Limited by volume size | Limited by file system |
| Cost priority | Storage cost | Performance cost | Balance both |