CrackedRuby - Object Storage

Overview

Object storage represents a storage architecture that manages data as objects rather than as files in a hierarchy or blocks on disk. Each object contains the data itself, associated metadata, and a unique identifier within a flat address space. This architecture emerged to address the scalability limitations of traditional file systems and block storage when dealing with unstructured data at massive scale.

Unlike file systems that organize data in hierarchical directories, object storage uses a flat namespace where objects are retrieved using unique identifiers, typically through HTTP-based APIs. This design eliminates the performance bottlenecks associated with directory traversals and enables horizontal scaling across distributed systems. Cloud providers like Amazon Web Services (S3), Google Cloud Platform (Cloud Storage), and Microsoft Azure (Blob Storage) popularized this model for internet-scale storage needs.

The fundamental unit in object storage is the object, which consists of three components: the data payload (the actual content), metadata (descriptive information about the object), and the unique identifier (often called a key or object ID). Objects are grouped into buckets (called containers by some providers, such as Azure Blob Storage), which serve as organizational units within the storage system.

# Object storage structure conceptually
object = {
  key: "users/avatar/12345.jpg",
  data: binary_image_data,
  metadata: {
    content_type: "image/jpeg",
    size: 524288,
    created_at: "2025-01-15T10:30:00Z",
    custom_tags: { user_id: "12345", category: "profile" }
  }
}

Object storage systems provide eventual consistency or strong consistency guarantees depending on the implementation. They excel at storing large volumes of unstructured data such as media files, backups, logs, and documents. The HTTP-based API access pattern makes object storage particularly suitable for web applications, content delivery, and cloud-native architectures.

Key Principles

Object storage operates on several fundamental principles that differentiate it from traditional storage systems. The flat namespace eliminates hierarchical structure, storing all objects at the same logical level within a bucket. While keys may appear hierarchical (like "folder/subfolder/file.txt"), these are merely naming conventions—the system treats the entire string as a single identifier without true directory structures.
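
To make the flat-namespace point concrete, the sketch below models a bucket as a plain Ruby Hash and emulates a prefix/delimiter listing. The BUCKET contents and the list_keys helper are illustrative names invented for this example, not part of any SDK; real services apply the same idea server-side (S3's list operation, for instance, accepts prefix and delimiter parameters).

```ruby
# A bucket modelled as a flat Hash: every key is one opaque string.
BUCKET = {
  "photos/2025/cat.jpg" => "...",
  "photos/2025/dog.jpg" => "...",
  "photos/2024/bird.jpg" => "...",
  "readme.txt" => "..."
}.freeze

# Emulates a prefix/delimiter listing. Returns the keys that sit directly
# "inside" the prefix, plus the common prefixes that look like subfolders.
def list_keys(bucket, prefix: "", delimiter: "/")
  objects = []
  common_prefixes = []

  bucket.keys.sort.each do |key|
    next unless key.start_with?(prefix)

    rest = key[prefix.length..]
    cut = rest.index(delimiter)
    if cut
      # Everything up to and including the delimiter acts as a "folder"
      common_prefixes << prefix + rest[0..cut]
    else
      objects << key
    end
  end

  { objects: objects, common_prefixes: common_prefixes.uniq }
end

p list_keys(BUCKET)
p list_keys(BUCKET, prefix: "photos/")
```

The "folders" exist only in the listing logic: deleting every key under photos/2024/ would make that prefix disappear without any directory ever being removed.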

The immutability principle means that objects cannot be modified in place. Updates require writing a new version of the entire object, which simplifies consistency models and enables versioning features. This write-once-read-many pattern contrasts sharply with file systems that support in-place updates and random access writes.

# Demonstrating immutability - must replace entire object
# Cannot do partial updates like:
# storage.append_to_object("log.txt", new_data)  # Not supported

# Must retrieve, modify, and replace:
existing_content = storage.get_object("logs/app.log")
updated_content = existing_content + new_log_entries
storage.put_object("logs/app.log", updated_content)

Metadata handling forms a core principle of object storage. The system stores metadata separately from the data payload, enabling efficient retrieval of object information without accessing the data itself. Standard metadata includes content type, size, and modification timestamps, while custom metadata allows application-specific attributes. This separation enables fast metadata queries and indexing operations.

# Metadata-only operations
metadata = storage.head_object("documents/report.pdf")
# Returns: { content_type: "application/pdf", size: 2097152, 
#           last_modified: "2025-03-10T14:22:00Z" }
# Data is not transferred, only metadata

Scalability through distribution represents a foundational design principle. Object storage systems distribute data across multiple nodes automatically, using consistent hashing or similar algorithms to determine object placement. This horizontal scaling approach allows systems to grow to exabytes of data by adding nodes rather than upgrading individual servers. The system handles replication, load balancing, and failure recovery transparently.
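
As a rough illustration of placement by consistent hashing, the following sketch builds a small hash ring with virtual nodes. The HashRing class, the node names, and the choice of MD5 are assumptions made for the example; production systems use their own placement algorithms and also handle replication, which is omitted here.

```ruby
require 'digest'

# Minimal consistent-hash ring. Each node is placed on the ring many
# times (virtual nodes) so that keys spread more evenly.
class HashRing
  def initialize(nodes, vnodes: 100)
    @ring = {}
    nodes.each do |node|
      vnodes.times { |i| @ring[hash_of("#{node}-#{i}")] = node }
    end
    @sorted = @ring.keys.sort
  end

  # A key belongs to the first ring position at or after its own hash,
  # wrapping around to the start of the ring if necessary.
  def node_for(key)
    h = hash_of(key)
    position = @sorted.bsearch { |candidate| candidate >= h } || @sorted.first
    @ring[position]
  end

  private

  def hash_of(value)
    Digest::MD5.hexdigest(value)[0, 8].to_i(16)
  end
end

keys = (1..1000).map { |i| "object-#{i}" }
three = HashRing.new(%w[node-a node-b node-c])
four = HashRing.new(%w[node-a node-b node-c node-d])

moved = keys.count { |k| three.node_for(k) != four.node_for(k) }
puts "#{moved} of #{keys.size} keys moved after adding node-d"
```

The property worth noticing is that adding a node only relocates keys onto the new node; keys never shuffle between the existing nodes, which is what keeps rebalancing cheap as a cluster grows.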

The API-first access model standardizes interaction through RESTful HTTP interfaces. Objects are addressed via URLs, and operations use standard HTTP verbs (GET, PUT, DELETE, HEAD). This approach decouples storage from specific protocols or operating systems, enabling access from any HTTP client. Authentication and authorization integrate with standard HTTP mechanisms like signed requests and bearer tokens.

Durability through redundancy is achieved by storing multiple copies of each object across different failure domains: separate servers, racks, or even geographic regions. Several major services, including Amazon S3, are designed for eleven 9s (99.999999999%) of annual durability, meaning the probability of losing any given object in a year is extremely low. This redundancy operates automatically without requiring application-level configuration.

# Durability is transparent to applications
storage.put_object("backup/database.sql", data, {
  storage_class: "STANDARD"  # Automatically replicated
})
# System replicates across multiple availability zones internally;
# cross-region copies require explicitly configured replication
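
To put the eleven 9s figure in perspective, the arithmetic can be worked through directly. This sketch treats the durability figure as an independent annual loss probability per object, which is a simplification; the helper names are invented for the example.

```ruby
# Annual loss probability implied by eleven 9s of durability,
# kept as an exact Rational to avoid floating-point noise.
ANNUAL_LOSS_PROBABILITY = Rational(1, 10**11)

# Expected number of objects lost per year for a given object count.
def expected_losses_per_year(object_count)
  object_count * ANNUAL_LOSS_PROBABILITY
end

# Average number of years between single-object losses.
def years_per_expected_loss(object_count)
  1 / expected_losses_per_year(object_count)
end

objects = 10_000_000
puts "Expected losses per year: #{expected_losses_per_year(objects).to_f}"
puts "About one lost object every #{years_per_expected_loss(objects).to_i} years"
```

Under these assumptions, ten million stored objects correspond to an expected loss of roughly one object every 10,000 years.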

Ruby Implementation

Ruby interacts with object storage primarily through SDK libraries provided by cloud vendors or through abstraction libraries that support multiple backends. The AWS SDK for Ruby dominates for S3-compatible storage, while gems like Fog provide multi-cloud abstraction.

The AWS SDK gem provides direct access to Amazon S3 and S3-compatible services:

require 'aws-sdk-s3'

# Initialize client with credentials
client = Aws::S3::Client.new(
  region: 'us-east-1',
  access_key_id: ENV['AWS_ACCESS_KEY_ID'],
  secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
)

# Create a resource interface (higher-level abstraction)
s3 = Aws::S3::Resource.new(client: client)

# Upload an object
obj = s3.bucket('my-bucket').object('documents/file.pdf')
obj.upload_file('/local/path/file.pdf', {
  content_type: 'application/pdf',
  metadata: { 'author' => 'john-doe', 'version' => '2' }
})

The SDK distinguishes between the client interface (low-level API calls) and resource interface (higher-level abstractions with Ruby objects). The resource interface provides a more idiomatic Ruby experience:

# Resource interface - object-oriented approach
bucket = s3.bucket('data-lake')

# Iterate over objects with prefix
bucket.objects(prefix: 'logs/2025/').each do |obj|
  puts "#{obj.key}: #{obj.size} bytes, modified #{obj.last_modified}"
end

# Check if object exists
if bucket.object('config.json').exists?
  data = bucket.object('config.json').get.body.read
  config = JSON.parse(data)
end

Reading an object returns a response whose body is IO-like. Small objects can be read in full, while large ones can be streamed by passing a block:

# Stream large file without loading entirely into memory:
# passing a block to get yields the body chunk by chunk
obj.get do |chunk|
  # Process chunk (useful for large files)
  process_chunk(chunk)
end

# Or read entirely (small files)
content = obj.get.body.read

# Download to local file
obj.download_file('/local/destination/file.pdf')

Multipart uploads handle large files efficiently by splitting them into parts and uploading in parallel:

# Automatic multipart upload for large files
obj = s3.bucket('uploads').object('large-video.mp4')

# upload_file switches to multipart at multipart_threshold
# (SDK default: 100MB); this call lowers it to 15MB
obj.upload_file('/local/large-video.mp4', {
  multipart_threshold: 15 * 1024 * 1024  # 15MB
})

# Manual multipart control
multipart = obj.initiate_multipart_upload  # returns a MultipartUpload resource
parts = []

File.open('/local/large-file.dat', 'rb') do |file|
  part_number = 1
  # 5MB chunks: the minimum size for every part except the last
  while (chunk = file.read(5 * 1024 * 1024))
    # Upload each part through the MultipartUpload resource
    part = multipart.part(part_number).upload(body: chunk)
    parts << { etag: part.etag, part_number: part_number }
    part_number += 1
  end
end

# Completing the upload assembles the parts into one object
multipart.complete({
  multipart_upload: { parts: parts }
})
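
Before uploading parts in parallel, a client has to decide where each part starts and ends. The sketch below plans part boundaries under S3's documented limits (5MB minimum part size except for the last part, at most 10,000 parts per upload); the plan_parts helper is a hypothetical name for this example, and it ignores the 5GB per-part maximum.

```ruby
MIN_PART_SIZE = 5 * 1024 * 1024  # minimum for every part except the last
MAX_PARTS = 10_000               # maximum number of parts per upload

# Returns an array of { part_number:, offset:, length: } hashes that
# together cover the whole file.
def plan_parts(file_size, part_size: MIN_PART_SIZE)
  raise ArgumentError, "file is empty" unless file_size.positive?

  # Grow the part size if the file would otherwise exceed MAX_PARTS.
  needed = (file_size + MAX_PARTS - 1) / MAX_PARTS
  part_size = [part_size, MIN_PART_SIZE, needed].max

  (0...file_size).step(part_size).each_with_index.map do |offset, index|
    length = [part_size, file_size - offset].min
    { part_number: index + 1, offset: offset, length: length }
  end
end

plan_parts(12 * 1024 * 1024).each do |part|
  puts "part #{part[:part_number]}: offset=#{part[:offset]} length=#{part[:length]}"
end
```

Each planned range can then be read with File#seek and handed to a worker thread, since parts of one upload may be sent in any order as long as the part numbers are reported correctly on completion.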

Pre-signed URLs enable temporary access without sharing credentials:

# Generate URL valid for 1 hour
presigned_url = obj.presigned_url(:get, expires_in: 3600)
# Share this URL with users for direct download

# Pre-signed POST for client-side uploads
presigned_post = bucket.presigned_post({
  key: 'uploads/${filename}',
  success_action_status: '201',
  acl: 'private',
  content_length_range: 0..10485760  # Max 10MB
})

# Returns form fields and URL for browser-based upload
# presigned_post.url => "https://bucket.s3.amazonaws.com/"
# presigned_post.fields => { key: "...", policy: "...", signature: "..." }
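
The idea behind pre-signed URLs can be shown with a deliberately simplified scheme: the server signs the path and an expiry time with a secret, and anyone holding the URL can use it until it expires. This is not AWS Signature Version 4; the parameter names, the payload layout, and the helper names are assumptions chosen only to illustrate the mechanism.

```ruby
require 'openssl'
require 'uri'

# Builds a URL carrying an expiry timestamp and an HMAC over the request.
def sign_url(path, expires_at, secret)
  payload = "GET\n#{path}\n#{expires_at}"
  signature = OpenSSL::HMAC.hexdigest("SHA256", secret, payload)
  "#{path}?expires=#{expires_at}&signature=#{signature}"
end

# Recomputes the HMAC and rejects expired or tampered URLs.
def valid_url?(url, secret, now: Time.now.to_i)
  uri = URI.parse(url)
  params = URI.decode_www_form(uri.query.to_s).to_h
  expires_at = params["expires"].to_i
  return false if now > expires_at

  expected = OpenSSL::HMAC.hexdigest("SHA256", secret, "GET\n#{uri.path}\n#{expires_at}")
  OpenSSL.secure_compare(expected, params["signature"].to_s)
end

secret = "demo-secret"  # hypothetical; real services derive keys from account credentials
url = sign_url("/my-bucket/report.pdf", Time.now.to_i + 3600, secret)
puts url
puts valid_url?(url, secret)
```

Because the signature covers both the path and the expiry, changing either one invalidates the URL, which is why a leaked link grants access only to one object and only for a limited time.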

The Fog gem provides a unified interface across multiple cloud providers:

require 'fog/aws'

# AWS S3
storage = Fog::Storage.new({
  provider: 'AWS',
  aws_access_key_id: ENV['AWS_ACCESS_KEY_ID'],
  aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
  region: 'us-west-2'
})

# Same code works with Google Cloud Storage
# (needs the fog-google gem and require 'fog/google')
storage = Fog::Storage.new({
  provider: 'Google',
  google_project: 'my-project',
  google_json_key_location: '/path/to/key.json'
})

# Consistent API regardless of provider
directory = storage.directories.get('bucket-name')
file = directory.files.create({
  key: 'document.pdf',
  body: File.open('/local/document.pdf'),
  public: false
})

Active Storage in Rails integrates object storage seamlessly with application models:

# config/storage.yml
amazon:
  service: S3
  access_key_id: <%= ENV['AWS_ACCESS_KEY_ID'] %>
  secret_access_key: <%= ENV['AWS_SECRET_ACCESS_KEY'] %>
  region: us-east-1
  bucket: my-app-uploads

# app/models/user.rb
class User < ApplicationRecord
  has_one_attached :avatar
  has_many_attached :documents
end

# Usage in application code
user.avatar.attach(params[:file])

# Generate URL for display
user.avatar.url
# => Direct URL to object storage

# Download and process (the block form streams the blob in chunks)
user.avatar.download do |chunk|
  # Process each chunk of file contents
end

# Variant processing (images)
user.avatar.variant(resize_to_limit: [200, 200]).processed.url

Design Considerations

Selecting object storage over alternatives requires analyzing access patterns, scalability needs, and cost structures. Object storage excels for write-once-read-many workloads with infrequent updates, but performs poorly for scenarios requiring frequent small updates or random access within files.

Object storage suits applications with the following characteristics: large numbers of unstructured files, horizontal scaling requirements, geographically distributed access, and tolerance for eventual consistency. Media streaming platforms, backup systems, data lakes, and content delivery networks represent ideal use cases. The flat namespace and metadata-driven organization enable efficient scaling to billions of objects without the directory traversal overhead that plagues hierarchical file systems at scale.

Traditional file systems remain preferable for applications requiring POSIX semantics, in-place file modifications, or sub-file random access. Databases, transactional systems, and applications expecting file locking mechanisms should avoid object storage. Block storage better serves scenarios needing low-latency random access or applications designed around block devices.

# Anti-pattern: Frequent small updates
# Object storage forces complete object replacement
1000.times do |i|
  log_data = storage.get_object("app.log")
  log_data += "Log entry #{i}\n"
  storage.put_object("app.log", log_data)  # Rewrites entire file
end
# This pattern generates enormous overhead and costs

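# Better: Accumulate locally, write periodically
buffer = []
batch = 0
1000.times do |i|
  buffer << "Log entry #{i}"
  if buffer.size >= 100
    batch += 1
    # Timestamps alone collide within the same second and would
    # overwrite earlier batches, so add a batch counter to the key
    key = format("logs/%d-%04d.log", Time.now.to_i, batch)
    storage.put_object(key, buffer.join("\n"))
    buffer.clear
  end
end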

Cost structure influences design decisions significantly. Object storage pricing includes storage capacity, request operations (PUT/GET/LIST), and data transfer. Applications making frequent metadata queries or listing operations can incur substantial request costs. Organizing objects with predictable keys and caching metadata reduces these costs.

Storage classes offer different trade-offs between access frequency and cost. Standard storage provides immediate access with higher storage costs. Infrequent access classes reduce storage costs but charge higher retrieval fees. Archive classes (Glacier, Archive Storage) minimize storage costs dramatically, but most of them take minutes to hours to retrieve data. Applications must classify data lifecycle and choose appropriate classes.

# Lifecycle management through storage classes
storage.put_object("documents/recent.pdf", data, {
  storage_class: "STANDARD"
})

storage.put_object("backups/monthly.tar.gz", data, {
  storage_class: "STANDARD_IA"  # Infrequent Access
})

storage.put_object("archives/2020-logs.tar.gz", data, {
  storage_class: "GLACIER"  # Long-term archive
})

# Configure lifecycle policies
bucket.lifecycle_configuration.put({
  lifecycle_configuration: {
    rules: [{
      id: "archive-old-logs",
      status: "Enabled",
      filter: { prefix: "logs/" },  # rule-level prefix is deprecated
      transitions: [
        { days: 30, storage_class: "STANDARD_IA" },
        { days: 90, storage_class: "GLACIER" }
      ],
      expiration: { days: 365 }
    }]
  }
})
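
The effect of such a rule can be reasoned about locally. The sketch below mirrors the rule above as a plain Hash and computes which storage class an object would be in at a given age. The RULE constant and the lifecycle_state helper are hypothetical, and real services apply transitions asynchronously, so actual timing is approximate.

```ruby
require 'date'

# Local mirror of the lifecycle rule: IA after 30 days, Glacier after 90,
# expiration after 365, applied only to keys under logs/.
RULE = {
  prefix: "logs/",
  transitions: [
    { days: 30, storage_class: "STANDARD_IA" },
    { days: 90, storage_class: "GLACIER" }
  ],
  expiration_days: 365
}.freeze

# Returns the storage class an object should be in on a given day,
# or "EXPIRED" once the expiration age has been reached.
def lifecycle_state(key, created_on, today, rule = RULE)
  return "STANDARD" unless key.start_with?(rule[:prefix])

  age = (today - created_on).to_i
  return "EXPIRED" if age >= rule[:expiration_days]

  # Pick the latest transition whose age threshold has been reached.
  due = rule[:transitions].select { |t| age >= t[:days] }.max_by { |t| t[:days] }
  due ? due[:storage_class] : "STANDARD"
end

today = Date.new(2025, 6, 1)
puts lifecycle_state("logs/app.log", Date.new(2025, 4, 1), today)
puts lifecycle_state("images/logo.png", Date.new(2024, 1, 1), today)
```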

Consistency models affect application design. Since December 2020, S3 has provided strong read-after-write consistency for PUT, DELETE, and LIST operations, but other object storage systems may offer only eventual consistency. On such systems, applications must handle scenarios where an object exists but is not immediately visible in listings, or where a deleted object temporarily remains accessible.

Data organization strategies impact performance and costs. Prefix-based organization enables parallel processing and efficient filtering, but excessive reliance on listing operations indicates poor design. Hash-based prefixes spread load when keys would otherwise share a sequential or common prefix. On S3, which scales request capacity per prefix automatically, this matters mainly at very high request rates.

# Poor: Sequential keys cause hot-spotting
user_files.each do |file|
  key = "uploads/#{Time.now.to_i}_#{file.name}"  # Sequential timestamps
  storage.put_object(key, file.data)
end

# Better: Hash prefix distributes load
require 'digest'

user_files.each do |file|
  hash = Digest::MD5.hexdigest(file.name)[0..3]  # 4-char prefix
  key = "uploads/#{hash}/#{file.name}"  # Distributed across key space
  storage.put_object(key, file.data)
end

Implementation Approaches

Integrating object storage into applications follows several architectural patterns depending on requirements. The direct client approach has applications interact with object storage APIs directly, suitable for simple use cases where the application fully controls storage operations. This pattern minimizes complexity but couples application logic tightly to storage implementation details.

class DocumentService
  def initialize
    @storage = Aws::S3::Resource.new(region: 'us-east-1')
    @bucket = @storage.bucket('documents')
  end

  def store_document(user_id, filename, content)
    key = "users/#{user_id}/documents/#{filename}"
    @bucket.object(key).put(body: content)
    key
  end

  def retrieve_document(key)
    @bucket.object(key).get.body.read
  end
end

The abstraction layer approach introduces an interface that decouples application code from specific storage implementations. This pattern enables swapping storage backends (local filesystem for development, object storage for production) and simplifies testing. The abstraction should expose storage-agnostic operations while hiding provider-specific details.

class StorageAdapter
  def put(key, data, options = {}); raise NotImplementedError; end
  def get(key); raise NotImplementedError; end
  def delete(key); raise NotImplementedError; end
  def exists?(key); raise NotImplementedError; end
end

class S3StorageAdapter < StorageAdapter
  def initialize(bucket_name)
    @s3 = Aws::S3::Resource.new
    @bucket = @s3.bucket(bucket_name)
  end

  def put(key, data, options = {})
    @bucket.object(key).put(body: data, **options)
  end

  def get(key)
    @bucket.object(key).get.body.read
  end

  def delete(key)
    @bucket.object(key).delete
  end

  def exists?(key)
    @bucket.object(key).exists?
  end
end

require 'fileutils'

class LocalStorageAdapter < StorageAdapter
  def initialize(base_path)
    @base_path = base_path
    FileUtils.mkdir_p(@base_path)
  end

  def put(key, data, options = {})
    path = File.join(@base_path, key)
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, data)
  end

  def get(key)
    File.read(File.join(@base_path, key))
  end

  def delete(key)
    File.delete(File.join(@base_path, key))
  end

  def exists?(key)
    File.exist?(File.join(@base_path, key))
  end
end

# Configuration-based selection
storage = if Rails.env.production?
  S3StorageAdapter.new('production-bucket')
else
  LocalStorageAdapter.new('/tmp/storage')
end
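
For unit tests, a third backend that keeps everything in memory is often enough. The sketch below is self-contained and duck-typed to the same four methods as the adapters above; the MemoryStorageAdapter name and its KeyError behavior on missing keys are choices made for this example.

```ruby
# In-memory stand-in exposing the same put/get/delete/exists? interface,
# so tests need neither network access nor a temporary directory.
class MemoryStorageAdapter
  def initialize
    @objects = {}
  end

  def put(key, data, options = {})
    # Copy the data so later mutation by the caller does not leak in
    @objects[key] = { body: data.dup, metadata: options.fetch(:metadata, {}) }
    true
  end

  def get(key)
    # Hash#fetch raises KeyError for a missing key
    @objects.fetch(key)[:body]
  end

  def delete(key)
    !@objects.delete(key).nil?
  end

  def exists?(key)
    @objects.key?(key)
  end
end

storage = MemoryStorageAdapter.new
storage.put("reports/q1.txt", "revenue up")
puts storage.get("reports/q1.txt")
puts storage.exists?("reports/q2.txt")
```

Code written against the adapter interface can then be exercised with this class in tests and with the S3 or local adapter elsewhere, without changes to the calling code.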

The queue-based processing pattern handles uploads asynchronously, particularly useful for large files or when post-processing is required. Applications accept upload requests immediately, queue processing jobs, and handle the actual storage operations in background workers. This approach improves response times and enables retry logic for failed operations.

# Controller accepts upload, queues processing
class UploadsController < ApplicationController
  def create
    upload_id = SecureRandom.uuid
    
    # Store temporarily (binary-safe; make sure the directory exists)
    temp_path = Rails.root.join('tmp', 'uploads', upload_id)
    FileUtils.mkdir_p(temp_path.dirname)
    File.binwrite(temp_path, params[:file].read)

    # Queue processing job, passing along the original filename
    ProcessUploadJob.perform_later(
      upload_id, current_user.id, params[:file].original_filename
    )

    render json: { upload_id: upload_id }, status: :accepted
  end
end

# Background job handles storage
# (assumes workers share the web server's tmp directory)
class ProcessUploadJob < ApplicationJob
  queue_as :uploads

  def perform(upload_id, user_id, original_filename)
    temp_path = Rails.root.join('tmp', 'uploads', upload_id)
    content = File.binread(temp_path)

    # Store in object storage (storage is the application's storage client)
    key = "users/#{user_id}/uploads/#{upload_id}"
    storage.put_object(key, content)

    # Record in database
    Upload.create!(
      user_id: user_id,
      storage_key: key,
      filename: original_filename
    )
    
    # Clean up temporary file
    File.delete(temp_path)
  end
end

The proxy pattern positions an application server between clients and object storage, managing access control, transformations, or caching. This approach centralizes security policies and enables custom processing but introduces a performance bottleneck and single point of failure. Use proxy patterns when direct client access is unacceptable due to security or processing requirements.
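
A minimal sketch of the proxy idea, independent of any web framework: every read passes through an authorization check, and recently fetched objects are served from a small cache. The StorageProxy class, its callable backend and authorizer arguments, and the time-based cache are assumptions made for illustration.

```ruby
# Sits between callers and the storage backend, enforcing access
# control and caching recently read objects.
class StorageProxy
  class Forbidden < StandardError; end

  # backend:    callable taking a key and returning the object body
  # authorizer: callable taking (user, key) and returning true or false
  def initialize(backend, authorizer:, cache_ttl: 60)
    @backend = backend
    @authorizer = authorizer
    @cache_ttl = cache_ttl
    @cache = {}
  end

  def fetch(user, key)
    raise Forbidden, "#{user} may not read #{key}" unless @authorizer.call(user, key)

    entry = @cache[key]
    return entry[:body] if entry && now - entry[:stored_at] < @cache_ttl

    body = @backend.call(key)
    @cache[key] = { body: body, stored_at: now }
    body
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

backend = ->(key) { "contents of #{key}" }
owner_only = ->(user, key) { key.start_with?("users/#{user}/") }
proxy = StorageProxy.new(backend, authorizer: owner_only)
puts proxy.fetch("42", "users/42/notes.txt")
```

The authorization check runs before the cache lookup on purpose, so a cached object is never returned to a user who is not allowed to read it.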

The direct client upload pattern uses pre-signed URLs to enable clients to upload directly to object storage without routing through application servers. This approach maximizes throughput and minimizes server load but requires careful security configuration to prevent abuse.

class DirectUploadController < ApplicationController
  def presign
    s3 = Aws::S3::Resource.new
    bucket = s3.bucket('uploads')
    
    # Generate unique key
    key = "uploads/#{SecureRandom.uuid}/#{params[:filename]}"
    
    # Create presigned POST
    post = bucket.presigned_post({
      key: key,
      success_action_status: '201',
      acl: 'private',
      content_length_range: 1..10.megabytes,
      content_type: params[:content_type],
      signature_expiration: 5.minutes.from_now  # how long the signed form stays valid
    })
    
    render json: {
      url: post.url,
      fields: post.fields,
      key: key
    }
  end

  def confirm
    # After client uploads, verify and record
    key = params[:key]
    
    if Aws::S3::Resource.new.bucket('uploads').object(key).exists?
      Upload.create!(
        user_id: current_user.id,
        storage_key: key
      )
      render json: { status: 'confirmed' }
    else
      render json: { error: 'upload not found' }, status: :not_found
    end
  end
end

Security Implications

Object storage security operates through multiple layers: authentication, authorization, encryption, and network controls. Misconfigured permissions represent the most common security vulnerability, frequently exposing sensitive data publicly.

Access control in object storage follows the principle of least privilege. Most systems support both identity-based policies (who can access what) and resource-based policies (what can be accessed from where). Identity-based policies attach to users or roles, while bucket policies control access at the container level. These policies are evaluated together: an explicit deny in any applicable policy overrides every allow, and a request that no policy allows is denied by default.

# Identity-based policy (IAM role/user)
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:PutObject"],
    "Resource": "arn:aws:s3:::my-bucket/uploads/*"
  }]
}

# Resource-based policy (bucket policy)
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": "arn:aws:s3:::my-bucket/*",
    "Condition": {
      "Bool": { "aws:SecureTransport": "false" }
    }
  }]
}
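
The way allow and deny statements combine can be sketched in a few lines. This is a heavily simplified model that matches only actions and resources with * wildcards and ignores principals, conditions, and cross-account rules; the statement layout and helper names are invented for the example.

```ruby
# Simplified statements: one allow on uploads/, one deny on a sub-prefix.
POLICY_STATEMENTS = [
  { effect: "Allow", action: ["s3:GetObject", "s3:PutObject"],
    resource: "arn:aws:s3:::my-bucket/uploads/*" },
  { effect: "Deny", action: "s3:*",
    resource: "arn:aws:s3:::my-bucket/uploads/private/*" }
].freeze

# Matches a pattern in which * stands for any run of characters.
def glob_match?(pattern, value)
  source = pattern.split("*", -1).map { |part| Regexp.escape(part) }.join(".*")
  /\A#{source}\z/.match?(value)
end

def statement_applies?(statement, action, resource)
  Array(statement[:action]).any? { |a| glob_match?(a, action) } &&
    Array(statement[:resource]).any? { |r| glob_match?(r, resource) }
end

# Explicit deny wins, then explicit allow, otherwise implicit deny.
def evaluate(statements, action, resource)
  applicable = statements.select { |s| statement_applies?(s, action, resource) }
  return :deny if applicable.any? { |s| s[:effect] == "Deny" }
  return :allow if applicable.any? { |s| s[:effect] == "Allow" }

  :deny
end

puts evaluate(POLICY_STATEMENTS, "s3:GetObject", "arn:aws:s3:::my-bucket/uploads/a.txt")
puts evaluate(POLICY_STATEMENTS, "s3:GetObject", "arn:aws:s3:::my-bucket/uploads/private/a.txt")
```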

Applications should never embed long-term credentials directly in code. Instead, use environment variables, credential files with restricted permissions, or IAM roles when running on cloud infrastructure. Temporary credentials with limited scopes reduce exposure risk.

# Poor: Hard-coded credentials
client = Aws::S3::Client.new(
  access_key_id: 'AKIAIOSFODNN7EXAMPLE',
  secret_access_key: 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
)

# Better: Environment variables
client = Aws::S3::Client.new(
  access_key_id: ENV['AWS_ACCESS_KEY_ID'],
  secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
)

# Best: IAM role (when on AWS infrastructure)
client = Aws::S3::Client.new
# Credentials automatically sourced from instance metadata

Encryption protects data at rest and in transit. Server-side encryption encrypts objects automatically when written to storage, with keys managed by the service provider (SSE-S3), a key management service (SSE-KMS), or customer-provided keys (SSE-C). Client-side encryption encrypts data before transmission, maintaining full control over keys but increasing complexity.

# Server-side encryption with S3-managed keys
obj.put(
  body: data,
  server_side_encryption: 'AES256'
)

# Server-side encryption with KMS
obj.put(
  body: data,
  server_side_encryption: 'aws:kms',
  ssekms_key_id: 'arn:aws:kms:region:account:key/key-id'
)

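# Client-side encryption (envelope encryption sketch)
require 'openssl'
require 'base64'

cipher = OpenSSL::Cipher.new('AES-256-CBC')
cipher.encrypt
key = cipher.random_key  # one-time data key for this object
iv = cipher.random_iv

encrypted_data = cipher.update(sensitive_data) + cipher.final

# Wrap the data key with a master key so it can be stored beside the object.
# master_public_key is assumed to be an OpenSSL::PKey::RSA loaded elsewhere.
encrypted_key = master_public_key.public_encrypt(key)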

# Store encrypted data and metadata
obj.put(
  body: encrypted_data,
  metadata: {
    'x-amz-iv' => Base64.strict_encode64(iv),
    'x-amz-key-v2' => Base64.strict_encode64(encrypted_key)
  }
)

Pre-signed URLs require careful expiration management. Short expiration times (minutes to hours) minimize exposure if URLs leak. Validation of file types, sizes, and content prevents malicious uploads through pre-signed URLs.

def generate_presigned_url(object_key, expiration = 3600)
  # Validate user authorization before generating URL
  raise Unauthorized unless current_user.can_access?(object_key)

  # Log URL generation for audit trail
  Rails.logger.info("Pre-signed URL generated: user=#{current_user.id} object=#{object_key}")

  obj = s3.bucket('sensitive-bucket').object(object_key)
  obj.presigned_url(:get, {
    expires_in: expiration,  # seconds; at most one week
    response_content_disposition: 'attachment',  # Force download
    response_content_type: 'application/octet-stream'
  })
end

Public access blocks prevent accidental exposure. Cloud providers offer settings to block all public access at the account or bucket level, overriding individual object permissions. Applications storing sensitive data should enable these blocks by default.

Versioning protects against accidental deletion or overwrite. When enabled, object storage retains all versions of objects, allowing recovery from unintended modifications. This feature incurs additional storage costs but provides valuable protection.

# Enable versioning on bucket
bucket.versioning.enable

# List all versions of an object
bucket.object_versions(prefix: 'document.pdf').each do |version|
  puts "Version: #{version.version_id}, Modified: #{version.last_modified}"
end

# Retrieve specific version
old_version = bucket.object('document.pdf').get(version_id: 'specific-version-id')

# Delete specific version (permanent)
bucket.object('document.pdf').delete(version_id: 'version-to-remove')

Cross-Origin Resource Sharing (CORS) configuration controls browser-based access. Restrictive CORS policies limit which web origins can read objects from JavaScript running in a browser. Because CORS is enforced by browsers, it complements access policies rather than replacing them.

# Configure CORS for bucket
bucket.cors.put({
  cors_configuration: {
    cors_rules: [{
      allowed_origins: ['https://app.example.com'],
      allowed_methods: ['GET', 'PUT'],
      allowed_headers: ['*'],
      max_age_seconds: 3600
    }]
  }
})

Tools & Ecosystem

The Ruby ecosystem provides multiple libraries for interacting with object storage systems. The AWS SDK for Ruby dominates the landscape due to S3's market position and widespread adoption of S3-compatible APIs by alternative providers.

The aws-sdk-s3 gem offers comprehensive S3 functionality with both low-level client and high-level resource interfaces. The client interface maps directly to API operations, while the resource interface provides object-oriented abstractions. Version 3 of the SDK modularized functionality, allowing installation of only required components.

# Gemfile
gem 'aws-sdk-s3', '~> 1.134'

# Or install entire SDK (larger footprint)
gem 'aws-sdk', '~> 3'

# Client-level operations
client = Aws::S3::Client.new
response = client.list_objects_v2(bucket: 'my-bucket', prefix: 'logs/')

# Resource-level operations
s3 = Aws::S3::Resource.new
bucket = s3.bucket('my-bucket')
obj = bucket.object('file.txt')

Fog provides multi-cloud storage abstraction supporting AWS S3, Google Cloud Storage, Azure Blob Storage, OpenStack Swift, and Rackspace Cloud Files. The unified interface simplifies multi-cloud strategies but trades provider-specific features for portability.

# Gemfile
gem 'fog-aws', '~> 3.19'

storage = Fog::Storage.new(
  provider: 'AWS',
  aws_access_key_id: ENV['AWS_ACCESS_KEY_ID'],
  aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
)

# Consistent API across providers
directory = storage.directories.get('bucket-name')
files = directory.files
file = files.create(key: 'data.json', body: json_data)

CarrierWave provides file upload abstraction for Rails applications, supporting object storage backends through Fog. It handles image processing, validation, and storage backend switching.

# Gemfile
gem 'carrierwave', '~> 3.0'
gem 'fog-aws'

# app/uploaders/avatar_uploader.rb
class AvatarUploader < CarrierWave::Uploader::Base
  storage :fog

  def store_dir
    "uploads/users/#{model.id}/avatars"
  end

  def extension_allowlist
    %w[jpg jpeg png]
  end
end

# config/initializers/carrierwave.rb
CarrierWave.configure do |config|
  config.fog_provider = 'fog/aws'
  config.fog_credentials = {
    provider: 'AWS',
    aws_access_key_id: ENV['AWS_ACCESS_KEY_ID'],
    aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
    region: 'us-west-2'
  }
  config.fog_directory = 'my-app-uploads'
end

Shrine offers a modern alternative to CarrierWave with plugin-based architecture and better support for direct uploads. It separates storage concerns from models more cleanly.

# Gemfile
gem 'shrine', '~> 3.5'
gem 'aws-sdk-s3', '~> 1.134'

# config/initializers/shrine.rb
require 'shrine'
require 'shrine/storage/s3'

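s3_options = {
  bucket: 'my-app-uploads',
  region: 'us-east-1',
  access_key_id: ENV['AWS_ACCESS_KEY_ID'],
  secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
}

Shrine.storages = {
  cache: Shrine::Storage::S3.new(prefix: 'cache', **s3_options),
  store: Shrine::Storage::S3.new(**s3_options)
}

Shrine.plugin :activerecord
Shrine.plugin :presign_endpoint  # direct uploads via presigned requests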

# app/uploaders/document_uploader.rb
class DocumentUploader < Shrine
  plugin :validation_helpers
  
  Attacher.validate do
    validate_max_size 10 * 1024 * 1024  # 10MB
    validate_extension %w[pdf doc docx]
  end
end

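MinIO is an open-source, S3-compatible object storage server that serves as a self-hosted alternative to cloud object storage. Because it implements the S3 API, Ruby applications typically reach it through the aws-sdk-s3 gem configured with a custom endpoint rather than through a MinIO-specific library.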
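
A possible client configuration for a local MinIO server is sketched below. The environment variable names and the localhost endpoint are assumptions for a development setup (minioadmin is MinIO's documented default credential pair); the option keys themselves are standard Aws::S3::Client options.

```ruby
# Connection options for an S3-compatible server such as MinIO.
MINIO_OPTIONS = {
  endpoint: ENV.fetch("MINIO_ENDPOINT", "http://localhost:9000"),
  access_key_id: ENV.fetch("MINIO_ACCESS_KEY", "minioadmin"),
  secret_access_key: ENV.fetch("MINIO_SECRET_KEY", "minioadmin"),
  region: "us-east-1",      # the SDK requires a region even when it is unused
  force_path_style: true    # address buckets as /bucket/key, not bucket.host
}.freeze

# With the aws-sdk-s3 gem installed, the options plug straight into the client:
#   require 'aws-sdk-s3'
#   client = Aws::S3::Client.new(**MINIO_OPTIONS)
#   client.list_buckets.buckets.each { |bucket| puts bucket.name }

puts MINIO_OPTIONS[:endpoint]
```

Path-style addressing matters here because a self-hosted server usually has no wildcard DNS entry for bucket subdomains.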

Command-line tools supplement Ruby libraries for operations and debugging. The AWS CLI provides comprehensive S3 management capabilities. S3cmd offers an alternative CLI focused specifically on S3 operations.

# AWS CLI examples
aws s3 ls s3://bucket-name/prefix/
aws s3 cp local-file.txt s3://bucket-name/remote-file.txt
aws s3 sync ./local-directory s3://bucket-name/remote-prefix/

# S3cmd examples
s3cmd ls s3://bucket-name/
s3cmd put local-file.txt s3://bucket-name/
s3cmd get s3://bucket-name/file.txt local-file.txt

GUI tools like Cyberduck, CloudBerry Explorer, and S3 Browser provide visual interfaces for browsing and managing object storage. These tools assist development and debugging but should not replace programmatic access in production applications.

Reference

Core Object Storage Operations

Operation	S3 API Call	Ruby SDK Method	Description
Create Bucket	CreateBucket	bucket.create	Create new storage container
Delete Bucket	DeleteBucket	bucket.delete	Remove empty storage container
List Buckets	ListBuckets	s3.buckets	Retrieve all buckets in account
Put Object	PutObject	object.put	Upload new object or replace existing
Get Object	GetObject	object.get	Download object content
Delete Object	DeleteObject	object.delete	Remove object from storage
Head Object	HeadObject	object.head	Retrieve object metadata only
Copy Object	CopyObject	object.copy_from	Copy object within or between buckets
List Objects	ListObjectsV2	bucket.objects	List objects with prefix filter

Storage Classes and Use Cases

Storage Class	Access Latency	Retrieval Cost	Use Case
Standard	Milliseconds	None	Frequently accessed data
Standard-IA	Milliseconds	Per-GB fee	Monthly access pattern
Intelligent-Tiering	Milliseconds	Monitoring fee	Unpredictable access patterns
Glacier Instant	Milliseconds	Per-GB fee	Archive with instant access
Glacier Flexible	Minutes to hours	Per-GB fee	Archive with occasional access
Glacier Deep Archive	Within 12 hours (standard retrieval)	Per-GB fee	Long-term archive rarely accessed

Common Metadata Headers

Header	S3 Equivalent	Purpose	Example Value
Content-Type	Content-Type	MIME type of object	application/json
Content-Length	Content-Length	Object size in bytes	1048576
Content-Encoding	Content-Encoding	Compression method	gzip
Content-Disposition	Content-Disposition	Download behavior	attachment; filename=data.csv
Cache-Control	Cache-Control	Caching directives	max-age=3600
ETag	ETag	Entity tag for the stored content (MD5 digest for simple uploads)	5d41402abc4b2a76b9719d911017c592
Last-Modified	Last-Modified	Modification timestamp	2025-03-15T10:30:00Z

Access Control Mechanisms

Method	Scope	Use Case	Complexity
IAM Policies	Identity-based	Control what users/roles can do	Medium
Bucket Policies	Resource-based	Control bucket-level permissions	Low
ACLs	Object-level	Legacy per-object permissions	High
Pre-signed URLs	Temporary	Time-limited access without credentials	Low
STS Tokens	Temporary credentials	Short-term access with assumed roles	Medium

Multipart Upload Thresholds

File Size	Approach	Part Size	Parallel Parts
Under 5MB	Single PUT	N/A	N/A
5MB - 100MB	Optional multipart	5-10MB	2-5
100MB - 5GB	Recommended multipart	10-100MB	5-10
Over 5GB	Required multipart	100MB+	10+

Consistency Models

Operation	S3 Consistency	Impact
New PUT	Strong	Object immediately available
Overwrite PUT	Strong	New version immediately visible
DELETE	Strong	Object immediately unavailable
List after PUT	Strong	New object appears in listings
List after DELETE	Strong	Deleted object removed from listings

Common Error Codes

Error Code	HTTP Status	Cause	Resolution
NoSuchKey	404	Object does not exist	Verify key is correct
AccessDenied	403	Insufficient permissions	Check IAM policies and bucket policy
NoSuchBucket	404	Bucket does not exist	Verify bucket name and region
InvalidAccessKeyId	403	Invalid credentials	Verify access key ID
SignatureDoesNotMatch	403	Invalid secret key or clock skew	Check secret key and system time
EntityTooLarge	400	Object exceeds size limit	Use multipart upload
TooManyBuckets	400	Account bucket limit reached	Delete unused buckets or request increase
SlowDown	503	Request rate too high	Implement exponential backoff
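
For throttling errors such as SlowDown, the usual remedy is to retry with exponential backoff and jitter. The sketch below shows the pattern with a stand-in ThrottledError class and a hypothetical with_backoff helper; the AWS SDK also ships its own retry handling (configurable through the max_attempts and retry_mode client options), so a hand-written loop like this is mainly useful around other clients or higher-level operations.

```ruby
# Stand-in for a service throttling error (for example a 503 SlowDown).
class ThrottledError < StandardError; end

# Retries the block on ThrottledError, sleeping a random time up to an
# exponentially growing cap between attempts ("full jitter").
def with_backoff(max_attempts: 5, base_delay: 0.1, max_delay: 5.0,
                 sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue ThrottledError
    raise if attempt >= max_attempts

    cap = [max_delay, base_delay * (2**(attempt - 1))].min
    sleeper.call(rand * cap)
    retry
  end
end

result = with_backoff do |attempt|
  raise ThrottledError, "slow down" if attempt < 3  # fail twice, then succeed

  "succeeded on attempt #{attempt}"
end
puts result
```

The sleeper argument exists so that tests can capture the delays instead of actually sleeping.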

SDK Configuration Options

Configuration	Environment Variable	Purpose
Access Key ID	AWS_ACCESS_KEY_ID	Authentication credential
Secret Access Key	AWS_SECRET_ACCESS_KEY	Authentication credential
Session Token	AWS_SESSION_TOKEN	Temporary credential component
Region	AWS_REGION	Default service region
Profile	AWS_PROFILE	Named credential profile
Endpoint	AWS_ENDPOINT_URL	Custom endpoint for S3-compatible services
Max Attempts	AWS_MAX_ATTEMPTS	Retry limit for failed requests
Timeout	http_read_timeout (client option, not an environment variable)	Socket read timeout in seconds

Performance Optimization Patterns

Pattern	Benefit	Implementation
Parallel uploads	Higher throughput	Use multipart upload with threads
Hash prefix	Distributed load	Add hash to beginning of keys
Connection pooling	Reduced latency	Reuse HTTP connections
Exponential backoff	Graceful degradation	Retry with increasing delays
Range requests	Partial retrieval	Request specific byte ranges
CloudFront CDN	Lower latency	Cache frequently accessed objects
Transfer Acceleration	Faster uploads	Route through edge locations
Batch operations	Reduced API calls	Combine multiple operations
