Overview
Container storage addresses the fundamental challenge of data persistence in ephemeral container environments. Containers operate as isolated, stateless units by default—when a container stops, all data written to its writable layer disappears. This design aligns with microservices principles but creates problems for applications requiring data persistence: databases, file uploads, logs, application state, and cached data.
Container storage solutions provide mechanisms to preserve data beyond container lifecycles. Three primary approaches exist: volumes managed by the container runtime, bind mounts that map host filesystem paths into containers, and temporary filesystems stored in memory. Each serves distinct use cases with different performance characteristics, security implications, and operational complexity.
The container storage landscape evolved from Docker's initial volume implementation to Kubernetes' sophisticated persistent volume subsystem. Modern container platforms abstract storage provisioning through plugins and drivers, enabling integration with network-attached storage, cloud provider storage services, and distributed filesystems. This abstraction layer separates storage lifecycle from container lifecycle, allowing data to persist across container restarts, redeployments, and cluster migrations.
Ruby applications running in containers face the same storage challenges as applications in other languages: session data persistence, file upload storage, database file management, and log retention. The stateless nature of Ruby web applications (Rails, Sinatra, Hanami) aligns well with container design, but stateful components—Active Storage uploads, Action Cable message persistence, background job queues—require explicit storage strategies.
Key Principles
Container Filesystem Layers: Containers use layered filesystems where image layers stack as read-only, with a thin writable layer added at runtime. Writing to this writable layer creates copy-on-write overhead and data disappears when the container stops. Container storage mechanisms bypass this layered filesystem for performance and persistence.
Storage Lifecycle Independence: Container storage separates data lifecycle from container lifecycle. A volume or persistent volume exists independently—created before containers use it, persisting after containers terminate. This independence enables data sharing between containers, rolling updates without data loss, and backup/restore operations.
Mount Namespaces and Isolation: Containers use mount namespaces to create isolated filesystem views. Storage mounts appear as regular filesystem paths inside containers but map to external storage locations. This abstraction hides the underlying storage implementation from containerized applications—a container accessing /data might read from network storage, local SSD, or cloud object storage.
Volume Drivers and Plugins: Container platforms use driver architecture for storage provisioning. Volume drivers handle storage backend integration—local filesystem, NFS, cloud provider APIs, distributed storage systems. This plugin model enables storage system integration without modifying the container runtime.
Access Modes and Concurrency: Storage systems define access modes controlling concurrent access patterns. ReadWriteOnce limits mounting to a single node, ReadOnlyMany allows multiple readers, ReadWriteMany permits concurrent read/write access. These modes reflect underlying storage system capabilities—block storage typically supports single-writer scenarios while network filesystems handle concurrent access.
Storage Classes and Dynamic Provisioning: Modern container orchestrators introduce storage classes that define storage provisioning parameters. A storage class specifies the volume driver, performance characteristics (IOPS, throughput), replication settings, and backup policies. Dynamic provisioning creates volumes on-demand based on storage class definitions, eliminating manual volume creation.
Data Locality and Performance: Storage location relative to compute resources affects performance significantly. Local volumes provide lowest latency but lack portability. Network-attached storage introduces latency but enables data sharing and migration. The locality/portability tradeoff shapes storage architecture decisions.
Ownership and Permissions: Container storage must handle user ID mapping between containers and host systems. Containers often run as non-root users for security, but volume data may be owned by different UIDs. Storage drivers and security contexts manage permission mapping to prevent access violations.
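A quick Ruby diagnostic (a sketch; point it at any volume mount path) makes such mismatches visible from inside the container before they surface as permission errors:

```ruby
# Compare the container process identity with a volume's ownership to
# spot UID/GID mismatches before they surface as Errno::EACCES.
def diagnose_ownership(path)
  stat = File.stat(path)
  {
    process_uid: Process.uid,      # UID the container process runs as
    process_gid: Process.gid,
    owner_uid:   stat.uid,         # UID owning the mounted directory
    owner_gid:   stat.gid,
    writable:    File.writable?(path)
  }
end
```

Logging this hash at boot turns a confusing runtime failure into an obvious configuration problem.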
Implementation Approaches
Volume-Based Storage: Volumes represent the primary storage abstraction in container environments. The container runtime manages volume lifecycle, creation, and cleanup. Volumes exist as directories on the host system (for local volumes) or as connections to external storage systems (for plugin-managed volumes). Creating a volume establishes a named storage resource that containers reference by name rather than path.
Docker volumes operate through the volume API, with the Docker daemon managing volume storage in /var/lib/docker/volumes by default. When a container mounts a volume, Docker ensures the volume exists, creates it if necessary, and mounts it at the specified container path. Multiple containers can mount the same volume simultaneously if the underlying storage supports concurrent access.
Kubernetes persistent volumes implement a more sophisticated model with separation between cluster-wide storage resources (PersistentVolumes) and namespace-scoped storage requests (PersistentVolumeClaims). Administrators define PersistentVolumes representing available storage capacity. Developers create PersistentVolumeClaims requesting storage with specific characteristics. The Kubernetes control plane binds claims to volumes matching the requirements.
Bind Mount Strategy: Bind mounts map host filesystem paths directly into containers. This approach provides direct access to host data without abstraction layers. Bind mounts suit development environments where source code on the host mounts into containers for live reloading, and production scenarios requiring access to specific host paths like Unix sockets or device files.
The bind mount mechanism operates through the container runtime's mount namespace manipulation. When starting a container with a bind mount, the runtime creates a mount point in the container's filesystem namespace pointing to the host path. Changes made through either the host path or container path appear immediately in both views—they reference identical underlying storage.
Bind mounts introduce security considerations since they expose host filesystem paths to containers. A compromised container with write access to a bind-mounted host directory can modify host system files. This risk drives recommendations to use bind mounts sparingly in production, preferring volumes with controlled scope.
Temporary Filesystem Pattern: Tmpfs mounts create in-memory filesystems within containers, storing data in RAM rather than persistent storage. This approach provides maximum performance for temporary data, sensitive information that shouldn't persist, or applications requiring fast random access to working data. Tmpfs storage disappears when the container stops, making it unsuitable for persistent data but ideal for caches, temporary files, and runtime state.
Container runtimes implement tmpfs mounts using the host kernel's tmpfs filesystem type. The mount consumes host memory, so tmpfs sizing must account for total container memory limits. Large tmpfs mounts can cause memory pressure, triggering container eviction in orchestrated environments.
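From the application's perspective, a tmpfs mount is just a directory. A minimal Ruby sketch, assuming SCRATCH_DIR is an environment variable pointing at a tmpfs mount declared in the container spec (the name is an assumption, not a convention):

```ruby
require "tempfile"

# SCRATCH_DIR is assumed to be a tmpfs mount declared in the container
# spec (e.g. docker run --tmpfs /scratch); falls back to the system
# temp directory outside containers.
SCRATCH_DIR = ENV.fetch("SCRATCH_DIR", Dir.tmpdir)

def with_scratch_file(basename)
  # Tempfile.create removes the file when the block exits, which keeps
  # the size-limited in-memory filesystem from filling up.
  Tempfile.create(basename, SCRATCH_DIR) { |f| yield f }
end
```

Cleaning up eagerly matters more on tmpfs than on disk, since leftover files consume container memory.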
Persistent Volume Claim Workflow: Kubernetes' dynamic provisioning implements an automated storage workflow. Developers create PersistentVolumeClaims specifying storage requirements—size, access mode, storage class. The PersistentVolume controller watches for unbound claims and triggers volume provisioning through the specified storage class provisioner. The provisioner contacts the storage backend (cloud API, storage array, distributed system), creates the volume, and generates a PersistentVolume resource. The controller binds the claim to the volume, making it available for pod mounting.
This workflow abstracts storage complexity from application developers. A claim for "100GB of SSD storage with ReadWriteOnce access" triggers appropriate provisioning without developer knowledge of storage backend details, API credentials, or provisioning procedures.
Ruby Implementation
Ruby applications interact with container storage through standard file I/O operations—the storage mount mechanism remains transparent to application code. However, Ruby applications must handle storage-related concerns: file permissions, concurrent access, error handling for storage failures, and configuration of storage paths.
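Because a missing or read-only mount only fails at first write, a boot-time probe fails fast instead. A minimal sketch, assuming the STORAGE_PATH environment variable convention used in this section:

```ruby
require "fileutils"
require "securerandom"

# Verify the mounted storage path exists and is writable before the app
# starts accepting requests; raises Errno::EACCES or Errno::EROFS on a
# misconfigured mount instead of failing later on a user request.
def verify_storage_mount!(path = ENV.fetch("STORAGE_PATH", "/app/storage"))
  FileUtils.mkdir_p(path)
  probe = File.join(path, ".writecheck-#{SecureRandom.hex(4)}")
  File.write(probe, "ok")
  File.delete(probe)
  path
end
```

Calling this from an initializer turns a silent misconfiguration into an immediate, readable crash at deploy time.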
Rails applications commonly use Active Storage for file uploads. Configuring Active Storage in containerized environments requires selecting appropriate storage services and configuring volume mounts for disk-based storage:
# config/storage.yml
local:
  service: Disk
  root: <%= ENV['STORAGE_PATH'] || Rails.root.join('storage') %>

production:
  service: Disk
  root: /app/storage # Mounted volume path
The container configuration mounts a persistent volume at /app/storage, ensuring uploaded files persist across container restarts:
# docker-compose.yml
services:
  web:
    image: myapp:latest
    volumes:
      - upload-storage:/app/storage
    environment:
      STORAGE_PATH: /app/storage

volumes:
  upload-storage:
    driver: local
Ruby applications writing logs to container storage should configure log paths to mounted volumes. The standard Logger class writes to specified file paths:
# config/environments/production.rb
config.logger = Logger.new('/app/logs/production.log')
Corresponding volume mount ensures log persistence:
    volumes:
      - log-storage:/app/logs
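To keep that mounted log volume from growing without bound, the stdlib Logger also supports built-in rotation. A sketch (the path and limits are illustrative, not prescribed values):

```ruby
require "logger"

log_path = ENV.fetch("LOG_PATH", "./production.log")

# Keep at most 5 rotated files of 10 MB each, bounding total log
# storage on the volume to roughly 50 MB.
logger = Logger.new(log_path, 5, 10 * 1024 * 1024)
logger.info("application booted")
```

Rotation inside the application complements, rather than replaces, external log aggregation.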
Background job processing with Sidekiq requires persistent Redis data when running Redis in containers. The Redis container mounts a volume for data directory persistence:
# Docker Compose configuration for Rails + Redis
services:
  redis:
    image: redis:7
    volumes:
      - redis-data:/data
    command: redis-server --appendonly yes

  sidekiq:
    image: myapp:latest
    command: bundle exec sidekiq
    depends_on:
      - redis
    environment:
      REDIS_URL: redis://redis:6379/0

volumes:
  redis-data:
    driver: local
Handling storage errors in Ruby requires catching I/O exceptions and implementing appropriate fallback behavior:
require 'logger'

class StorageError < StandardError; end

class DocumentStorage
  def save_document(content, filename)
    path = File.join(storage_path, filename)
    File.write(path, content)
  rescue Errno::ENOSPC
    # Volume full - implement cleanup or alerting
    logger.error "Storage volume full: #{storage_path}"
    raise StorageError, "Insufficient storage space"
  rescue Errno::EACCES
    # Permission denied - check volume mount permissions
    logger.error "Permission denied writing to #{path}"
    raise StorageError, "Storage permission error"
  rescue SystemCallError => e
    # Other filesystem errors
    logger.error "Storage error: #{e.message}"
    raise StorageError, "Storage operation failed"
  end

  private

  def storage_path
    ENV.fetch('STORAGE_PATH', '/app/storage')
  end

  def logger
    @logger ||= Logger.new($stdout)
  end
end
Ruby applications using SQLite databases in containers must mount database file locations as volumes. SQLite's single-file database aligns well with container storage patterns:
# config/database.yml
production:
  adapter: sqlite3
  database: /app/data/production.sqlite3
  pool: 5
  timeout: 5000
Container configuration with database volume:
services:
  app:
    volumes:
      - sqlite-data:/app/data

volumes:
  sqlite-data:
    driver: local
Tools & Ecosystem
Docker Volume Drivers: Docker supports pluggable volume drivers for various storage backends. The local driver stores volumes on the Docker host filesystem. Cloud provider drivers integrate with AWS EBS, Google Persistent Disk, Azure Disk Storage. Distributed storage drivers connect to systems like GlusterFS, Ceph, or Portworx.
Installing and using a volume driver:
# Install Docker volume plugin
docker plugin install vieux/sshfs

# Create volume using plugin
docker volume create --driver vieux/sshfs \
  -o sshcmd=user@server:/remote/path \
  -o password=secret \
  ssh-volume

# Use in container
docker run -v ssh-volume:/data myapp:latest
Kubernetes Storage Ecosystem: Kubernetes integrates storage systems through CSI (Container Storage Interface) drivers. CSI provides a standard interface for storage system integration, replacing earlier in-tree volume plugins. Major storage vendors and projects provide CSI drivers: AWS EBS CSI, Google Cloud Persistent Disk CSI, Azure Disk CSI, Ceph CSI, NFS CSI.
Storage class definition using CSI driver:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iopsPerGB: "100"
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
Local Path Provisioner: The Rancher local-path-provisioner dynamically provisions local storage on Kubernetes nodes. This tool suits development environments and stateful applications requiring node-local storage with dynamic provisioning:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-path-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 10Gi
NFS Server Provisioner: The NFS server provisioner creates NFS exports for persistent volumes, enabling ReadWriteMany access patterns without external NFS infrastructure:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs
provisioner: cluster.local/nfs-server-provisioner
parameters:
  archiveOnDelete: "false"
Rook Storage Orchestrator: Rook deploys and manages distributed storage systems (Ceph, NFS, Cassandra) on Kubernetes. Rook automates storage cluster deployment, monitoring, and scaling:
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
Velero Backup Tool: Velero backs up Kubernetes resources and persistent volumes. The tool integrates with cloud provider snapshot APIs and storage plugins for volume backup:
# Install Velero
velero install --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.5.0 \
  --bucket velero-backups \
  --secret-file ./credentials-velero

# Backup namespace with volumes
velero backup create rails-app-backup \
  --include-namespaces rails-production \
  --snapshot-volumes
Storage Performance Tools: Tools like fio benchmark storage performance in container environments. Running fio in containers tests actual workload I/O patterns:
docker run --rm -v data-volume:/data \
  nixery.dev/shell/fio \
  fio --name=randwrite --ioengine=libaio --iodepth=16 \
      --rw=randwrite --bs=4k --size=1G --numjobs=4 \
      --directory=/data --runtime=60
Real-World Applications
Stateful Ruby Application Deployment: Deploying stateful Rails applications on Kubernetes requires persistent storage for uploaded files, database data, and application caches. A production Rails deployment separates storage concerns by resource type:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rails-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rails-app
  template:
    metadata:
      labels:
        app: rails-app
    spec:
      containers:
        - name: rails
          image: mycompany/rails-app:v1.2.3
          volumeMounts:
            - name: uploads
              mountPath: /app/storage
            - name: logs
              mountPath: /app/log
      volumes:
        - name: uploads
          persistentVolumeClaim:
            claimName: rails-uploads-pvc
        - name: logs
          emptyDir: {}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rails-uploads-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs
  resources:
    requests:
      storage: 100Gi
The uploads volume uses ReadWriteMany with NFS storage class since multiple Rails pods handle requests simultaneously. Log volumes use emptyDir (temporary storage) since log aggregation systems collect logs from containers.
Database Container Patterns: Running PostgreSQL in containers for development and testing requires persistent storage for data directories. Production database deployments typically use StatefulSets with dedicated persistent volumes per pod:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:14
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
          volumeMounts:
            - name: postgres-data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: postgres-data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 50Gi
StatefulSets with volumeClaimTemplates create dedicated persistent volumes per pod replica, essential for database clustering and replication scenarios.
Shared Asset Storage: Applications serving static assets from containers often use shared storage for asset files generated during deployment. A deployment pattern mounts shared storage containing compiled assets:
apiVersion: batch/v1
kind: Job
metadata:
  name: asset-compilation
spec:
  template:
    spec:
      containers:
        - name: compile-assets
          image: myapp:latest
          command: ["bundle", "exec", "rake", "assets:precompile"]
          volumeMounts:
            - name: assets
              mountPath: /app/public/assets
      volumes:
        - name: assets
          persistentVolumeClaim:
            claimName: shared-assets
      restartPolicy: OnFailure
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  selector:
    matchLabels:
      app: web-server
  template:
    metadata:
      labels:
        app: web-server
    spec:
      containers:
        - name: nginx
          image: nginx:alpine
          volumeMounts:
            - name: assets
              mountPath: /usr/share/nginx/html/assets
              readOnly: true
      volumes:
        - name: assets
          persistentVolumeClaim:
            claimName: shared-assets
The compilation job writes assets to shared storage, while web servers mount the same volume read-only for serving.
Log Aggregation Architecture: Container log storage patterns balance retention requirements with storage costs. A common pattern uses emptyDir volumes for container logs with sidecar containers streaming logs to aggregation systems:
apiVersion: v1
kind: Pod
metadata:
  name: app-with-logging
spec:
  containers:
    - name: app
      image: myapp:latest
      volumeMounts:
        - name: logs
          mountPath: /app/log
    - name: log-shipper
      image: fluent/fluent-bit:latest
      volumeMounts:
        - name: logs
          mountPath: /app/log
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc
  volumes:
    - name: logs
      emptyDir: {}
    - name: fluent-bit-config
      configMap:
        name: fluent-bit-config
This pattern avoids persistent log storage while ensuring log delivery to centralized systems.
Common Pitfalls
Volume Mount Path Conflicts: Mounting a volume at a path containing existing image data overwrites the image content with volume data. If an image contains application code at /app and a volume mounts at /app, the volume content replaces the image's /app directory. This frequently occurs when developers mount source code volumes for development without considering image layer content.
The solution separates volume mount paths from image content paths or uses named volumes initialized from image content. Docker supports volume initialization from image data, but Kubernetes requires init containers to populate volumes from image layers.
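A hedged sketch of that init-container pattern (the image, paths, and names are illustrative assumptions, not a prescribed layout):

```yaml
# Illustrative pod fragment: an init container copies content baked into
# the image onto an empty volume, so the main container sees both the
# volume and the expected files at the mount path.
spec:
  initContainers:
    - name: seed-data
      image: myapp:latest
      command: ["sh", "-c", "cp -a /app/seed/. /data/"]
      volumeMounts:
        - name: data
          mountPath: /data
  containers:
    - name: app
      image: myapp:latest
      volumeMounts:
        - name: data
          mountPath: /app/seed
  volumes:
    - name: data
      emptyDir: {}
```

The init container mounts the volume at a neutral path, copies the image content in, and only then does the main container mount the populated volume over the original location.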
Permission Mismatches: Container processes running as non-root users encounter permission errors when accessing volumes owned by different UIDs. A Rails application running as UID 1000 cannot write to a volume directory owned by root (UID 0). This occurs frequently with local storage where volume directories default to root ownership.
Fixing permissions requires setting security contexts specifying fsGroup for volume ownership or running init containers that correct permissions:
spec:
  securityContext:
    fsGroup: 1000
  containers:
    - name: app
      securityContext:
        runAsUser: 1000
      volumeMounts:
        - name: data
          mountPath: /app/data
Storage Class Defaults: Kubernetes clusters with multiple storage classes require explicit storageClassName in PersistentVolumeClaims. Without explicit specification, the cluster default storage class provisions volumes, which may not match application requirements. A claim requesting high-performance SSD storage might receive slow HDD storage if cluster defaults favor cost over performance.
Volume Cleanup Failures: docker volume prune removes every volume not currently referenced by a container. Removing a stopped container (docker rm) releases its volumes, so pruning afterward deletes data that was meant to survive a restart. Named volumes persist until explicitly deleted, but anonymous volumes created without names are easy casualties of cleanup.
Production environments should exclusively use named volumes with explicit cleanup policies rather than relying on automatic pruning.
ReadWriteMany Assumptions: Assuming all storage systems support ReadWriteMany access mode causes pod scheduling failures. Block storage (AWS EBS, Google Persistent Disk, Azure Disk) only supports ReadWriteOnce—mounting to a single node. Deploying multiple pod replicas with ReadWriteOnce volumes causes pods to remain in pending state when scheduled to different nodes.
Applications requiring shared storage must use network filesystems (NFS, GlusterFS, CephFS) or object storage APIs rather than block storage volumes.
Container Data Growth: Failing to monitor volume usage causes out-of-space errors when applications exceed volume capacity. Container logs written to volumes, uploaded files, cached data, and database growth consume space without automatic cleanup.
Implementing volume monitoring, size limits, and cleanup policies prevents storage exhaustion:
# Regular cleanup of old uploads
class StorageCleanupJob < ApplicationJob
  def perform
    storage_path = ENV['STORAGE_PATH']
    cutoff_time = 90.days.ago

    Dir.glob(File.join(storage_path, '**', '*')).each do |path|
      next unless File.file?(path)
      next if File.mtime(path) > cutoff_time

      File.delete(path)
      logger.info "Deleted old file: #{path}"
    end
  end
end
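Monitoring can be as simple as parsing df output for the mount. A hedged sketch (it assumes the Use%/Capacity column is field five, which holds for both GNU and BSD df -k; the threshold is illustrative):

```ruby
# Return the usage percentage of the filesystem backing `path` by
# shelling out to `df -k`; the percent column is field 5 on both GNU
# and BSD df output.
def volume_usage_percent(path)
  line = `df -k #{path}`.lines.last
  line.split[4].delete("%").to_i
end

# Example policy: alert well before writes start failing with ENOSPC.
def storage_pressure?(path, threshold = 85)
  volume_usage_percent(path) >= threshold
end
```

Feeding this into a periodic job or health endpoint gives early warning before the Errno::ENOSPC handling above ever triggers.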
StatefulSet Volume Ordering: StatefulSets create persistent volume claims in ordinal order via volumeClaimTemplates, but deleting a StatefulSet doesn't automatically delete those claims. Recreating a StatefulSet with existing PVCs binds to volumes containing stale data from previous deployments.
Manual PVC deletion or using volume reclaim policies prevents data leakage between deployments:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-001
spec:
  persistentVolumeReclaimPolicy: Delete # Auto-delete when claim deleted
Reference
Volume Types
| Type | Lifecycle | Persistence | Sharing | Use Case |
|---|---|---|---|---|
| Docker Volume | Managed by daemon | Persistent | Multi-container | Application data storage |
| Bind Mount | Host filesystem | Host-dependent | Multi-container | Development, host file access |
| tmpfs Mount | Container runtime | Ephemeral | Single-container | Sensitive data, caches |
| emptyDir | Pod lifecycle | Pod-scoped | Pod containers | Temporary storage, container communication |
| PersistentVolume | Cluster resource | Persistent | Defined by access mode | Kubernetes stateful applications |
Access Modes
| Mode | Abbreviation | Description | Supported Storage |
|---|---|---|---|
| ReadWriteOnce | RWO | Single node read-write | Block storage, local storage |
| ReadOnlyMany | ROX | Multiple node read-only | Network filesystems, object storage |
| ReadWriteMany | RWX | Multiple node read-write | Network filesystems, distributed storage |
| ReadWriteOncePod | RWOP | Single pod read-write | Block storage with pod isolation |
Docker Volume Commands
| Command | Purpose | Example |
|---|---|---|
| docker volume create | Create named volume | docker volume create mydata |
| docker volume ls | List volumes | docker volume ls --filter dangling=true |
| docker volume inspect | View volume details | docker volume inspect mydata |
| docker volume rm | Delete volume | docker volume rm mydata |
| docker volume prune | Remove unused volumes | docker volume prune --force |
Kubernetes Storage Resources
| Resource | Scope | Purpose | Lifecycle |
|---|---|---|---|
| PersistentVolume | Cluster | Storage capacity definition | Independent |
| PersistentVolumeClaim | Namespace | Storage request | Namespace-bound |
| StorageClass | Cluster | Dynamic provisioning parameters | Independent |
| VolumeSnapshot | Namespace | Point-in-time volume copy | Namespace-bound |
Common Volume Drivers
| Driver | Storage Type | Features | Access Modes |
|---|---|---|---|
| local | Host filesystem | High performance, node-local | RWO |
| nfs | Network filesystem | Multi-node access | RWO, ROX, RWX |
| csi-rbd | Ceph block device | Distributed, replicated | RWO |
| csi-cephfs | Ceph filesystem | Distributed, multi-access | RWO, ROX, RWX |
| aws-ebs | Amazon EBS | Cloud-managed block | RWO |
| gce-pd | Google Persistent Disk | Cloud-managed block | RWO |
Volume Mount Options
| Option | Effect | Example |
|---|---|---|
| readOnly | Mount as read-only | readOnly: true |
| subPath | Mount subdirectory | subPath: uploads/images |
| mountPropagation | Control mount visibility | mountPropagation: Bidirectional |
Storage Performance Comparison
| Storage Type | Latency | Throughput | IOPS | Durability |
|---|---|---|---|---|
| Local SSD | Lowest | Highest | Highest | Node-dependent |
| Network Block | Low | High | High | Replicated |
| Network File | Medium | Medium | Medium | Replicated |
| Object Storage | Highest | Variable | Lowest | Highest |
Reclaim Policies
| Policy | Behavior | Use Case |
|---|---|---|
| Retain | Manual cleanup required | Production data protection |
| Delete | Automatic deletion | Development, ephemeral data |
| Recycle | Deprecated | Legacy systems only |
Security Context Fields
| Field | Purpose | Example |
|---|---|---|
| fsGroup | Volume ownership GID | fsGroup: 1000 |
| runAsUser | Container process UID | runAsUser: 1000 |
| runAsGroup | Container process GID | runAsGroup: 1000 |
| fsGroupChangePolicy | Permission change behavior | fsGroupChangePolicy: OnRootMismatch |