CrackedRuby

Overview

Container storage addresses the fundamental challenge of data persistence in ephemeral container environments. Containers operate as isolated, stateless units by default—when a container is removed, all data written to its writable layer disappears. This design aligns with microservices principles but creates problems for applications requiring data persistence: databases, file uploads, logs, application state, and cached data.

Container storage solutions provide mechanisms to preserve data beyond container lifecycles. Three primary approaches exist: volumes managed by the container runtime, bind mounts that map host filesystem paths into containers, and temporary filesystems stored in memory. Each serves distinct use cases with different performance characteristics, security implications, and operational complexity.

The container storage landscape evolved from Docker's initial volume implementation to Kubernetes' sophisticated persistent volume subsystem. Modern container platforms abstract storage provisioning through plugins and drivers, enabling integration with network-attached storage, cloud provider storage services, and distributed filesystems. This abstraction layer separates storage lifecycle from container lifecycle, allowing data to persist across container restarts, redeployments, and cluster migrations.

Ruby applications running in containers face the same storage challenges as applications in other languages: session data persistence, file upload storage, database file management, and log retention. The stateless nature of Ruby web applications (Rails, Sinatra, Hanami) aligns well with container design, but stateful components—Active Storage uploads, the Redis state backing Action Cable and background job queues—require explicit storage strategies.

Key Principles

Container Filesystem Layers: Containers use layered filesystems where image layers stack as read-only, with a thin writable layer added at runtime. Writing to this writable layer incurs copy-on-write overhead, and the data disappears when the container is removed. Container storage mechanisms bypass this layered filesystem for performance and persistence.

Storage Lifecycle Independence: Container storage separates data lifecycle from container lifecycle. A volume or persistent volume exists independently—created before containers use it, persisting after containers terminate. This independence enables data sharing between containers, rolling updates without data loss, and backup/restore operations.

Mount Namespaces and Isolation: Containers use mount namespaces to create isolated filesystem views. Storage mounts appear as regular filesystem paths inside containers but map to external storage locations. This abstraction hides the underlying storage implementation from containerized applications—a container accessing /data might read from network storage, local SSD, or cloud object storage.
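This abstraction is visible from inside a container by reading the kernel's mount table. The helper below is an illustrative sketch (not a standard API) that parses the /proc/self/mounts format and finds the mount entry backing a given path—revealing whether /data is overlay (the image's writable layer), tmpfs, or an external filesystem:

```ruby
# Illustrative sketch: determine which mount backs a path by parsing
# the /proc/self/mounts format (device mountpoint fstype options ...).
def mount_entry_for(path, mounts_text = File.read('/proc/self/mounts'))
  entries = mounts_text.each_line.map do |line|
    device, mountpoint, fstype, options, = line.split(' ')
    { device: device, mountpoint: mountpoint, fstype: fstype, options: options }
  end
  # The most specific (longest) matching mountpoint wins.
  entries.select { |e|
    prefix = e[:mountpoint].chomp('/') + '/'
    path == e[:mountpoint] || path.start_with?(prefix)
  }.max_by { |e| e[:mountpoint].length }
end
```

Inside a container, `mount_entry_for('/data')` might report `nfs4` even though the application simply sees an ordinary directory.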

Volume Drivers and Plugins: Container platforms use driver architecture for storage provisioning. Volume drivers handle storage backend integration—local filesystem, NFS, cloud provider APIs, distributed storage systems. This plugin model enables storage system integration without modifying the container runtime.

Access Modes and Concurrency: Storage systems define access modes controlling concurrent access patterns. ReadWriteOnce limits mounting to a single node, ReadOnlyMany allows multiple readers, ReadWriteMany permits concurrent read/write access. These modes reflect underlying storage system capabilities—block storage typically supports single-writer scenarios while network filesystems handle concurrent access.

Storage Classes and Dynamic Provisioning: Modern container orchestrators introduce storage classes that define storage provisioning parameters. A storage class specifies the volume driver, performance characteristics (IOPS, throughput), replication settings, and backup policies. Dynamic provisioning creates volumes on-demand based on storage class definitions, eliminating manual volume creation.

Data Locality and Performance: Storage location relative to compute resources affects performance significantly. Local volumes provide lowest latency but lack portability. Network-attached storage introduces latency but enables data sharing and migration. The locality/portability tradeoff shapes storage architecture decisions.

Ownership and Permissions: Container storage must handle user ID mapping between containers and host systems. Containers often run as non-root users for security, but volume data may be owned by different UIDs. Storage drivers and security contexts manage permission mapping to prevent access violations.
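From the application side, a startup check can surface these mismatches before the first write fails. This is an illustrative sketch, not a standard API:

```ruby
# Illustrative startup check: report UID/GID mismatches between the
# container process and a mounted volume before the first write fails.
def check_volume_access(path)
  stat = File.stat(path)
  {
    path: path,
    owner_uid: stat.uid,          # who owns the volume directory
    owner_gid: stat.gid,
    process_uid: Process.euid,    # who the container process runs as
    writable: File.writable?(path)
  }
end
```

A boot script might call `check_volume_access('/app/storage')` and abort with a clear message when `:writable` is false, rather than failing later on an upload.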

Implementation Approaches

Volume-Based Storage: Volumes represent the primary storage abstraction in container environments. The container runtime manages volume lifecycle, creation, and cleanup. Volumes exist as directories on the host system (for local volumes) or as connections to external storage systems (for plugin-managed volumes). Creating a volume establishes a named storage resource that containers reference by name rather than path.

Docker volumes operate through the volume API, with the Docker daemon managing volume storage in /var/lib/docker/volumes by default. When a container mounts a volume, Docker ensures the volume exists, creates it if necessary, and mounts it at the specified container path. Multiple containers can mount the same volume simultaneously if the underlying storage supports concurrent access.

Kubernetes persistent volumes implement a more sophisticated model with separation between cluster-wide storage resources (PersistentVolumes) and namespace-scoped storage requests (PersistentVolumeClaims). Administrators define PersistentVolumes representing available storage capacity. Developers create PersistentVolumeClaims requesting storage with specific characteristics. The Kubernetes control plane binds claims to volumes matching the requirements.
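A minimal statically provisioned pair might look like the following (capacity, paths, and names are illustrative); the control plane binds the claim to a volume matching its requested size and access mode:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo                  # illustrative name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data              # node-local path, dev/test only
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```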

Bind Mount Strategy: Bind mounts map host filesystem paths directly into containers. This approach provides direct access to host data without abstraction layers. Bind mounts suit development environments where source code on the host mounts into containers for live reloading, and production scenarios requiring access to specific host paths like Unix sockets or device files.

The bind mount mechanism operates through the container runtime's mount namespace manipulation. When starting a container with a bind mount, the runtime creates a mount point in the container's filesystem namespace pointing to the host path. Changes made through either the host path or container path appear immediately in both views—they reference identical underlying storage.
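In Docker Compose syntax, a bind mount maps a host path to a container path directly (paths here are illustrative); appending :ro restricts the container to read-only access:

```yaml
services:
  web:
    image: myapp:latest
    volumes:
      - ./src:/app/src                                # host path : container path (bind mount)
      - ./config/nginx.conf:/etc/nginx/nginx.conf:ro  # read-only bind mount
```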

Bind mounts introduce security considerations since they expose host filesystem paths to containers. A compromised container with write access to a bind-mounted host directory can modify host system files. This risk drives recommendations to use bind mounts sparingly in production, preferring volumes with controlled scope.

Temporary Filesystem Pattern: Tmpfs mounts create in-memory filesystems within containers, storing data in RAM rather than persistent storage. This approach provides maximum performance for temporary data, sensitive information that shouldn't persist, or applications requiring fast random access to working data. Tmpfs storage disappears when the container stops, making it unsuitable for persistent data but ideal for caches, temporary files, and runtime state.

Container runtimes implement tmpfs mounts using the host kernel's tmpfs filesystem type. The mount consumes host memory, so tmpfs sizing must account for total container memory limits. Large tmpfs mounts can cause memory pressure, triggering container eviction in orchestrated environments.
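Kubernetes expresses the same idea with a memory-backed emptyDir; the sizeLimit counts against the pod's memory budget (names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tmpfs-demo                # illustrative name
spec:
  containers:
  - name: app
    image: myapp:latest
    volumeMounts:
    - name: scratch
      mountPath: /tmp/scratch
  volumes:
  - name: scratch
    emptyDir:
      medium: Memory              # tmpfs backed by node RAM
      sizeLimit: 256Mi            # counts toward container memory usage
```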

Persistent Volume Claim Workflow: Kubernetes' dynamic provisioning implements an automated storage workflow. Developers create PersistentVolumeClaims specifying storage requirements—size, access mode, storage class. The PersistentVolume controller watches for unbound claims and triggers volume provisioning through the specified storage class provisioner. The provisioner contacts the storage backend (cloud API, storage array, distributed system), creates the volume, and generates a PersistentVolume resource. The controller binds the claim to the volume, making it available for pod mounting.

This workflow abstracts storage complexity from application developers. A claim for "100GB of SSD storage with ReadWriteOnce access" triggers appropriate provisioning without developer knowledge of storage backend details, API credentials, or provisioning procedures.
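That sentence maps almost one-to-one onto a claim manifest (the fast-ssd class name is illustrative and assumed to exist in the cluster):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                  # illustrative name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd      # assumed storage class
  resources:
    requests:
      storage: 100Gi
```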

Ruby Implementation

Ruby applications interact with container storage through standard file I/O operations—the storage mount mechanism remains transparent to application code. However, Ruby applications must handle storage-related concerns: file permissions, concurrent access, error handling for storage failures, and configuration of storage paths.
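One pattern worth sketching is a boot-time check that the configured mount exists and is writable, failing fast with a clear message instead of erroring mid-request (STORAGE_PATH and the helper name are illustrative):

```ruby
require 'fileutils'

# Illustrative boot-time check: verify the storage mount exists and is
# writable before the app starts accepting requests.
def ensure_storage!(path = ENV.fetch('STORAGE_PATH', '/app/storage'))
  FileUtils.mkdir_p(path)              # no-op if the mount already exists
  probe = File.join(path, ".writable-#{Process.pid}")
  File.write(probe, 'ok')              # raises Errno::EACCES / EROFS if not writable
  File.delete(probe)
  path
rescue SystemCallError => e
  abort "Storage mount #{path} is not usable: #{e.message}"
end
```

Calling `ensure_storage!` from an initializer turns a misconfigured volume mount into an immediate, readable crash at deploy time.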

Rails applications commonly use Active Storage for file uploads. Configuring Active Storage in containerized environments requires selecting appropriate storage services and configuring volume mounts for disk-based storage:

# config/storage.yml
local:
  service: Disk
  root: <%= ENV['STORAGE_PATH'] || Rails.root.join('storage') %>

production:
  service: Disk
  root: /app/storage  # Mounted volume path

The container configuration mounts a persistent volume at /app/storage, ensuring uploaded files persist across container restarts:

# docker-compose.yml
services:
  web:
    image: myapp:latest
    volumes:
      - upload-storage:/app/storage
    environment:
      STORAGE_PATH: /app/storage

volumes:
  upload-storage:
    driver: local

Ruby applications writing logs to container storage should configure log paths to mounted volumes. The standard Logger class writes to specified file paths:

# config/environments/production.rb
config.logger = Logger.new('/app/logs/production.log')

Corresponding volume mount ensures log persistence:

volumes:
  - log-storage:/app/logs
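An alternative common in container deployments is logging to STDOUT and letting the runtime (docker logs, kubectl logs) or a sidecar collect output, avoiding log volumes entirely; a minimal sketch:

```ruby
require 'logger'
require 'time'

# Sketch: log to STDOUT so the container runtime captures output;
# no volume mount is needed for log persistence.
logger = Logger.new($stdout)
logger.formatter = proc do |severity, time, _progname, msg|
  "#{time.utc.iso8601} #{severity} #{msg}\n"
end
logger.info "request served"
```

This is the approach the log aggregation pattern later in this document builds on.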

Background job processing with Sidekiq requires persistent Redis data when running Redis in containers. The Redis container mounts a volume for data directory persistence:

# Docker Compose configuration for Rails + Redis
services:
  redis:
    image: redis:7
    volumes:
      - redis-data:/data
    command: redis-server --appendonly yes

  sidekiq:
    image: myapp:latest
    command: bundle exec sidekiq
    depends_on:
      - redis
    environment:
      REDIS_URL: redis://redis:6379/0

volumes:
  redis-data:
    driver: local

Handling storage errors in Ruby requires catching I/O exceptions and implementing appropriate fallback behavior:

require 'logger'

class DocumentStorage
  StorageError = Class.new(StandardError)

  def save_document(content, filename)
    path = File.join(storage_path, filename)
    File.write(path, content)
  rescue Errno::ENOSPC
    # Volume full - implement cleanup or alerting
    logger.error "Storage volume full: #{storage_path}"
    raise StorageError, "Insufficient storage space"
  rescue Errno::EACCES
    # Permission denied - check volume mount permissions
    logger.error "Permission denied writing to #{path}"
    raise StorageError, "Storage permission error"
  rescue SystemCallError => e
    # Other filesystem errors (read-only mount, stale NFS handle, etc.)
    logger.error "Storage error: #{e.message}"
    raise StorageError, "Storage operation failed"
  end

  private

  def logger
    @logger ||= Logger.new($stdout)
  end

  def storage_path
    ENV.fetch('STORAGE_PATH', '/app/storage')
  end
end

Ruby applications using SQLite databases in containers must mount database file locations as volumes. SQLite's single-file database aligns well with container storage patterns:

# config/database.yml
production:
  adapter: sqlite3
  database: /app/data/production.sqlite3
  pool: 5
  timeout: 5000

Container configuration with database volume:

services:
  app:
    volumes:
      - sqlite-data:/app/data

volumes:
  sqlite-data:
    driver: local

Tools & Ecosystem

Docker Volume Drivers: Docker supports pluggable volume drivers for various storage backends. The local driver stores volumes on the Docker host filesystem. Cloud provider drivers integrate with AWS EBS, Google Persistent Disk, Azure Disk Storage. Distributed storage drivers connect to systems like GlusterFS, Ceph, or Portworx.

Installing and using a volume driver:

# Install Docker volume plugin
docker plugin install vieux/sshfs

# Create volume using plugin
docker volume create --driver vieux/sshfs \
  -o sshcmd=user@server:/remote/path \
  -o password=secret \
  ssh-volume

# Use in container
docker run -v ssh-volume:/data myapp:latest

Kubernetes Storage Ecosystem: Kubernetes integrates storage systems through CSI (Container Storage Interface) drivers. CSI provides a standard interface for storage system integration, replacing earlier in-tree volume plugins. Major storage vendors and projects provide CSI drivers: AWS EBS CSI, Google Cloud Persistent Disk CSI, Azure Disk CSI, Ceph CSI, NFS CSI.

Storage class definition using CSI driver:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iopsPerGB: "100"
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer

Local Path Provisioner: The Rancher local-path-provisioner dynamically provisions local storage on Kubernetes nodes. This tool suits development environments and stateful applications requiring node-local storage with dynamic provisioning:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-path-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 10Gi

NFS Server Provisioner: The NFS server provisioner creates NFS exports for persistent volumes, enabling ReadWriteMany access patterns without external NFS infrastructure:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs
provisioner: cluster.local/nfs-server-provisioner
parameters:
  archiveOnDelete: "false"

Rook Storage Orchestrator: Rook deploys and manages distributed storage systems (Ceph, NFS, Cassandra) on Kubernetes. Rook automates storage cluster deployment, monitoring, and scaling:

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3

Velero Backup Tool: Velero backs up Kubernetes resources and persistent volumes. The tool integrates with cloud provider snapshot APIs and storage plugins for volume backup:

# Install Velero
velero install --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.5.0 \
  --bucket velero-backups \
  --secret-file ./credentials-velero

# Backup namespace with volumes
velero backup create rails-app-backup \
  --include-namespaces rails-production \
  --snapshot-volumes

Storage Performance Tools: Tools like fio benchmark storage performance in container environments. Running fio in containers tests actual workload I/O patterns:

docker run --rm -v data-volume:/data \
  nixery.dev/shell/fio \
  fio --name=randwrite --ioengine=libaio --iodepth=16 \
  --rw=randwrite --bs=4k --size=1G --numjobs=4 \
  --directory=/data --runtime=60

Real-World Applications

Stateful Ruby Application Deployment: Deploying stateful Rails applications on Kubernetes requires persistent storage for uploaded files, database data, and application caches. A production Rails deployment separates storage concerns by resource type:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rails-app
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: rails
        image: mycompany/rails-app:v1.2.3
        volumeMounts:
        - name: uploads
          mountPath: /app/storage
        - name: logs
          mountPath: /app/log
      volumes:
      - name: uploads
        persistentVolumeClaim:
          claimName: rails-uploads-pvc
      - name: logs
        emptyDir: {}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rails-uploads-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs
  resources:
    requests:
      storage: 100Gi

The uploads volume uses ReadWriteMany with NFS storage class since multiple Rails pods handle requests simultaneously. Log volumes use emptyDir (temporary storage) since log aggregation systems collect logs from containers.

Database Container Patterns: Running PostgreSQL in containers for development and testing requires persistent storage for data directories. Production database deployments typically use StatefulSets with dedicated persistent volumes per pod:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  template:
    spec:
      containers:
      - name: postgres
        image: postgres:14
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        volumeMounts:
        - name: postgres-data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: postgres-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 50Gi

StatefulSets with volumeClaimTemplates create dedicated persistent volumes per pod replica, essential for database clustering and replication scenarios.

Shared Asset Storage: Applications serving static assets from containers often use shared storage for asset files generated during deployment. A deployment pattern mounts shared storage containing compiled assets:

apiVersion: batch/v1
kind: Job
metadata:
  name: asset-compilation
spec:
  template:
    spec:
      containers:
      - name: compile-assets
        image: myapp:latest
        command: ["bundle", "exec", "rake", "assets:precompile"]
        volumeMounts:
        - name: assets
          mountPath: /app/public/assets
      volumes:
      - name: assets
        persistentVolumeClaim:
          claimName: shared-assets
      restartPolicy: OnFailure
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  template:
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        volumeMounts:
        - name: assets
          mountPath: /usr/share/nginx/html/assets
          readOnly: true
      volumes:
      - name: assets
        persistentVolumeClaim:
          claimName: shared-assets

The compilation job writes assets to shared storage, while web servers mount the same volume read-only for serving.

Log Aggregation Architecture: Container log storage patterns balance retention requirements with storage costs. A common pattern uses emptyDir volumes for container logs with sidecar containers streaming logs to aggregation systems:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-logging
spec:
  containers:
  - name: app
    image: myapp:latest
    volumeMounts:
    - name: logs
      mountPath: /app/log
  - name: log-shipper
    image: fluent/fluent-bit:latest
    volumeMounts:
    - name: logs
      mountPath: /app/log
      readOnly: true
    - name: fluent-bit-config
      mountPath: /fluent-bit/etc
  volumes:
  - name: logs
    emptyDir: {}
  - name: fluent-bit-config
    configMap:
      name: fluent-bit-config

This pattern avoids persistent log storage while ensuring log delivery to centralized systems.

Common Pitfalls

Volume Mount Path Conflicts: Mounting a volume at a path containing existing image data overwrites the image content with volume data. If an image contains application code at /app and a volume mounts at /app, the volume content replaces the image's /app directory. This frequently occurs when developers mount source code volumes for development without considering image layer content.

The solution separates volume mount paths from image content paths or uses named volumes initialized from image content. Docker supports volume initialization from image data, but Kubernetes requires init containers to populate volumes from image layers.
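In Kubernetes, the init-container workaround copies image content into the volume before the main container starts (paths and names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seeded-volume-demo        # illustrative name
spec:
  initContainers:
  - name: seed
    image: myapp:latest
    # Copy the image's bundled content into the (initially empty) volume.
    command: ["sh", "-c", "cp -a /app/public/. /seed/"]
    volumeMounts:
    - name: public
      mountPath: /seed
  containers:
  - name: app
    image: myapp:latest
    volumeMounts:
    - name: public
      mountPath: /app/public      # now backed by the seeded volume
  volumes:
  - name: public
    emptyDir: {}
```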

Permission Mismatches: Container processes running as non-root users encounter permission errors when accessing volumes owned by different UIDs. A Rails application running as UID 1000 cannot write to a volume directory owned by root (UID 0). This occurs frequently with local storage where volume directories default to root ownership.

Fixing permissions requires setting security contexts specifying fsGroup for volume ownership or running init containers that correct permissions:

spec:
  securityContext:
    fsGroup: 1000
  containers:
  - name: app
    securityContext:
      runAsUser: 1000
    volumeMounts:
    - name: data
      mountPath: /app/data

Storage Class Defaults: Kubernetes clusters with multiple storage classes require explicit storageClassName in PersistentVolumeClaims. Without explicit specification, the cluster default storage class provisions volumes, which may not match application requirements. A claim requesting high-performance SSD storage might receive slow HDD storage if cluster defaults favor cost over performance.

Volume Cleanup Failures: docker volume prune removes volumes that no container—running or stopped—references, which can delete data belonging to containers that were removed and are intended to be recreated. Named volumes otherwise persist until explicitly deleted (recent Docker releases prune only anonymous volumes unless --all is passed), while anonymous volumes created without names disappear during cleanup.

Production environments should exclusively use named volumes with explicit cleanup policies rather than relying on automatic pruning.

ReadWriteMany Assumptions: Assuming all storage systems support ReadWriteMany access mode causes pod scheduling failures. Block storage (AWS EBS, Google Persistent Disk, Azure Disk) only supports ReadWriteOnce—mounting to a single node. Deploying multiple pod replicas with ReadWriteOnce volumes causes pods to remain in pending state when scheduled to different nodes.

Applications requiring shared storage must use network filesystems (NFS, GlusterFS, CephFS) or object storage APIs rather than block storage volumes.

Container Data Growth: Failing to monitor volume usage causes out-of-space errors when applications exceed volume capacity. Container logs written to volumes, uploaded files, cached data, and database growth consume space without automatic cleanup.

Implementing volume monitoring, size limits, and cleanup policies prevents storage exhaustion:

# Regular cleanup of old uploads
class StorageCleanupJob < ApplicationJob
  def perform
    storage_path = ENV.fetch('STORAGE_PATH', '/app/storage')
    cutoff_time = 90.days.ago

    Dir.glob(File.join(storage_path, '**', '*')).each do |path|
      next unless File.file?(path)
      next if File.mtime(path) > cutoff_time

      File.delete(path)
      logger.info "Deleted old file: #{path}"
    end
  end
end
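Cleanup works best alongside a usage check that alerts before the volume fills; a minimal sketch (the soft-limit threshold and helper names are illustrative):

```ruby
require 'find'

# Illustrative usage check: sum file sizes under a mounted volume and
# flag when usage crosses a soft limit, before writes hit ENOSPC.
def volume_usage_bytes(root)
  total = 0
  Find.find(root) do |entry|
    total += File.size(entry) if File.file?(entry)
  end
  total
end

def over_soft_limit?(root, limit_bytes)
  volume_usage_bytes(root) > limit_bytes
end
```

A recurring job could call `over_soft_limit?('/app/storage', 80 * 1024**3)` and page an operator or trigger cleanup when it returns true.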

StatefulSet Volume Ordering: StatefulSets create persistent volumes in sequence, but deleting a StatefulSet doesn't automatically delete its persistent volume claims. Recreating a StatefulSet with existing PVCs can bind to volumes containing stale data from previous deployments.

Manual PVC deletion or using volume reclaim policies prevents data leakage between deployments:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-001
spec:
  persistentVolumeReclaimPolicy: Delete  # Auto-delete when claim deleted

Reference

Volume Types

| Type | Lifecycle | Persistence | Sharing | Use Case |
|------|-----------|-------------|---------|----------|
| Docker Volume | Managed by daemon | Persistent | Multi-container | Application data storage |
| Bind Mount | Host filesystem | Host-dependent | Multi-container | Development, host file access |
| tmpfs Mount | Container runtime | Ephemeral | Single-container | Sensitive data, caches |
| emptyDir | Pod lifecycle | Pod-scoped | Pod containers | Temporary storage, container communication |
| PersistentVolume | Cluster resource | Persistent | Defined by access mode | Kubernetes stateful applications |

Access Modes

| Mode | Abbreviation | Description | Supported Storage |
|------|--------------|-------------|-------------------|
| ReadWriteOnce | RWO | Single node read-write | Block storage, local storage |
| ReadOnlyMany | ROX | Multiple node read-only | Network filesystems, object storage |
| ReadWriteMany | RWX | Multiple node read-write | Network filesystems, distributed storage |
| ReadWriteOncePod | RWOP | Single pod read-write | Block storage with pod isolation |

Docker Volume Commands

| Command | Purpose | Example |
|---------|---------|---------|
| docker volume create | Create named volume | docker volume create mydata |
| docker volume ls | List volumes | docker volume ls --filter dangling=true |
| docker volume inspect | View volume details | docker volume inspect mydata |
| docker volume rm | Delete volume | docker volume rm mydata |
| docker volume prune | Remove unused volumes | docker volume prune --force |

Kubernetes Storage Resources

| Resource | Scope | Purpose | Lifecycle |
|----------|-------|---------|-----------|
| PersistentVolume | Cluster | Storage capacity definition | Independent |
| PersistentVolumeClaim | Namespace | Storage request | Namespace-bound |
| StorageClass | Cluster | Dynamic provisioning parameters | Independent |
| VolumeSnapshot | Namespace | Point-in-time volume copy | Namespace-bound |

Common Volume Drivers

| Driver | Storage Type | Features | Access Modes |
|--------|--------------|----------|--------------|
| local | Host filesystem | High performance, node-local | RWO |
| nfs | Network filesystem | Multi-node access | RWO, ROX, RWX |
| csi-rbd | Ceph block device | Distributed, replicated | RWO |
| csi-cephfs | Ceph filesystem | Distributed, multi-access | RWO, ROX, RWX |
| aws-ebs | Amazon EBS | Cloud-managed block | RWO |
| gce-pd | Google Persistent Disk | Cloud-managed block | RWO |

Volume Mount Options

| Option | Effect | Example |
|--------|--------|---------|
| readOnly | Mount as read-only | readOnly: true |
| subPath | Mount subdirectory | subPath: uploads/images |
| mountPropagation | Control mount visibility | mountPropagation: Bidirectional |

Storage Performance Comparison

| Storage Type | Latency | Throughput | IOPS | Durability |
|--------------|---------|------------|------|------------|
| Local SSD | Lowest | Highest | Highest | Node-dependent |
| Network Block | Low | High | High | Replicated |
| Network File | Medium | Medium | Medium | Replicated |
| Object Storage | Highest | Variable | Lowest | Highest |

Reclaim Policies

| Policy | Behavior | Use Case |
|--------|----------|----------|
| Retain | Manual cleanup required | Production data protection |
| Delete | Automatic deletion | Development, ephemeral data |
| Recycle | Deprecated | Legacy systems only |

Security Context Fields

| Field | Purpose | Example |
|-------|---------|---------|
| fsGroup | Volume ownership GID | fsGroup: 1000 |
| runAsUser | Container process UID | runAsUser: 1000 |
| runAsGroup | Container process GID | runAsGroup: 1000 |
| fsGroupChangePolicy | Permission change behavior | fsGroupChangePolicy: OnRootMismatch |