
PStore

Overview

PStore implements a transactional data store that persists Ruby objects to files using Marshal serialization. The PStore class wraps file operations in transactions that ensure data consistency and provide atomic operations. PStore takes an advisory file lock (flock) on the data file during transactions to prevent concurrent access and maintains data integrity through commit and rollback mechanisms.

The core PStore API centers around transactions that group read and write operations. Opening a transaction provides access to the data store, while committing writes changes to disk. PStore automatically handles file locking, marshalling objects, and managing the underlying file structure.

require 'pstore'

store = PStore.new('config.pstore')

# Write data in a transaction
store.transaction do |s|
  s['app_name'] = 'MyApplication' 
  s['version'] = '2.1.0'
  s['settings'] = { theme: 'dark', debug: true }
end

# Read data in another transaction
store.transaction(true) do |s|  # positional true opens a read-only transaction
  puts s['app_name']    # => "MyApplication"
  puts s['version']     # => "2.1.0" 
  puts s['settings']    # => {:theme=>"dark", :debug=>true}
end
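The rollback half of the mechanism is exposed as abort inside a transaction: calling it discards every change made in the current block. A minimal sketch (the temp-directory path is just for illustration):

```ruby
require 'pstore'
require 'tmpdir'

store = PStore.new(File.join(Dir.mktmpdir, 'demo.pstore'))

store.transaction do |s|
  s['count'] = 1
end

store.transaction do |s|
  s['count'] = 99
  s.abort            # discard this transaction; nothing after this line runs
  s['count'] = 100   # never reached
end

store.transaction(true) do |s|
  puts s['count']  # => 1
end
```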

PStore stores data as key-value pairs where keys are typically strings or symbols and values can be any object that responds to Marshal serialization. The underlying storage uses Ruby's Marshal format to serialize objects, making PStore suitable for storing complex data structures while maintaining object relationships and types.

require 'ostruct'  # OpenStruct instance stored below

store = PStore.new('data.pstore')

store.transaction do |s|
  s['users'] = [
    { id: 1, name: 'Alice', created_at: Time.now },
    { id: 2, name: 'Bob', created_at: Time.now - 3600 }
  ]
  s['counters'] = { visits: 1250, downloads: 89 }
  s['config'] = OpenStruct.new(max_connections: 100, timeout: 30)
end
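Reading the data back confirms that Marshal round-trips preserve types: the stored Time comes back as a Time, not a string. A small sketch, with the file path and timestamp chosen for illustration:

```ruby
require 'pstore'
require 'tmpdir'

store = PStore.new(File.join(Dir.mktmpdir, 'types.pstore'))

created = Time.at(1_700_000_000)
store.transaction do |s|
  s['users'] = [{ id: 1, name: 'Alice', created_at: created }]
end

store.transaction(true) do |s|
  user = s['users'].first
  puts user[:created_at].class       # => Time
  puts user[:created_at] == created  # => true
end
```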

Basic Usage

PStore transactions operate in two modes: read-write (default) and read-only. Read-write transactions acquire an exclusive lock and allow modifications, while read-only transactions can run concurrently and provide consistent snapshots of the data.

require 'pstore'

# Initialize with file path
store = PStore.new('inventory.pstore')

# Basic write operations
store.transaction do |s|
  s['products'] = {}
  s['products']['laptop'] = { 
    price: 999.99, 
    quantity: 15,
    category: 'electronics'
  }
  s['products']['book'] = {
    price: 24.95,
    quantity: 200, 
    category: 'books'
  }
  s['last_updated'] = Time.now
end

Reading data requires wrapping operations in transactions to ensure consistency. PStore returns the exact objects that were stored, maintaining their types and structure.

# Read operations
store.transaction(true) do |s|
  products = s['products']
  
  products.each do |name, details|
    puts "#{name}: $#{details[:price]} (#{details[:quantity]} available)"
  end
  
  puts "Last updated: #{s['last_updated']}"
end

PStore provides several methods for examining and manipulating the key space. The roots method returns all top-level keys, while root? checks for key existence.

store.transaction do |s|
  puts "Store contains: #{s.roots.join(', ')}"

  if s.root?('products')
    s['products']['tablet'] = { price: 399.99, quantity: 8 }
  end

  # Delete keys
  s.delete('old_data') if s.root?('old_data')
end
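For reads with fallbacks, fetch complements root?: it returns a supplied default for a missing key, or raises PStore::Error when no default is given. A minimal sketch:

```ruby
require 'pstore'
require 'tmpdir'

store = PStore.new(File.join(Dir.mktmpdir, 'fetch.pstore'))

store.transaction do |s|
  s['known'] = 42

  puts s.fetch('known')       # => 42
  puts s.fetch('missing', 0)  # => 0

  begin
    s.fetch('missing')        # no default: raises
  rescue PStore::Error => e
    puts "missing root: #{e.class}"
  end
end
```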

Nested data structures require careful handling during updates. Since PStore stores object references, modifying nested structures outside of transactions won't persist changes.

# Incorrect - changes won't persist
product = nil
store.transaction(true) do |s|
  product = s['products']['laptop']
end
product[:quantity] = 10  # This change is lost

# Correct - modify within transaction  
store.transaction do |s|
  s['products']['laptop'][:quantity] = 10
  # Or reassign the entire structure
  products = s['products']
  products['laptop'][:quantity] = 10
  s['products'] = products
end

Error Handling & Debugging

PStore operations can fail due to file system issues, permission problems, or data corruption. The most common exceptions occur during file operations and transaction management.

File permission errors happen when the Ruby process lacks read or write access to the PStore file or its directory.

require 'fileutils'

store_path = '/root/restricted.pstore'
store = PStore.new(store_path)

begin
  store.transaction do |s|
    s['data'] = 'test'
  end
rescue Errno::EACCES => e
  puts "Permission denied: #{e.message}"
  # Handle by switching to a writable location, then retry
  store = PStore.new(File.join(Dir.home, 'app_data.pstore'))
  retry
rescue Errno::ENOENT => e
  puts "Directory doesn't exist: #{e.message}"
  # Create the directory structure, then retry
  FileUtils.mkdir_p(File.dirname(store_path))
  retry
end

Disk space exhaustion during write operations can corrupt the PStore file or leave it in an inconsistent state. PStore attempts to maintain atomicity by writing to a temporary file first.

def safe_pstore_write(store, data)
  store.transaction do |s|
    data.each { |key, value| s[key] = value }
  end
rescue Errno::ENOSPC => e
  logger.error "Disk full while writing to PStore: #{e.message}"
  # Ruby's standard library has no portable free-space query; shell out
  # to a tool like df or use the sys-filesystem gem if you need to check
  # available space before retrying
  if clean_old_files  # assumed helper that frees disk space
    retry
  else
    raise
  end
end

Object serialization failures occur when PStore cannot marshal certain objects. Custom classes without proper serialization support, objects containing file handles, or circular references can cause TypeError or ArgumentError.

class CustomObject  
  def initialize(data)
    @data = data
    @file_handle = File.open('/dev/null')  # Problematic
  end
end

store = PStore.new('test.pstore')

begin
  store.transaction do |s|
    s['object'] = CustomObject.new("test")
  end
rescue TypeError => e
  puts "Serialization failed: #{e.message}"
  
  # Solution: implement custom serialization
  class CustomObject
    def marshal_dump
      [@data]  # Only serialize safe attributes
    end
    
    def marshal_load(data)
      @data = data[0]
      @file_handle = File.open('/dev/null')  # Recreate after loading
    end
  end
end
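The same marshal_dump/marshal_load pattern in a self-contained sketch. SessionLog is a hypothetical class that holds a non-marshallable StringIO and keeps only its entries across serialization:

```ruby
require 'pstore'
require 'stringio'
require 'tmpdir'

# Hypothetical class: holds a StringIO (not marshallable) plus plain data;
# marshal_dump/marshal_load persist only the safe entries
class SessionLog
  attr_reader :entries

  def initialize(entries = [])
    @entries = entries
    @io = StringIO.new  # transient state, recreated on load
  end

  def marshal_dump
    @entries
  end

  def marshal_load(entries)
    @entries = entries
    @io = StringIO.new
  end
end

store = PStore.new(File.join(Dir.mktmpdir, 'log.pstore'))

store.transaction { |s| s['log'] = SessionLog.new(%w[start stop]) }

store.transaction(true) do |s|
  puts s['log'].entries.inspect  # => ["start", "stop"]
end
```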

Transaction conflicts arise when attempting nested transactions or mixing read-only and read-write operations incorrectly.

# This will raise an exception
store.transaction do |s|
  s['outer'] = 'value'
  
  # Nested transaction attempt
  begin
    store.transaction do |inner|
      inner['nested'] = 'invalid'  # Raises error
    end
  rescue PStore::Error => e
    puts "Nested transaction error: #{e.message}"
  end
end

# Proper pattern for complex operations
def update_with_validation(store, updates)
  # Read current state
  current_data = nil
  store.transaction(true) do |s|
    current_data = s.roots.map { |k| [k, s[k]] }.to_h
  end
  end
  
  # Validate changes
  validated_updates = validate_updates(current_data, updates)
  
  # Apply changes
  store.transaction do |s|
    validated_updates.each { |k, v| s[k] = v }
  end
end

Thread Safety & Concurrency

PStore provides thread safety through file locking mechanisms, but concurrent access patterns require careful consideration. Read-only transactions can execute concurrently, while read-write transactions acquire exclusive locks.

Multiple processes can safely access the same PStore file through the built-in locking mechanism: PStore takes an advisory flock on the data file itself, shared for read-only transactions and exclusive for read-write transactions.

# Safe concurrent read access
threads = 10.times.map do |i|
  Thread.new do
    store = PStore.new('shared.pstore')
    
    store.transaction(true) do |s|
      data = s['shared_counter'] || 0
      puts "Thread #{i} read: #{data}"
      sleep(rand(0.1..0.3))  # Simulate processing
    end
  end
end

threads.each(&:join)

Write operations serialize access automatically, but applications should minimize transaction duration to reduce contention.

# Poor pattern - long transaction holds lock
store.transaction do |s|
  s['start_time'] = Time.now
  
  # Expensive operation inside transaction
  (1..1000).each do |i|
    s["item_#{i}"] = process_item(i)  # Blocks other writers
  end
  
  s['end_time'] = Time.now
end

# Better pattern - prepare data outside transaction
processed_data = {}
(1..1000).each do |i|
  processed_data["item_#{i}"] = process_item(i)
end

# Quick write operation
store.transaction do |s|
  s['start_time'] = Time.now
  processed_data.each { |k, v| s[k] = v }
  s['end_time'] = Time.now
end

Deadlock situations can occur when multiple processes attempt to acquire locks on different PStore files in different orders.

# Deadlock risk with multiple stores
def risky_multi_store_update(store1, store2, data)
  Thread.new do
    store1.transaction do |s1|
      s1['data'] = data[:first]
      
      # Another process might lock store2 first
      store2.transaction do |s2|
        s2['data'] = data[:second]  # Potential deadlock
      end
    end
  end
end

# Safe pattern - consistent ordering
def safe_multi_store_update(stores, data)
  # Sort stores by path to ensure consistent lock ordering
  ordered_stores = stores.sort_by(&:path)
  
  ordered_stores.each_with_index do |store, index|
    store.transaction do |s|
      data[index].each { |k, v| s[k] = v }
    end
  end
end

Reader-writer coordination requires understanding PStore's locking behavior. Read-only transactions don't block each other but block on active write transactions.

class ConcurrentDataStore
  def initialize(path)
    @store = PStore.new(path)
    @read_mutex = Mutex.new  # Optional: coordinate readers
  end
  
  def bulk_read(keys)
    @store.transaction(true) do |s|
      keys.map { |key| [key, s[key]] }.to_h
    end
  end
  
  def conditional_write(key, value, &condition)
    @store.transaction do |s|
      current = s[key]
      if condition.call(current)
        s[key] = value
        true
      else
        false
      end
    end
  end
  
  def atomic_increment(key, amount = 1)
    @store.transaction do |s|
      current = s[key] || 0
      s[key] = current + amount
    end
  end
end

# Usage with proper error handling
data_store = ConcurrentDataStore.new('metrics.pstore')

# Multiple threads can read simultaneously
readers = 5.times.map do
  Thread.new { data_store.bulk_read(['visits', 'downloads']) }
end

# Writers serialize automatically
writers = 3.times.map do
  Thread.new { data_store.atomic_increment('visits') }
end

(readers + writers).each(&:join)

Performance & Memory

PStore performance characteristics depend on file size, object complexity, and access patterns. The entire file gets loaded into memory during transactions, making file size a critical factor for performance and memory usage.

File size directly impacts transaction startup time since PStore reads the entire file when beginning a transaction. Large files can cause significant memory consumption and slower response times.

require 'benchmark'
require 'pstore'

# Measure impact of file size on transaction time
def benchmark_pstore_sizes
  sizes = [1000, 10_000, 100_000]
  
  sizes.each do |size|
    store = PStore.new("test_#{size}.pstore")
    
    # Create test data
    store.transaction do |s|
      size.times { |i| s["key_#{i}"] = "value_#{i}" * 100 }
    end
    
    # Measure read performance
    time = Benchmark.realtime do
      store.transaction(true) do |s|
        s['key_100']  # Simple read operation
      end
    end
    
    file_size = File.size("test_#{size}.pstore") / 1024.0 / 1024.0
    puts "#{size} keys: #{file_size.round(2)}MB file, #{(time * 1000).round(2)}ms transaction"
  end
end

Object serialization overhead varies significantly based on object complexity. Simple objects marshal quickly, while complex nested structures or large arrays consume more processing time and memory.

# Efficient storage patterns
store = PStore.new('optimized.pstore')

# Store flat structures when possible
store.transaction do |s|
  s['user_names'] = users.map(&:name)  # Simple array
  s['user_emails'] = users.map(&:email)
  s['user_count'] = users.length
end

# Less efficient - complex nested objects
store.transaction do |s|
  s['users'] = users.map do |user|
    {
      name: user.name,
      email: user.email,
      profile: user.profile.to_h,  # Potentially large nested hash
      permissions: user.permissions.to_a,
      audit_log: user.audit_entries.map(&:to_h)  # Very expensive to serialize
    }
  end
end

Memory usage patterns show that PStore holds the entire data structure in memory during transactions. Applications should monitor memory consumption when dealing with large datasets.

class MemoryEfficientPStore
  def initialize(path, max_memory_mb: 50)
    @store = PStore.new(path)
    @max_memory_bytes = max_memory_mb * 1024 * 1024
  end
  
  def write_with_memory_check(data)
    # Estimate serialized size
    estimated_size = Marshal.dump(data).size
    
    if estimated_size > @max_memory_bytes
      # Split large data into chunks
      chunk_size = data.size / (estimated_size / @max_memory_bytes + 1)
      data.each_slice(chunk_size).with_index do |chunk, index|
        @store.transaction do |s|
          s["chunk_#{index}"] = chunk
        end
      end
    else
      @store.transaction do |s|
        s['data'] = data
      end
    end
  end
  
  def read_chunked_data
    chunks = []
    @store.transaction(true) do |s|
      index = 0
      while s.root?("chunk_#{index}")
        chunks << s["chunk_#{index}"]
        index += 1
      end
    end
    chunks.flatten
  end
end

Write performance degrades with file size due to the copy-and-rename strategy PStore uses for atomic updates. The operation creates a complete copy of the data file, which can be expensive for large stores.

# Performance monitoring for write operations
class MonitoredPStore
  def initialize(path)
    @store = PStore.new(path)
    @write_times = []
  end
  
  def timed_transaction(&block)
    start_time = Time.now
    
    result = @store.transaction(&block)
    
    duration = Time.now - start_time
    @write_times << duration
    
    if @write_times.length > 10
      avg_time = @write_times.sum / @write_times.length
      if avg_time > 1.0  # More than 1 second average
        warn "PStore write performance degraded: #{avg_time.round(3)}s average"
      end
      @write_times.clear
    end
    
    result
  end
  
  def performance_stats
    file_size = File.exist?(@store.path) ? File.size(@store.path) : 0
    {
      file_size_mb: file_size / 1024.0 / 1024.0,
      recent_write_times: @write_times
    }
  end
end

Common Pitfalls

Transaction boundaries create the most common source of errors in PStore usage. Operations outside transactions don't persist, and forgetting to commit changes leads to data loss.

store = PStore.new('pitfall.pstore')

# Pitfall: Modifying objects retrieved from PStore
users = nil
store.transaction(true) do |s|
  users = s['users'] || []
end

users << { name: 'New User' }  # This change won't persist!

# Correct approach - modify within transaction
store.transaction do |s|
  users = s['users'] || []
  users << { name: 'New User' }
  s['users'] = users
end

Object reference semantics can cause unexpected behavior. Marshal preserves shared references within a single commit, but the objects your application holds in memory become decoupled from the store: every transaction loads fresh copies, so mutating an object after you stored it does not change the persisted data.

# Pitfall: Assuming a held object stays linked to stored data
shared_config = { theme: 'dark' }

store.transaction do |s|
  s['user1'] = { name: 'Alice', config: shared_config }
  s['user2'] = { name: 'Bob', config: shared_config }
end

shared_config[:theme] = 'light'  # Mutates only the in-memory hash

store.transaction(true) do |s|
  puts s['user1'][:config][:theme]  # Still 'dark' - the store holds its own copy
end

# Solution: Manage shared data explicitly
store.transaction do |s|
  s['shared_config'] = { theme: 'dark' }
  s['user1'] = { name: 'Alice', config_ref: 'shared_config' }
  s['user2'] = { name: 'Bob', config_ref: 'shared_config' }
end

File path handling causes issues when relative paths change based on the working directory or when the file location becomes inaccessible.

# Pitfall: Relative paths dependent on working directory
Dir.chdir('/tmp')
store = PStore.new('data.pstore')  # Creates /tmp/data.pstore

Dir.chdir('/home/user')
# Now store operations might fail or create a different file

# Solution: Use absolute paths
require 'pathname'
require 'fileutils'

class SafePStore
  def initialize(relative_path, base_dir: Dir.home)
    @path = Pathname.new(base_dir).join(relative_path).to_s
    ensure_directory_exists
    @store = PStore.new(@path)
  end
  
  private
  
  def ensure_directory_exists
    dir = File.dirname(@path)
    FileUtils.mkdir_p(dir) unless Dir.exist?(dir)
  end
  
  def method_missing(method, *args, &block)
    @store.send(method, *args, &block)
  end
  
  def respond_to_missing?(method, include_private = false)
    @store.respond_to?(method, include_private)
  end
end

Concurrent access assumptions lead to race conditions when multiple processes modify the same keys without proper coordination.

# Pitfall: Assuming atomic operations
def unsafe_increment(store, key)
  current = nil
  store.transaction(true) do |s|
    current = s[key] || 0
  end
  
  # Gap here - another process might modify the value
  
  store.transaction do |s|
    s[key] = current + 1  # Race condition!
  end
end

# Solution: Atomic operations within single transaction
def safe_increment(store, key)
  store.transaction do |s|
    current = s[key] || 0
    s[key] = current + 1
  end
end

# For complex conditions, use proper coordination
def conditional_update(store, key, &condition)
  max_retries = 3
  retries = 0
  
  begin
    store.transaction do |s|
      current = s[key]
      new_value = condition.call(current)
      s[key] = new_value if new_value
    end
  rescue => e
    retries += 1
    if retries < max_retries
      sleep(0.01 * (2 ** retries))  # Exponential backoff
      retry
    else
      raise
    end
  end
end

Marshal serialization limitations affect certain object types. File handles, database connections, and objects with complex internal state don't serialize properly.

# Pitfall: Storing non-serializable objects
class DatabaseService
  def initialize
    @connection = establish_connection  # Can't be marshalled
  end
end

store.transaction do |s|
  begin
    s['service'] = DatabaseService.new
  rescue TypeError => e
    puts "Can't store database service: #{e.message}"
  end
end

# Solution: Store configuration, not state
class DatabaseConfig
  attr_reader :host, :port, :database
  
  def initialize(host:, port:, database:)
    @host, @port, @database = host, port, database
  end
  
  def create_service
    DatabaseService.connect(host: @host, port: @port, database: @database)
  end
end

store.transaction do |s|
  s['db_config'] = DatabaseConfig.new(
    host: 'localhost', 
    port: 5432, 
    database: 'myapp'
  )
end

Reference

PStore Class Methods

Method Parameters Returns Description
PStore.new(file, thread_safe = false) file (String), thread_safe (Boolean) PStore Creates new PStore instance with specified file path

Instance Methods

Method Parameters Returns Description
#transaction(read_only = false, &block) read_only (Boolean), block Object Executes block within transaction context
#[]=(name, value) name (Object), value (Object) Object Sets value for root name (only in transactions)
#[](name) name (Object) Object Retrieves value for root name (only in transactions)
#fetch(name, default = PStore::Error) name (Object), default (Object) Object Retrieves value; returns default or raises PStore::Error if missing
#delete(name) name (Object) Object Removes root and returns its value
#root?(name) name (Object) Boolean Checks if root exists in store
#roots None Array Returns array of all root names
#path None String Returns the data file path
#abort None nil Aborts current transaction without saving
#commit None nil Commits current transaction immediately (otherwise automatic at block end)

PStore Attributes

Attribute Type Description
path String File path for the PStore data file
ultra_safe Boolean Whether to sync data to disk immediately

Transaction Context Methods

Available only within PStore#transaction blocks:

Method Behavior Notes
#[] Read operation Works in read-only and read-write transactions
#[]= Write operation Raises PStore::Error in read-only transactions
#delete Removal operation Raises PStore::Error in read-only transactions
#root? Existence check Works in all transaction types
#roots Root enumeration Works in all transaction types
#fetch Read with default Works in all transaction types

Error Hierarchy

StandardError
├── PStore::Error      # Transaction misuse (nested transactions, writes in read-only mode)
├── Errno::EACCES      # Permission denied
├── Errno::ENOENT      # File or directory not found
├── Errno::ENOSPC      # No space left on device
├── TypeError          # Object cannot be marshalled
└── ArgumentError      # Invalid parameters

Transaction Modes

Mode Parameter Lock Type Concurrent Access Use Case
Read-Write transaction(false) (default) Exclusive flock Blocks all other transactions Data modification
Read-Only transaction(true) Shared flock Multiple concurrent readers Data retrieval
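A write attempted inside a read-only transaction fails fast with PStore::Error rather than silently dropping data, as this sketch demonstrates (paths are illustrative):

```ruby
require 'pstore'
require 'tmpdir'

store = PStore.new(File.join(Dir.mktmpdir, 'modes.pstore'))

store.transaction { |s| s['key'] = 'value' }

store.transaction(true) do |s|
  puts s['key']  # reads work fine
  begin
    s['key'] = 'new'  # any write raises in read-only mode
  rescue PStore::Error => e
    puts "write rejected: #{e.class}"
  end
end
```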

File System Behavior

Operation Files Created Atomic Locking
Read transaction None N/A Shared flock on data file
Write transaction filename.tmp during commit Yes Exclusive flock on data file
Failed transaction Temporary file cleaned up Rollback Lock released

Marshal Compatibility

Object Type Serializable Notes
Basic types (String, Integer, etc.) Yes Full support
Arrays, Hashes Yes Nested structures supported
Custom objects Usually Must support Marshal dump/load
File handles No Use file paths instead
Database connections No Store connection parameters
Threads No Not serializable
Procs/Lambdas No Store as strings or use method objects

Performance Characteristics

File Size Transaction Startup Memory Usage Write Performance
< 1MB < 10ms Low Fast
1-10MB 10-100ms Moderate Moderate
10-100MB 100ms-1s High Slow
> 100MB > 1s Very High Very Slow

These figures are rough guidelines; actual numbers vary with hardware, Ruby version, and object complexity.

Best Practices Summary

  • Keep transactions short to minimize lock contention
  • Use read-only transactions when possible for better concurrency
  • Store simple objects to reduce serialization overhead
  • Monitor file size growth and implement archival strategies
  • Handle file system exceptions appropriately
  • Use absolute file paths to avoid working directory issues
  • Implement proper error handling for concurrent access scenarios
  • Consider PStore alternatives for high-performance or large-dataset requirements
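One stdlib alternative worth knowing: YAML::Store subclasses PStore, so it keeps the same transaction API while writing human-readable YAML instead of binary Marshal data, which is handy when the store doubles as an inspectable config file:

```ruby
require 'yaml/store'
require 'tmpdir'

store = YAML::Store.new(File.join(Dir.mktmpdir, 'config.yml'))

store.transaction do |s|
  s['theme'] = 'dark'
  s['debug'] = true
end

store.transaction(true) do |s|
  puts s['theme']  # => dark
end

puts File.read(store.path)  # plain YAML, editable by hand
```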