Overview
PStore implements a transactional data store that persists Ruby objects to a file using Marshal serialization. The PStore
class wraps file operations in transactions that keep the data consistent and apply changes atomically. During a transaction, PStore locks the data file to prevent concurrent modification and preserves integrity through commit and rollback mechanisms.
The core PStore API centers on transactions that group read and write operations. Opening a transaction provides access to the data store, and committing writes the changes to disk. PStore handles file locking, object marshalling, and the underlying file management automatically.
require 'pstore'

store = PStore.new('config.pstore')

# Write data in a transaction
store.transaction do |s|
  s['app_name'] = 'MyApplication'
  s['version'] = '2.1.0'
  s['settings'] = { theme: 'dark', debug: true }
end

# Read data in another transaction
store.transaction(true) do |s| # true => read-only
  puts s['app_name'] # => "MyApplication"
  puts s['version']  # => "2.1.0"
  puts s['settings'] # => {:theme=>"dark", :debug=>true}
end
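Changes persist only when the transaction block finishes normally. Calling abort inside the block (or letting an exception escape it) rolls everything back, which is the rollback mechanism referred to above; a minimal sketch:
store.transaction do |s|
  s['version'] = '3.0.0'
  s.abort # discard this change; the block ends here and nothing is written
end

store.transaction(true) do |s| # read-only
  puts s['version'] # => "2.1.0" - the aborted write never reached disk
end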
PStore stores data as key-value pairs whose top-level keys (called roots) are typically strings or symbols; values can be any object that Marshal can serialize. Because the underlying storage uses Ruby's Marshal format, PStore is suitable for complex data structures and preserves object types and relationships.
require 'ostruct' # OpenStruct is not loaded by default

store = PStore.new('data.pstore')

store.transaction do |s|
  s['users'] = [
    { id: 1, name: 'Alice', created_at: Time.now },
    { id: 2, name: 'Bob', created_at: Time.now - 3600 }
  ]
  s['counters'] = { visits: 1250, downloads: 89 }
  s['config'] = OpenStruct.new(max_connections: 100, timeout: 30)
end
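Reading the same store back returns fully reconstructed objects, so classes such as Time and OpenStruct round-trip intact:
store.transaction(true) do |s| # read-only
  puts s['users'].first[:created_at].class # => Time
  puts s['config'].class                   # => OpenStruct
  puts s['counters'][:visits]              # => 1250
end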
Basic Usage
PStore transactions operate in two modes: read-write (default) and read-only. Read-write transactions acquire an exclusive lock and allow modifications, while read-only transactions can run concurrently and provide consistent snapshots of the data.
require 'pstore'

# Initialize with file path
store = PStore.new('inventory.pstore')

# Basic write operations
store.transaction do |s|
  s['products'] = {}
  s['products']['laptop'] = {
    price: 999.99,
    quantity: 15,
    category: 'electronics'
  }
  s['products']['book'] = {
    price: 24.95,
    quantity: 200,
    category: 'books'
  }
  s['last_updated'] = Time.now
end
Reading data requires wrapping operations in transactions to ensure consistency. PStore returns reconstructed copies of the stored objects, preserving their types and structure.
# Read operations
store.transaction(true) do |s| # read-only
  products = s['products']
  products.each do |name, details|
    puts "#{name}: $#{details[:price]} (#{details[:quantity]} available)"
  end
  puts "Last updated: #{s['last_updated']}"
end
PStore provides several methods for examining and manipulating the key space. The roots method returns all top-level keys, while root? checks whether a given key exists.
store.transaction do |s|
  puts "Store contains: #{s.roots.join(', ')}"

  if s.root?('products')
    s['products']['tablet'] = { price: 399.99, quantity: 8 }
  end

  # Delete keys
  s.delete('old_data') if s.root?('old_data')
end
Nested data structures require careful handling during updates. Objects read from the store are in-memory copies of what is on disk, so modifying them outside of a read-write transaction never persists anything.
# Incorrect - changes won't persist
product = nil
store.transaction(true) do |s| # read-only
  product = s['products']['laptop']
end
product[:quantity] = 10 # This change is lost; the store is untouched

# Correct - modify within a read-write transaction
store.transaction do |s|
  s['products']['laptop'][:quantity] = 10

  # Or reassign the entire structure
  products = s['products']
  products['laptop'][:quantity] = 10
  s['products'] = products
end
Error Handling & Debugging
PStore operations can fail due to file system issues, permission problems, or data corruption. The most common exceptions occur during file operations and transaction management.
File permission errors happen when the Ruby process lacks read or write access to the PStore file or its directory.
require 'fileutils'

store_path = '/root/restricted.pstore'
store = PStore.new(store_path)

begin
  store.transaction do |s|
    s['data'] = 'test'
  end
rescue Errno::EACCES => e
  puts "Permission denied: #{e.message}"
  # Fall back to a writable location, then retry
  store_path = File.join(Dir.home, 'app_data.pstore')
  store = PStore.new(store_path)
  retry
rescue Errno::ENOENT => e
  puts "Directory doesn't exist: #{e.message}"
  # Create the directory structure, then retry
  FileUtils.mkdir_p(File.dirname(store_path))
  retry
end
Disk space exhaustion during a commit raises Errno::ENOSPC and can leave the PStore file in an inconsistent state. Enabling ultra_safe makes PStore write the commit to a temporary file and rename it into place, which protects the existing data at the cost of extra I/O.
# `logger` and `clean_old_files` are assumed to be provided by the application
def safe_pstore_write(store, data)
  attempts = 0
  begin
    store.transaction do |s|
      data.each { |key, value| s[key] = value }
    end
  rescue Errno::ENOSPC => e
    logger.error "Disk full while writing to PStore: #{e.message}"
    attempts += 1
    if attempts == 1
      clean_old_files # free up space, then retry once
      retry
    else
      raise
    end
  end
end
Object serialization failures occur when PStore cannot marshal an object in the table. Custom classes that hold unserializable state (file handles, sockets, procs, singleton objects) raise TypeError when the transaction commits and the table is dumped. ArgumentError can appear later when loading data that references a class that is no longer defined.
class CustomObject
  def initialize(data)
    @data = data
    @file_handle = File.open('/dev/null') # Problematic
  end
end

store = PStore.new('test.pstore')

begin
  store.transaction do |s|
    s['object'] = CustomObject.new("test")
  end # TypeError surfaces here, when the table is marshalled at commit
rescue TypeError => e
  puts "Serialization failed: #{e.message}"
end

# Solution: keep unmarshallable state out of the dump with custom serialization
class CustomObject
  def marshal_dump
    [@data] # Only serialize safe attributes
  end

  def marshal_load(data)
    @data = data[0]
    @file_handle = File.open('/dev/null') # Recreate after loading
  end
end
Transaction conflicts arise when nesting transactions on the same store or when attempting writes inside a read-only transaction; both raise PStore::Error.
# This will raise an exception
store.transaction do |s|
  s['outer'] = 'value'

  # Nested transaction attempt
  begin
    store.transaction do |inner|
      inner['nested'] = 'invalid' # Raises PStore::Error ("nested transaction")
    end
  rescue PStore::Error => e
    puts "Nested transaction error: #{e.message}"
  end
end
# Proper pattern for complex operations
def update_with_validation(store, updates)
  # Read current state
  current_data = nil
  store.transaction(true) do |s| # read-only
    current_data = s.roots.map { |k| [k, s[k]] }.to_h
  end

  # Validate changes (validate_updates is application-specific)
  validated_updates = validate_updates(current_data, updates)

  # Apply changes
  store.transaction do |s|
    validated_updates.each { |k, v| s[k] = v }
  end
end
Thread Safety & Concurrency
PStore provides safety across threads and processes through file locking, but concurrent access patterns still require care. Read-only transactions can run concurrently, while read-write transactions hold an exclusive lock.
Multiple processes can safely access the same PStore file through the built-in locking: PStore takes a shared lock on the data file for read-only transactions and an exclusive lock for read-write transactions. When a single PStore instance is shared between threads in one process, create it with thread_safe set to true (the second argument to PStore.new) so in-process access is serialized as well.
# Safe concurrent read access
threads = 10.times.map do |i|
  Thread.new do
    store = PStore.new('shared.pstore')
    store.transaction(true) do |s| # read-only
      data = s['shared_counter'] || 0
      puts "Thread #{i} read: #{data}"
      sleep(rand(0.1..0.3)) # Simulate processing
    end
  end
end

threads.each(&:join)
Write operations serialize access automatically, but applications should minimize transaction duration to reduce contention.
# Poor pattern - long transaction holds the exclusive lock
# (process_item stands in for application work)
store.transaction do |s|
  s['start_time'] = Time.now
  # Expensive operation inside the transaction
  (1..1000).each do |i|
    s["item_#{i}"] = process_item(i) # Blocks other writers the whole time
  end
  s['end_time'] = Time.now
end

# Better pattern - prepare data outside the transaction
processed_data = {}
(1..1000).each do |i|
  processed_data["item_#{i}"] = process_item(i)
end

# Quick write operation
store.transaction do |s|
  s['start_time'] = Time.now
  processed_data.each { |k, v| s[k] = v }
  s['end_time'] = Time.now
end
Deadlock situations can occur when multiple processes attempt to acquire locks on different PStore files in different orders.
# Deadlock risk with multiple stores
def risky_multi_store_update(store1, store2, data)
  Thread.new do
    store1.transaction do |s1|
      s1['data'] = data[:first]
      # Another process might lock store2 first
      store2.transaction do |s2|
        s2['data'] = data[:second] # Potential deadlock
      end
    end
  end
end

# Safe pattern - consistent ordering, no overlapping locks
def safe_multi_store_update(stores, data)
  # Sort stores by path to ensure consistent lock ordering
  ordered_stores = stores.sort_by(&:path)
  ordered_stores.each_with_index do |store, index|
    store.transaction do |s|
      data[index].each { |k, v| s[k] = v }
    end
  end
end
Reader-writer coordination requires understanding PStore's locking behavior. Read-only transactions don't block each other but block on active write transactions.
class ConcurrentDataStore
  def initialize(path)
    # thread_safe = true serializes transactions from multiple threads in this process
    @store = PStore.new(path, true)
    @read_mutex = Mutex.new # Optional: coordinate readers at the application level
  end

  def bulk_read(keys)
    @store.transaction(true) do |s| # read-only
      keys.map { |key| [key, s[key]] }.to_h
    end
  end

  def conditional_write(key, value, &condition)
    @store.transaction do |s|
      current = s[key]
      if condition.call(current)
        s[key] = value
        true
      else
        false
      end
    end
  end

  def atomic_increment(key, amount = 1)
    @store.transaction do |s|
      current = s[key] || 0
      s[key] = current + amount
    end
  end
end
# Usage with proper coordination
data_store = ConcurrentDataStore.new('metrics.pstore')

# Multiple threads can read simultaneously
readers = 5.times.map do
  Thread.new { data_store.bulk_read(['visits', 'downloads']) }
end

# Writers serialize automatically
writers = 3.times.map do
  Thread.new { data_store.atomic_increment('visits') }
end

(readers + writers).each(&:join)
Performance & Memory
PStore performance characteristics depend on file size, object complexity, and access patterns. The entire file gets loaded into memory during transactions, making file size a critical factor for performance and memory usage.
File size directly impacts transaction startup time since PStore reads the entire file when beginning a transaction. Large files can cause significant memory consumption and slower response times.
require 'benchmark'
require 'pstore'

# Measure impact of file size on transaction time
def benchmark_pstore_sizes
  sizes = [1000, 10_000, 100_000]

  sizes.each do |size|
    store = PStore.new("test_#{size}.pstore")

    # Create test data
    store.transaction do |s|
      size.times { |i| s["key_#{i}"] = "value_#{i}" * 100 }
    end

    # Measure read performance
    time = Benchmark.realtime do
      store.transaction(true) do |s| # read-only
        s['key_100'] # Simple read operation
      end
    end

    file_size = File.size("test_#{size}.pstore") / 1024.0 / 1024.0
    puts "#{size} keys: #{file_size.round(2)}MB file, #{(time * 1000).round(2)}ms transaction"
  end
end
Object serialization overhead varies significantly based on object complexity. Simple objects marshal quickly, while complex nested structures or large arrays consume more processing time and memory.
# Efficient storage patterns (`users` stands in for an application collection)
store = PStore.new('optimized.pstore')

# Store flat structures when possible
store.transaction do |s|
  s['user_names'] = users.map(&:name) # Simple array
  s['user_emails'] = users.map(&:email)
  s['user_count'] = users.length
end

# Less efficient - complex nested objects
store.transaction do |s|
  s['users'] = users.map do |user|
    {
      name: user.name,
      email: user.email,
      profile: user.profile.to_h,               # Potentially large nested hash
      permissions: user.permissions.to_a,
      audit_log: user.audit_entries.map(&:to_h) # Very expensive to serialize
    }
  end
end
Memory usage patterns show that PStore holds the entire data structure in memory during transactions. Applications should monitor memory consumption when dealing with large datasets.
class MemoryEfficientPStore
  def initialize(path, max_memory_mb: 50)
    @store = PStore.new(path)
    @max_memory_bytes = max_memory_mb * 1024 * 1024
  end

  def write_with_memory_check(data)
    # Estimate serialized size
    estimated_size = Marshal.dump(data).size

    if estimated_size > @max_memory_bytes
      # Split large data into chunks, one transaction per chunk
      chunk_size = [data.size / (estimated_size / @max_memory_bytes + 1), 1].max
      data.each_slice(chunk_size).with_index do |chunk, index|
        @store.transaction do |s|
          s["chunk_#{index}"] = chunk
        end
      end
    else
      @store.transaction do |s|
        s['data'] = data
      end
    end
  end

  def read_chunked_data
    chunks = []
    @store.transaction(true) do |s| # read-only
      index = 0
      while s.root?("chunk_#{index}")
        chunks << s["chunk_#{index}"]
        index += 1
      end
    end
    chunks.flatten
  end
end
Write performance degrades with file size because each commit rewrites the full serialized table (and, with ultra_safe enabled, copies it to a temporary file before renaming it into place). Serializing and writing the complete data set can be expensive for large stores.
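When corruption-resistance matters more than write speed, the ultra_safe attribute (see the Reference below) switches commits to the temporary-file-plus-rename strategy on platforms that support it, trading a little extra I/O per write:
store = PStore.new('critical.pstore')
store.ultra_safe = true # guard against corruption on unexpected exits

store.transaction do |s|
  s['ledger'] = { balance: 1042, updated_at: Time.now }
end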
# Performance monitoring for write operations
class MonitoredPStore
  def initialize(path)
    @store = PStore.new(path)
    @write_times = []
  end

  def timed_transaction(&block)
    start_time = Time.now
    result = @store.transaction(&block)
    duration = Time.now - start_time

    @write_times << duration
    if @write_times.length > 10
      avg_time = @write_times.sum / @write_times.length
      if avg_time > 1.0 # More than 1 second average
        warn "PStore write performance degraded: #{avg_time.round(3)}s average"
      end
      @write_times.clear
    end

    result
  end

  def performance_stats
    file_size = File.exist?(@store.path) ? File.size(@store.path) : 0
    {
      file_size_mb: file_size / 1024.0 / 1024.0,
      recent_write_times: @write_times
    }
  end
end
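Usage of the monitor is a drop-in for plain transactions; the file name here is only illustrative:
monitored = MonitoredPStore.new('monitored.pstore')

monitored.timed_transaction do |s|
  s['events_processed'] = (s['events_processed'] || 0) + 1
end

p monitored.performance_stats # => { file_size_mb: ..., recent_write_times: [...] }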
Common Pitfalls
Transaction boundaries create the most common source of errors in PStore usage. Operations outside transactions never touch the store, and changes are written only when the transaction block completes without an abort.
store = PStore.new('pitfall.pstore')

# Pitfall: Modifying objects retrieved from PStore
users = nil
store.transaction(true) do |s| # read-only
  users = s['users'] || []
end
users << { name: 'New User' } # This change won't persist!

# Correct approach - modify within a read-write transaction
store.transaction do |s|
  users = s['users'] || []
  users << { name: 'New User' }
  s['users'] = users
end
Object reference semantics can cause unexpected behavior because PStore persists Marshal copies of your objects. Once a transaction commits, the stored data is decoupled from the in-memory objects used to build it, and the objects a later transaction returns are fresh copies, not the ones you held before.
# Pitfall: Assuming the store stays linked to in-memory objects
shared_config = { theme: 'dark' }

store.transaction do |s|
  s['user1'] = { name: 'Alice', config: shared_config }
  s['user2'] = { name: 'Bob', config: shared_config }
end

shared_config[:theme] = 'light' # Only changes the in-memory hash

store.transaction(true) do |s| # read-only
  puts s['user1'][:config][:theme] # => "dark" - the store holds a serialized copy
end

# Solution: Manage shared data explicitly
store.transaction do |s|
  s['shared_config'] = { theme: 'dark' }
  s['user1'] = { name: 'Alice', config_ref: 'shared_config' }
  s['user2'] = { name: 'Bob', config_ref: 'shared_config' }
end
File path handling causes issues when relative paths change based on the working directory or when the file location becomes inaccessible.
# Pitfall: Relative paths depend on the working directory
Dir.chdir('/tmp')
store = PStore.new('data.pstore') # Operates on /tmp/data.pstore
Dir.chdir('/home/user')
# Later transactions now resolve to /home/user/data.pstore - a different file

# Solution: Use absolute paths
require 'pathname'
require 'fileutils'

class SafePStore
  def initialize(relative_path, base_dir: Dir.home)
    @path = Pathname.new(base_dir).join(relative_path).to_s
    ensure_directory_exists
    @store = PStore.new(@path)
  end

  private

  def ensure_directory_exists
    FileUtils.mkdir_p(File.dirname(@path))
  end

  def method_missing(method, *args, &block)
    @store.send(method, *args, &block)
  end

  def respond_to_missing?(method, include_private = false)
    @store.respond_to?(method, include_private) || super
  end
end
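Usage mirrors a plain PStore thanks to the delegation; the myapp/settings.pstore path is only illustrative:
settings = SafePStore.new('myapp/settings.pstore') # resolves under Dir.home
settings.transaction do |s|
  s['locale'] = 'en'
end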
Concurrent access assumptions lead to race conditions when multiple processes modify the same keys without proper coordination.
# Pitfall: Assuming a read-then-write across two transactions is atomic
def unsafe_increment(store, key)
  current = nil
  store.transaction(true) do |s| # read-only
    current = s[key] || 0
  end

  # Gap here - another process might modify the value
  store.transaction do |s|
    s[key] = current + 1 # Race condition!
  end
end

# Solution: Atomic operations within a single transaction
def safe_increment(store, key)
  store.transaction do |s|
    current = s[key] || 0
    s[key] = current + 1
  end
end

# For complex conditions, retry on failure
def conditional_update(store, key, &condition)
  max_retries = 3
  retries = 0

  begin
    store.transaction do |s|
      current = s[key]
      new_value = condition.call(current)
      s[key] = new_value if new_value
    end
  rescue => e
    retries += 1
    if retries < max_retries
      sleep(0.01 * retries) # Brief backoff before retrying
      retry
    else
      raise
    end
  end
end
Marshal serialization limitations affect certain object types. File handles, database connections, and objects with complex internal state don't serialize properly.
# Pitfall: Storing non-serializable objects
# (establish_connection and DatabaseService.connect stand in for application code)
class DatabaseService
  def initialize
    @connection = establish_connection # Can't be marshalled
  end
end

begin
  store.transaction do |s|
    s['service'] = DatabaseService.new
  end # TypeError surfaces here, when the table is marshalled at commit
rescue TypeError => e
  puts "Can't store database service: #{e.message}"
end

# Solution: Store configuration, not state
class DatabaseConfig
  attr_reader :host, :port, :database

  def initialize(host:, port:, database:)
    @host, @port, @database = host, port, database
  end

  def create_service
    DatabaseService.connect(host: @host, port: @port, database: @database)
  end
end

store.transaction do |s|
  s['db_config'] = DatabaseConfig.new(
    host: 'localhost',
    port: 5432,
    database: 'myapp'
  )
end
Reference
PStore Class Methods
Method | Parameters | Returns | Description |
---|---|---|---|
PStore.new(file, thread_safe = false) | file (String), thread_safe (Boolean) | PStore | Creates a new PStore instance backed by the given file path |
Instance Methods
Method | Parameters | Returns | Description |
---|---|---|---|
#transaction(read_only = false, &block) | read_only (Boolean), block | Object | Executes the block within a transaction and returns the block's value |
#[]=(key, value) | key (Object), value (Object) | Object | Sets value for key (read-write transactions only) |
#[](key) | key (Object) | Object | Retrieves value for key, or nil if absent (only inside transactions) |
#fetch(key, default = PStore::Error) | key (Object), default (Object) | Object | Retrieves value for key; raises PStore::Error if the key is missing and no default is given |
#delete(key) | key (Object) | Object | Removes key and returns its value |
#root?(key) | key (Object) | Boolean | Checks whether key exists in the store |
#roots | None | Array | Returns an array of all top-level keys |
#abort | None | nil | Aborts the current transaction without saving |
#commit | None | nil | Commits the current transaction early (commits happen automatically at block end) |
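A short sketch exercising several of these methods in one transaction (the file name is illustrative):
store = PStore.new('reference_demo.pstore')

store.transaction do |s|
  s['alpha'] = 1
  s['beta']  = 2
  s.roots             # => ["alpha", "beta"]
  s.root?('alpha')    # => true
  s.fetch('gamma', 0) # => 0 (default returned for the missing key)
  s.delete('beta')    # => 2
end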
PStore Attributes
Attribute | Type | Description |
---|---|---|
path | String | File path of the PStore data file |
ultra_safe | Boolean | When true, commits are written to a temporary file and renamed into place to guard against corruption, at a minor performance cost |
Transaction Context Methods
Available only within PStore#transaction blocks:
Method | Behavior | Notes |
---|---|---|
#[] | Read operation | Works in read-only and read-write transactions |
#[]= | Write operation | Only available in read-write transactions |
#delete | Removal operation | Only available in read-write transactions |
#root? | Existence check | Works in all transaction types |
#roots | Key enumeration | Works in all transaction types |
#fetch | Read with default | Works in all transaction types |
Error Hierarchy
StandardError
├── Errno::EACCES # Permission denied
├── Errno::ENOENT # File or directory not found
├── Errno::ENOSPC # No space left on device
├── TypeError # Object cannot be marshalled
├── ArgumentError # Invalid parameters
└── PStore::Error # Transaction errors (nested transactions, writes in a read-only transaction, missing keys in fetch)
Transaction Modes
Mode | Parameter | Lock Type | Concurrent Access | Use Case |
---|---|---|---|---|
Read-Write | read_only = false (default) | Exclusive | Blocks all other transactions | Data modification |
Read-Only | read_only = true | Shared | Multiple concurrent readers | Data retrieval |
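Attempting a write inside a read-only transaction raises PStore::Error:
store.transaction(true) do |s|
  s['key'] = 'value' # raises PStore::Error
end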
File System Behavior
Operation | Files Created | Atomicity | Locking |
---|---|---|---|
Read-only transaction | None | N/A | Shared flock on the data file |
Read-write transaction | Temporary file during commit when ultra_safe is enabled | Atomic rename with ultra_safe; otherwise the file is rewritten in place | Exclusive flock on the data file |
Failed transaction | Temporary files cleaned up | Changes rolled back | Lock released |
Marshal Compatibility
Object Type | Serializable | Notes |
---|---|---|
Basic types (String, Integer, etc.) | ✓ | Full support |
Arrays, Hashes | ✓ | Nested structures supported |
Custom objects | ✓ | Serializable by default; define marshal_dump / marshal_load for objects holding unserializable state |
File handles | ✗ | Store file paths instead |
Database connections | ✗ | Store connection parameters |
Threads | ✗ | Not serializable |
Procs/Lambdas | ✗ | Store as strings or use method objects |
Performance Characteristics
File Size | Transaction Startup | Memory Usage | Write Performance |
---|---|---|---|
< 1 MB | < 10 ms | Low | Fast |
1-10 MB | 10-100 ms | Moderate | Moderate |
10-100 MB | 100 ms-1 s | High | Slow |
> 100 MB | > 1 s | Very High | Very Slow |
Best Practices Summary
- Keep transactions short to minimize lock contention
- Use read-only transactions when possible for better concurrency
- Store simple objects to reduce serialization overhead
- Monitor file size growth and implement archival strategies
- Handle file system exceptions appropriately
- Use absolute file paths to avoid working directory issues
- Implement proper error handling for concurrent access scenarios
- Consider PStore alternatives for high-performance or large-dataset requirements