Overview
DBM in Ruby provides a hash-like interface to persistent key-value databases, supporting multiple backend implementations including GDBM, NDBM, and SDBM. The DBM class is a generic interface bound to whichever compatible backend was available when the extension was compiled, with GDBM's compatibility layer a common choice for its feature set and reliability. Since Ruby 3.1 the library ships as the dbm gem rather than as part of the standard library.
Ruby's DBM implementation maintains database files on disk while providing familiar hash operations for reading and writing data. Keys and values are stored as strings, and the database remains persistent between program executions. DBM databases support concurrent read access but require careful handling for write operations to prevent corruption.
The DBM interface includes three main classes: DBM (the generic interface), GDBM (GNU Database Manager), and SDBM (Simple Database Manager). Each backend has different characteristics regarding file locking, maximum key/value sizes, and concurrent access patterns.
require 'dbm'
# Open or create a database
db = DBM.open('mydata.db')
# Store and retrieve data
db['user:123'] = 'Alice'
db['user:124'] = 'Bob'
puts db['user:123'] # => "Alice"
# Check available keys
db.keys # => ["user:123", "user:124"]
db.close
DBM databases automatically handle file management, creating necessary files when opening a non-existent database and managing internal structures for key indexing and data storage. Keys and values must be String objects; the interface does not convert symbols or numbers for you, so call to_s explicitly when needed.
# Using blocks for automatic resource management
DBM.open('temp.db') do |db|
  db['config'] = 'production'
  db['last_update'] = Time.now.to_s
  # Database automatically closed when block exits
end
Most DBM implementations create multiple files on disk, typically including a data file and an index file. GDBM creates a single file containing both data and metadata, while NDBM typically creates .dir and .pag files for directory and page information respectively.
Basic Usage
Opening a DBM database requires specifying a filename and optionally providing flags for access mode and file permissions. Ruby creates the database files if they don't exist, using default permissions that can be customized during creation.
require 'dbm'
# Open with default settings (read-write, create if needed)
db = DBM.open('application.db')
# Open with explicit permissions and flags
db = DBM.open('secure.db', 0600, DBM::WRCREAT)
# Read-only access to an existing database
db = DBM.open('readonly.db', 0644, DBM::READER)
DBM stores both keys and values as strings and does not convert other objects automatically: passing an Integer or Symbol raises TypeError, so convert data explicitly with to_s when storing. This string-only storage model requires careful consideration when storing structured data or maintaining data types across program executions.
db = DBM.open('example.db')
# Store different data types (converted to strings explicitly)
db['count'] = 42.to_s
db['rate'] = 3.14159.to_s
db['active'] = true.to_s
# Retrieve and convert back to original types
count = db['count'].to_i # => 42
rate = db['rate'].to_f # => 3.14159
active = db['active'] == 'true' # => true
# Store complex data using serialization
require 'json'
user_data = { name: 'Alice', age: 30, roles: ['admin', 'user'] }
db['user:profile'] = user_data.to_json
# Retrieve and deserialize
retrieved = JSON.parse(db['user:profile'])
# => {"name"=>"Alice", "age"=>30, "roles"=>["admin", "user"]}
Hash-like operations work intuitively with DBM objects, including iteration methods, key existence checks, and bulk operations. The database persists all changes immediately unless transactions are explicitly managed.
db = DBM.open('inventory.db')
# Hash-like assignment and retrieval
db['item:1001'] = 'Widget A'
db['item:1002'] = 'Widget B'
db['stock:1001'] = '25'
# Check for key existence
if db.has_key?('item:1001')
  puts "Item 1001: #{db['item:1001']}"
end
# Iterate over all key-value pairs
db.each do |key, value|
  puts "#{key}: #{value}"
end
# Get all keys matching a pattern
item_keys = db.keys.select { |key| key.start_with?('item:') }
# Bulk deletion
item_keys.each { |key| db.delete(key) }
# Clear entire database
db.clear
DBM supports several iteration patterns and provides methods for examining database contents without loading all data into memory simultaneously. This capability proves crucial when working with large databases that exceed available RAM.
# Iterate keys only (memory efficient for large databases)
db.each_key do |key|
  process_key(key) if key.match?(/^session:/)
end
# Iterate values only
total = 0
db.each_value do |value|
  total += value.to_i if value.match?(/^\d+$/)
end
# Select subset of data based on criteria
recent_sessions = {}
db.each_pair do |key, value|
  if key.start_with?('session:') && recent?(value)
    recent_sessions[key] = value
  end
end
Error Handling & Debugging
DBM operations can fail due to file system issues, permission problems, database corruption, or concurrent access conflicts. Ruby raises specific exceptions for different error conditions, requiring comprehensive error handling for robust applications.
File-related errors occur most commonly when opening databases, particularly in production environments where file permissions, disk space, or file system mounting issues can prevent database access.
require 'dbm'
begin
  db = DBM.open('/var/lib/myapp/data.db', 0644, DBM::WRCREAT)
rescue Errno::EACCES
  # Permission denied - check file/directory permissions
  puts "Cannot access database: permission denied"
  # Attempt alternative location or prompt for different permissions
rescue Errno::ENOSPC
  # No space left on device
  puts "Insufficient disk space for database"
  # Clean up temporary files or alert administrators
rescue Errno::EROFS
  # Read-only file system
  puts "Cannot write to read-only file system"
  # Switch to read-only mode or alternative storage
rescue DBMError => e
  # DBM-specific errors (corruption, format issues)
  puts "Database error: #{e.message}"
  # Attempt database repair or restoration from backup
end
Database corruption represents a serious error condition that can occur due to improper shutdown, concurrent write access without proper locking, or file system issues. Detecting and handling corruption requires careful error checking and recovery strategies.
require 'dbm'
require 'fileutils'

class DatabaseManager
  def initialize(db_path)
    @db_path = db_path
    @db = nil
  end

  def open_with_recovery
    attempts = 0
    begin
      @db = DBM.open(@db_path)
      verify_database_integrity
    rescue DBMError => e
      attempts += 1
      if attempts < 3
        puts "Database corruption detected, attempting recovery..."
        repair_database
        retry
      else
        raise "Database unrecoverable after #{attempts} attempts: #{e.message}"
      end
    end
  end

  private

  def verify_database_integrity
    # Attempt to read through the database; corruption surfaces here
    @db.each_key.first(10) # Test reading first 10 keys
  rescue => e
    raise DBMError.new("Database integrity check failed: #{e.message}")
  end

  def repair_database
    return unless @db
    # Create backup before repair
    backup_dir = "#{@db_path}.backup.#{Time.now.to_i}"
    FileUtils.mkdir_p(backup_dir)
    FileUtils.cp(Dir.glob("#{@db_path}*"), backup_dir)
    # Attempt to rebuild database by copying recoverable data
    temp_db = DBM.open("#{@db_path}.temp")
    begin
      @db.each do |key, value|
        temp_db[key] = value
      end
    rescue => e
      # Some data may be unrecoverable
      puts "Warning: Some data may be lost during recovery: #{e.message}"
    ensure
      @db.close
      temp_db.close
      # Replace corrupted database with repaired version,
      # leaving the backup and .temp files untouched
      Dir.glob("#{@db_path}*").each do |file|
        next if file.include?('.temp') || file.include?('.backup')
        File.delete(file)
      end
      Dir.glob("#{@db_path}.temp*").each do |file|
        File.rename(file, file.sub('.temp', ''))
      end
    end
  end
end
Concurrent access issues arise when multiple processes attempt to write to the same DBM database simultaneously. While many DBM implementations provide some level of file locking, applications must implement additional synchronization for complex concurrent scenarios.
require 'dbm'
require 'timeout'

class ConcurrentDBM
  def initialize(db_path)
    @db_path = db_path
    @lock_file = "#{db_path}.lock"
  end

  # Hold an exclusive advisory lock on a sidecar file while using the database
  def with_exclusive_access
    File.open(@lock_file, File::CREAT | File::RDWR) do |lock|
      lock.flock(File::LOCK_EX) # blocks until the lock becomes available
      DBM.open(@db_path) do |db|
        yield db
      end
    end
  end

  # Non-blocking variant: poll for the lock until a timeout expires
  def safe_read(key, timeout = 5)
    start_time = Time.now
    File.open(@lock_file, File::CREAT | File::RDWR) do |lock|
      until lock.flock(File::LOCK_EX | File::LOCK_NB)
        if Time.now - start_time > timeout
          raise Timeout::Error, "Could not acquire database lock within #{timeout} seconds"
        end
        sleep(0.05)
      end
      DBM.open(@db_path) { |db| db[key] }
    end
  end
end
Performance & Memory
DBM performance characteristics vary significantly between backend implementations and usage patterns. GDBM typically provides better performance for larger datasets due to more sophisticated indexing algorithms, while SDBM offers faster startup times for smaller databases.
Key size limitations affect performance and compatibility across different DBM implementations. SDBM imposes strict limits on key and value sizes, while GDBM supports much larger entries but with performance penalties for very large values.
require 'benchmark'
require 'dbm'
# Performance comparison between different operations
db = DBM.open('performance_test.db')
# Measure write performance
write_times = Benchmark.measure do
  10_000.times do |i|
    db["key_#{i}"] = "value_#{i}" * 10
  end
end
puts "Write performance: #{write_times.real} seconds for 10,000 records"
# Measure sequential read performance
read_times = Benchmark.measure do
  10_000.times do |i|
    value = db["key_#{i}"]
  end
end
puts "Sequential read: #{read_times.real} seconds"
# Measure random access performance
keys = db.keys.shuffle
random_times = Benchmark.measure do
  keys.first(1000).each { |key| db[key] }
end
puts "Random access: #{random_times.real} seconds for 1,000 random reads"
# Measure iteration performance
iteration_times = Benchmark.measure do
  count = 0
  db.each { |k, v| count += 1 }
end
puts "Full iteration: #{iteration_times.real} seconds, #{db.size} records"
db.close
Memory usage patterns differ between DBM implementations, with some loading entire databases into memory while others use memory-mapped files or page-based caching. Understanding these patterns helps optimize application memory usage.
require 'dbm'

class MemoryEfficientDBM
  def initialize(db_path)
    @db_path = db_path
  end

  # Process a large database without holding all values in memory at once;
  # only the key list plus one batch of values is live at a time
  def process_in_batches(batch_size = 1000)
    DBM.open(@db_path) do |db|
      db.keys.each_slice(batch_size) do |batch_keys|
        batch_data = batch_keys.to_h { |key| [key, db[key]] }
        yield batch_data
      end
    end
  end

  # Streaming processor for very large databases: one pair at a time
  def stream_process
    processed_count = 0
    DBM.open(@db_path) do |db|
      db.each_pair do |key, value|
        yield key, value
        processed_count += 1
      end
    end
    processed_count
  end
end

# Usage example for memory-efficient processing
processor = MemoryEfficientDBM.new('large_dataset.db')
# Process in batches to control memory usage
processor.process_in_batches(500) do |batch|
  # Transform data in manageable chunks
  transformed = batch.transform_values { |v| expensive_transformation(v) }
  save_results(transformed)
end
Database file size optimization requires understanding how different DBM implementations handle deleted records and database reorganization. Some implementations benefit from periodic reorganization to reclaim space from deleted records.
require 'dbm'
require 'fileutils'

class DBMOptimizer
  def initialize(db_path)
    @db_path = db_path
  end

  def optimize_database
    original_size = database_file_size
    # Copy all data into a fresh database, dropping dead space left by deletions
    optimized_path = "#{@db_path}.optimized"
    DBM.open(@db_path) do |source|
      DBM.open(optimized_path) do |target|
        source.each { |key, value| target[key] = value }
      end
    end
    # Replace original with optimized version
    backup_original
    replace_with_optimized(optimized_path)
    new_size = database_file_size
    space_saved = original_size - new_size
    puts "Database optimized: saved #{space_saved} bytes (#{(space_saved.to_f / original_size * 100).round(2)}%)"
  end

  private

  def database_file_size
    Dir.glob("#{@db_path}*")
       .reject { |f| f.include?('.backup_') || f.include?('.optimized') }
       .sum { |file| File.size(file) }
  end

  def backup_original
    timestamp = Time.now.strftime('%Y%m%d_%H%M%S')
    Dir.glob("#{@db_path}*").each do |file|
      FileUtils.cp(file, "#{file}.backup_#{timestamp}")
    end
  end

  def replace_with_optimized(optimized_path)
    # Remove original files, leaving backups and the optimized copy untouched
    Dir.glob("#{@db_path}*").each do |file|
      next if file.include?('.backup_') || file.start_with?(optimized_path)
      File.delete(file)
    end
    # Move optimized files to the original location
    Dir.glob("#{optimized_path}*").each do |file|
      File.rename(file, file.sub('.optimized', ''))
    end
  end
end
Common Pitfalls
DBM key handling creates subtle bugs when applications assume Hash-like behavior. Unlike Hash, DBM does not accept arbitrary keys: both keys and values must be strings, and passing an Integer or Symbol raises TypeError rather than converting silently.
db = DBM.open('pitfall_demo.db')
# Pitfall: non-string keys are rejected, not converted
db[1] = 'first'
# => TypeError: no implicit conversion of Integer into String
db[:config] = 'development'
# => TypeError: no implicit conversion of Symbol into String
# Non-string values are rejected the same way
db['count'] = 42
# => TypeError: no implicit conversion of Integer into String
# Proper approach: explicit string conversion
user_id = 12345
db["user:#{user_id}"] = 'Alice'
retrieved = db["user:#{user_id}"] # => "Alice"
Database file corruption occurs frequently when applications don't properly close databases or handle system interruptions. DBM databases require explicit closing to ensure data integrity, and improper shutdown can leave databases in inconsistent states.
# Dangerous: database might not close properly
def bad_database_usage
  db = DBM.open('data.db')
  db['key'] = 'value'
  # No explicit close - relies on garbage collection
  # If exception occurs, database remains open
end
# Better: ensure database closure
def safe_database_usage
  db = DBM.open('data.db')
  begin
    db['key'] = 'value'
    # Process data
  ensure
    db.close
  end
end
# Best: use block form for automatic resource management
def recommended_database_usage
  DBM.open('data.db') do |db|
    db['key'] = 'value'
    # Database automatically closed even if exception occurs
  end
end
Encoding issues emerge when storing text with different character encodings: DBM stores raw bytes without encoding awareness, and retrieved strings are tagged with the process's external encoding (usually UTF-8) regardless of the encoding the bytes were written in.
# Encoding pitfall demonstration (assumes the external encoding is UTF-8)
DBM.open('encoding_test.db') do |db|
  # Store text with different encodings (only the raw bytes are kept)
  db['utf8_text'] = 'Hello 世界'.encode('UTF-8')
  db['latin1_text'] = 'Café'.encode('ISO-8859-1')
  utf8_retrieved = db['utf8_text']
  puts utf8_retrieved.encoding # => UTF-8 (the external encoding, and valid)
  latin1_retrieved = db['latin1_text']
  puts latin1_retrieved.encoding        # => UTF-8 -- mislabeled!
  puts latin1_retrieved.valid_encoding? # => false
  # Operating on the mislabeled string fails at the first invalid byte
  begin
    latin1_retrieved.upcase
  rescue ArgumentError => e
    puts "Encoding error: #{e.message}" # invalid byte sequence in UTF-8
  end
end
# Safe encoding approach: record the encoding alongside the bytes
class EncodingSafeDBM
  def initialize(db_path, default_encoding = 'UTF-8')
    @db_path = db_path
    @encoding = default_encoding
  end

  def store(key, value)
    DBM.open(@db_path) do |db|
      encoded_value = value.encode(@encoding)
      db[key.to_s] = "#{@encoding}:#{encoded_value}"
    end
  end

  def retrieve(key)
    DBM.open(@db_path) do |db|
      stored_value = db[key.to_s]
      return nil unless stored_value
      encoding, data = stored_value.split(':', 2)
      data.force_encoding(encoding)
    end
  end
end
Concurrent access problems occur when multiple processes write to the same database simultaneously, leading to corruption or data loss. DBM implementations provide varying levels of built-in locking, but many scenarios require application-level synchronization.
# Dangerous: concurrent writes without synchronization
def unsafe_counter_increment(db_path, counter_name)
  DBM.open(db_path) do |db|
    current = db[counter_name].to_i
    sleep(0.1) # Simulate processing time - race condition window
    db[counter_name] = (current + 1).to_s
  end
end
# Multiple processes calling this simultaneously will lose increments

# Safe: file-based locking for concurrent access
class SafeCounter
  def initialize(db_path)
    @db_path = db_path
    @lock_path = "#{db_path}.lock"
  end

  def increment(counter_name)
    File.open(@lock_path, 'a') do |lock_file|
      lock_file.flock(File::LOCK_EX)
      DBM.open(@db_path) do |db|
        current = db[counter_name].to_i
        new_value = current + 1
        db[counter_name] = new_value.to_s
        new_value
      end
    end
  end

  def get(counter_name)
    DBM.open(@db_path) do |db|
      db[counter_name].to_i
    end
  end
end

# Usage in concurrent environment
counter = SafeCounter.new('counters.db')
# Multiple threads or processes can safely increment
threads = 10.times.map do
  Thread.new { 100.times { counter.increment('page_views') } }
end
threads.each(&:join)
puts "Final count: #{counter.get('page_views')}" # => 1000
Reference
Core Classes and Modules
| Class | Purpose | Backend |
| --- | --- | --- |
| DBM | Generic database interface | Auto-selected at build time (GDBM preferred) |
| GDBM | GNU Database Manager | Single file, robust locking |
| SDBM | Simple Database Manager | Portable, size-limited |
Opening and Closing Methods
| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| DBM.open(filename, mode = 0666, flags = nil) | filename (String), mode (Integer), flags (DBM constant) | DBM instance | Opens database file with specified permissions |
| DBM.open(filename, mode, flags) { \|db\| } | filename (String), mode (Integer), flags (DBM constant), block | Block result | Opens database, yields to block, auto-closes |
| #close | None | nil | Closes database and flushes changes |
| #closed? | None | Boolean | Returns true if database is closed |
Data Access Methods
| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| #[](key) | key (String) | String or nil | Retrieves value for key |
| #[]=(key, value) | key (String), value (String) | String | Stores key-value pair |
| #fetch(key, default = nil) | key (String), default (Any) | String or default | Retrieves value; returns default if supplied, otherwise raises IndexError for a missing key |
| #store(key, value) | key (String), value (String) | String | Stores key-value pair (alias for []=) |
| #delete(key) | key (String) | String or nil | Removes key-value pair |
| #has_key?(key) | key (String) | Boolean | Tests for key existence |
| #key?(key) | key (String) | Boolean | Alias for has_key? |
| #include?(key) | key (String) | Boolean | Alias for has_key? |
| #member?(key) | key (String) | Boolean | Alias for has_key? |
Enumeration Methods
| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| #each { \|key, value\| } | block | self | Iterates over key-value pairs |
| #each_pair { \|key, value\| } | block | self | Alias for each |
| #each_key { \|key\| } | block | self | Iterates over keys only |
| #each_value { \|value\| } | block | self | Iterates over values only |
| #keys | None | Array<String> | Returns array of all keys |
| #values | None | Array<String> | Returns array of all values |
| #to_a | None | Array<Array<String>> | Returns array of [key, value] pairs |
| #to_hash | None | Hash<String, String> | Converts to Hash object |
Database Management Methods
| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| #clear | None | self | Removes all key-value pairs |
| #empty? | None | Boolean | Tests if database contains no data |
| #length | None | Integer | Returns number of key-value pairs |
| #size | None | Integer | Alias for length |
| #sync | None | self | Forces data synchronization to disk (GDBM only) |
| #reorganize | None | self | Reorganizes database to reclaim space (GDBM only) |
Open Flags
DBM.open accepts one of the DBM constants as its flags argument:

| Flag | Description |
| --- | --- |
| DBM::READER | Open as a reader; the database must already exist |
| DBM::WRITER | Open as a writer; the database must already exist |
| DBM::WRCREAT | Open as a writer and create the database if it does not exist (the default) |
| DBM::NEWDB | Open as a writer and always start from an empty database |
Exception Hierarchy
| Exception | Parent | Description |
| --- | --- | --- |
| DBMError | StandardError | Base class for DBM-specific errors |
| Errno::EACCES | SystemCallError | Permission denied accessing database file |
| Errno::ENOENT | SystemCallError | Database file not found |
| Errno::ENOSPC | SystemCallError | No space left on device |
| Errno::EROFS | SystemCallError | Read-only file system |
Implementation Differences
| Feature | GDBM | SDBM | NDBM |
| --- | --- | --- | --- |
| File structure | Single file | Multiple files (.dir, .pag) | Multiple files (.dir, .pag) |
| Key/value size limit | No fixed limit | ~1 KB (key + value combined) | Implementation-dependent |
| Locking | Built-in | Minimal | Implementation-dependent |
| Reorganization | Supported | Not available | Not available |
| Portability | Good | Excellent | Variable |
Performance Characteristics
| Operation | Time Complexity | Notes |
| --- | --- | --- |
| Key lookup | O(1) average | Hash-based indexing |
| Insertion | O(1) average | May trigger bucket splits |
| Deletion | O(1) average | May leave dead space |
| Iteration | O(n) | Sequential file access |
| Database size | Variable | Depends on deleted record handling |