CrackedRuby

DBM

Overview

DBM in Ruby provides a hash-like interface to persistent key-value databases, supporting multiple backend implementations including GDBM, NDBM, and SDBM. The DBM class acts as a generic interface built against whichever database library is available, typically preferring GDBM when available due to its superior feature set and reliability. In recent Ruby versions (3.x) these libraries ship as separate gems rather than as part of the standard library.

Ruby's DBM implementation maintains database files on disk while providing familiar hash operations for reading and writing data. Keys and values are stored as strings, and the database remains persistent between program executions. DBM databases support concurrent read access but require careful handling for write operations to prevent corruption.

The DBM family includes three main classes, each provided by its own library: DBM (the generic interface, from dbm), GDBM (GNU Database Manager, from gdbm), and SDBM (Simple Database Manager, from sdbm). Each backend has different characteristics regarding file locking, maximum key/value sizes, and concurrent access patterns.

require 'dbm'

# Open or create a database
db = DBM.open('mydata.db')

# Store and retrieve data
db['user:123'] = 'Alice'
db['user:124'] = 'Bob'

puts db['user:123']  # => "Alice"

# Check available keys
db.keys  # => ["user:123", "user:124"] (order not guaranteed)

db.close

DBM databases automatically handle file management, creating necessary files when opening a non-existent database and managing internal structures for key indexing and data storage. Keys and values must be Strings (or objects that convert implicitly via to_str); passing a Symbol or Integer raises TypeError, so other objects must be converted explicitly.

# Using blocks for automatic resource management
DBM.open('temp.db') do |db|
  db['config'] = 'production'
  db['last_update'] = Time.now.to_s
  
  # Database automatically closed when block exits
end

Most DBM implementations create multiple files on disk, typically including a data file and an index file. GDBM creates a single file containing both data and metadata, while NDBM typically creates .dir and .pag files for directory and page information respectively.

Basic Usage

Opening a DBM database requires specifying a filename and optionally providing flags for access mode and file permissions. Ruby creates the database files if they don't exist, using default permissions that can be customized during creation.

require 'dbm'

# Open with default settings (read-write, create if needed)
db = DBM.open('application.db')

# Open with specific permissions and flags
db = DBM.open('secure.db', 0644, DBM::WRCREAT)

# Read-only access to existing database
db = DBM.open('readonly.db', 0644, DBM::READER)

DBM stores both keys and values as strings and does not convert other data types for you: objects that are not Strings (and lack an implicit to_str conversion) raise TypeError, so conversion must be done explicitly. This string-only storage model requires careful consideration when storing structured data or maintaining data types across program executions.

db = DBM.open('example.db')

# Store different data types (converted to strings explicitly)
db['count'] = 42.to_s
db['rate'] = 3.14159.to_s
db['active'] = true.to_s

# Retrieve and convert back to original types
count = db['count'].to_i      # => 42
rate = db['rate'].to_f        # => 3.14159
active = db['active'] == 'true'  # => true

# Store complex data using serialization
require 'json'
user_data = { name: 'Alice', age: 30, roles: ['admin', 'user'] }
db['user:profile'] = user_data.to_json

# Retrieve and deserialize
retrieved = JSON.parse(db['user:profile'])
# => {"name"=>"Alice", "age"=>30, "roles"=>["admin", "user"]}

Hash-like operations work intuitively with DBM objects, including iteration methods, key existence checks, and bulk operations. Every change is written through to the database immediately; DBM provides no transaction or batching mechanism.

db = DBM.open('inventory.db')

# Hash-like assignment and retrieval
db['item:1001'] = 'Widget A'
db['item:1002'] = 'Widget B'
db['stock:1001'] = '25'

# Check for key existence
if db.has_key?('item:1001')
  puts "Item 1001: #{db['item:1001']}"
end

# Iterate over all key-value pairs
db.each do |key, value|
  puts "#{key}: #{value}"
end

# Get all keys matching a pattern
item_keys = db.keys.select { |key| key.start_with?('item:') }

# Bulk deletion
item_keys.each { |key| db.delete(key) }

# Clear entire database
db.clear

DBM supports several iteration patterns and provides methods for examining database contents without loading all data into memory simultaneously. This capability proves crucial when working with large databases that exceed available RAM.

# Iterate keys only (memory efficient for large databases)
db.each_key do |key|
  process_key(key) if key.match?(/^session:/)
end

# Iterate values only
total = 0
db.each_value do |value|
  total += value.to_i if value.match?(/^\d+$/)
end

# Select subset of data based on criteria
recent_sessions = {}
db.each_pair do |key, value|
  if key.start_with?('session:') && recent?(value)
    recent_sessions[key] = value
  end
end

Error Handling & Debugging

DBM operations can fail due to file system issues, permission problems, database corruption, or concurrent access conflicts. Ruby raises specific exceptions for different error conditions, requiring comprehensive error handling for robust applications.

File-related errors occur most commonly when opening databases, particularly in production environments where file permissions, disk space, or file system mounting issues can prevent database access.

require 'dbm'

begin
  db = DBM.open('/var/lib/myapp/data.db', 0644, DBM::WRCREAT)
rescue Errno::EACCES
  # Permission denied - check file/directory permissions
  puts "Cannot access database: permission denied"
  # Attempt alternative location or prompt for different permissions
rescue Errno::ENOSPC
  # No space left on device
  puts "Insufficient disk space for database"
  # Clean up temporary files or alert administrators
rescue Errno::EROFS
  # Read-only file system
  puts "Cannot write to read-only file system"
  # Switch to read-only mode or alternative storage
rescue DBMError => e
  # DBM-specific errors (corruption, format issues)
  puts "Database error: #{e.message}"
  # Attempt database repair or restoration from backup
end

Database corruption represents a serious error condition that can occur due to improper shutdown, concurrent write access without proper locking, or file system issues. Detecting and handling corruption requires careful error checking and recovery strategies.

require 'fileutils'

class DatabaseManager
  def initialize(db_path)
    @db_path = db_path
    @db = nil
  end

  def open_with_recovery
    attempts = 0
    begin
      @db = DBM.open(@db_path)
      verify_database_integrity
    rescue DBMError => e
      attempts += 1
      if attempts < 3
        puts "Database corruption detected, attempting recovery..."
        repair_database
        retry
      else
        raise "Database unrecoverable after #{attempts} attempts: #{e.message}"
      end
    end
  end

  private

  def verify_database_integrity
    # Attempt to read a known key or iterate through database
    @db.each_key.first(10)  # Test reading first 10 keys
  rescue => e
    raise DBMError.new("Database integrity check failed: #{e.message}")
  end

  def repair_database
    # Back up existing files into a directory before repair
    existing_files = Dir.glob("#{@db_path}*")
    backup_dir = "#{@db_path}.backup.#{Time.now.to_i}"
    FileUtils.mkdir_p(backup_dir)
    FileUtils.cp(existing_files, backup_dir)
    
    # Attempt to rebuild database by reading recoverable data
    temp_db = DBM.open("#{@db_path}.temp")
    
    begin
      @db.each do |key, value|
        temp_db[key] = value
      end
    rescue => e
      # Some data may be unrecoverable
      puts "Warning: Some data may be lost during recovery: #{e.message}"
    ensure
      @db.close if @db
      temp_db.close
      
      # Replace corrupted database with repaired version,
      # leaving backups and the rebuilt temp files untouched
      Dir.glob("#{@db_path}*").each do |file|
        next if file.include?('.backup.') || file.include?('.temp')
        File.delete(file)
      end
      Dir.glob("#{@db_path}.temp*").each do |file|
        new_name = file.sub('.temp', '')
        File.rename(file, new_name)
      end
    end
  end
end

Concurrent access issues arise when multiple processes attempt to write to the same DBM database simultaneously. While many DBM implementations provide some level of file locking, applications must implement additional synchronization for complex concurrent scenarios.

class ConcurrentDBM
  def initialize(db_path)
    @db_path = db_path
    @lock_file = "#{db_path}.lock"
  end

  # Acquire the lock once; Errno::EEXIST means another process holds it.
  # O_EXCL makes lock-file creation atomic.
  def try_exclusive_access(&block)
    File.open(@lock_file, File::WRONLY | File::CREAT | File::EXCL) do |lock|
      db = nil
      begin
        db = DBM.open(@db_path)
        yield db
      ensure
        db.close if db
        # Only the process that created the lock file removes it
        File.delete(@lock_file)
      end
    end
  end

  # Block until the lock is acquired
  def with_exclusive_access(&block)
    try_exclusive_access(&block)
  rescue Errno::EEXIST
    # Lock file exists, wait and retry
    sleep(0.1)
    retry
  end

  def safe_read(key, timeout = 5)
    start_time = Time.now
    
    loop do
      begin
        return try_exclusive_access { |db| db[key] }
      rescue Errno::EEXIST
        if Time.now - start_time > timeout
          raise "Could not acquire database lock within #{timeout} seconds"
        end
        sleep(0.05)
      end
    end
  end
end

Performance & Memory

DBM performance characteristics vary significantly between backend implementations and usage patterns. GDBM typically provides better performance for larger datasets due to more sophisticated indexing algorithms, while SDBM offers faster startup times for smaller databases.

Key size limitations affect performance and compatibility across different DBM implementations. SDBM imposes strict limits on key and value sizes, while GDBM supports much larger entries but with performance penalties for very large values.

require 'benchmark'
require 'dbm'

# Performance comparison between different operations
db = DBM.open('performance_test.db')

# Measure write performance
write_times = Benchmark.measure do
  10_000.times do |i|
    db["key_#{i}"] = "value_#{i}" * 10
  end
end

puts "Write performance: #{write_times.real} seconds for 10,000 records"

# Measure sequential read performance
read_times = Benchmark.measure do
  10_000.times do |i|
    value = db["key_#{i}"]
  end
end

puts "Sequential read: #{read_times.real} seconds"

# Measure random access performance
keys = db.keys.shuffle
random_times = Benchmark.measure do
  keys.first(1000).each { |key| db[key] }
end

puts "Random access: #{random_times.real} seconds for 1,000 random reads"

# Measure iteration performance
iteration_times = Benchmark.measure do
  count = 0
  db.each { |k, v| count += 1 }
end

puts "Full iteration: #{iteration_times.real} seconds, #{db.size} records"

db.close

Memory usage patterns differ between DBM implementations, with some loading entire databases into memory while others use memory-mapped files or page-based caching. Understanding these patterns helps optimize application memory usage.

class MemoryEfficientDBM
  def initialize(db_path)
    @db_path = db_path
  end

  # Process large database without loading all data into memory
  def process_in_batches(batch_size = 1000, &block)
    DBM.open(@db_path) do |db|
      keys = db.keys
      
      keys.each_slice(batch_size) do |batch_keys|
        batch_data = {}
        batch_keys.each { |key| batch_data[key] = db[key] }
        
        yield batch_data
        
        # Explicitly clear batch data to help garbage collection
        batch_data.clear
        GC.start if batch_keys.size == batch_size
      end
    end
  end

  # Streaming processor for very large databases
  def stream_process(&block)
    processed_count = 0
    DBM.open(@db_path) do |db|
      db.each_pair do |key, value|
        yield key, value
        processed_count += 1
        
        # Periodically allow garbage collection of processed data
        GC.start if processed_count % 10_000 == 0
      end
    end
  end
end

# Usage example for memory-efficient processing
processor = MemoryEfficientDBM.new('large_dataset.db')

# Process in batches to control memory usage
processor.process_in_batches(500) do |batch|
  # Transform data in manageable chunks
  transformed = batch.transform_values { |v| expensive_transformation(v) }
  save_results(transformed)
end

Database file size optimization requires understanding how different DBM implementations handle deleted records and database reorganization. Some implementations benefit from periodic reorganization to reclaim space from deleted records.

class DBMOptimizer
  def initialize(db_path)
    @db_path = db_path
  end

  def optimize_database
    original_size = database_file_size
    
    # Create optimized copy
    optimized_path = "#{@db_path}.optimized"
    
    DBM.open(@db_path) do |source|
      DBM.open(optimized_path) do |target|
        # Copy all data to new database (removes deleted record space)
        source.each { |key, value| target[key] = value }
      end
    end
    
    # Replace original with optimized version
    backup_original
    replace_with_optimized(optimized_path)
    
    new_size = database_file_size
    space_saved = original_size - new_size
    
    puts "Database optimized: saved #{space_saved} bytes (#{(space_saved.to_f / original_size * 100).round(2)}%)"
  end

  private

  def database_file_size
    Dir.glob("#{@db_path}*").sum { |file| File.size(file) }
  end

  def backup_original
    timestamp = Time.now.strftime('%Y%m%d_%H%M%S')
    Dir.glob("#{@db_path}*").each do |file|
      backup_name = "#{file}.backup_#{timestamp}"
      FileUtils.cp(file, backup_name)
    end
  end

  def replace_with_optimized(optimized_path)
    # Remove original files
    Dir.glob("#{@db_path}*").each { |file| File.delete(file) unless file.include?('.backup_') }
    
    # Move optimized files to original location
    Dir.glob("#{optimized_path}*").each do |file|
      target_name = file.sub('.optimized', '')
      File.rename(file, target_name)
    end
  end
end

Common Pitfalls

DBM key handling creates subtle bugs when applications assume full hash-like behavior. Keys and values must be Strings (or objects with an implicit to_str conversion); numeric and symbol keys are not converted automatically and raise TypeError instead.

db = DBM.open('pitfall_demo.db')

# Pitfall: non-String keys raise rather than convert
# db[1] = 'first'        # TypeError: no implicit conversion of Integer into String
# db[:config] = 'dev'    # TypeError: no implicit conversion of Symbol into String

# Proper approach: explicit string conversion
user_id = 12345
db["user:#{user_id}"] = 'profile data'
retrieved = db["user:#{user_id}"]  # => "profile data"

Database file corruption occurs frequently when applications don't properly close databases or handle system interruptions. DBM databases require explicit closing to ensure data integrity, and improper shutdown can leave databases in inconsistent states.

# Dangerous: database might not close properly
def bad_database_usage
  db = DBM.open('data.db')
  db['key'] = 'value'
  # No explicit close - relies on garbage collection
  # If exception occurs, database remains open
end

# Better: ensure database closure
def safe_database_usage
  db = DBM.open('data.db')
  begin
    db['key'] = 'value'
    # Process data
  ensure
    db.close
  end
end

# Best: use block form for automatic resource management
def recommended_database_usage
  DBM.open('data.db') do |db|
    db['key'] = 'value'
    # Database automatically closed even if exception occurs
  end
end

Encoding issues emerge when storing text data with different character encodings, as DBM stores raw bytes without encoding awareness. This creates problems when retrieving text that doesn't match the current locale encoding.

# Encoding pitfall demonstration
DBM.open('encoding_test.db') do |db|
  # Store text with different encodings (DBM keeps only the raw bytes)
  db['utf8_text'] = 'Hello 世界'.encode('UTF-8')
  db['latin1_text'] = 'Café'.encode('ISO-8859-1')
  
  # Retrieved strings are tagged with the default external encoding
  # (typically UTF-8), regardless of how they were stored
  utf8_retrieved = db['utf8_text']
  puts utf8_retrieved.encoding           # usually UTF-8
  
  latin1_retrieved = db['latin1_text']
  puts latin1_retrieved.valid_encoding?  # false when Latin-1 bytes are tagged UTF-8
  
  # Mislabeled bytes surface later as encoding errors
  begin
    latin1_retrieved.unicode_normalize
  rescue EncodingError, ArgumentError => e
    puts "Encoding error: #{e.message}"
  end
end

# Safe encoding approach
class EncodingSafeDBM
  def initialize(db_path, default_encoding = 'UTF-8')
    @db_path = db_path
    @encoding = default_encoding
  end

  def store(key, value)
    DBM.open(@db_path) do |db|
      encoded_value = value.encode(@encoding)
      db[key.to_s] = "#{@encoding}:#{encoded_value}"
    end
  end

  def retrieve(key)
    DBM.open(@db_path) do |db|
      stored_value = db[key.to_s]
      return nil unless stored_value
      
      encoding, data = stored_value.split(':', 2)
      data.force_encoding(encoding)
    end
  end
end

Concurrent access problems occur when multiple processes write to the same database simultaneously, leading to corruption or data loss. DBM implementations provide varying levels of built-in locking, but many scenarios require application-level synchronization.

# Dangerous: concurrent writes without synchronization
def unsafe_counter_increment(db_path, counter_name)
  DBM.open(db_path) do |db|
    current = db[counter_name].to_i
    sleep(0.1)  # Simulate processing time - race condition window
    db[counter_name] = (current + 1).to_s
  end
end

# Multiple processes calling this simultaneously will lose increments

# Safe: file-based locking for concurrent access
class SafeCounter
  def initialize(db_path)
    @db_path = db_path
    @lock_path = "#{db_path}.lock"
  end

  def increment(counter_name)
    File.open(@lock_path, 'a') do |lock_file|
      lock_file.flock(File::LOCK_EX)
      
      DBM.open(@db_path) do |db|
        current = db[counter_name].to_i
        new_value = current + 1
        db[counter_name] = new_value.to_s
        new_value
      end
    end
  end

  def get(counter_name)
    DBM.open(@db_path) do |db|
      db[counter_name].to_i
    end
  end
end

# Usage in concurrent environment
counter = SafeCounter.new('counters.db')

# Multiple processes can safely increment
threads = 10.times.map do
  Thread.new { 100.times { counter.increment('page_views') } }
end

threads.each(&:join)
puts "Final count: #{counter.get('page_views')}"  # => 1000

Reference

Core Classes and Modules

Class | Purpose | Backend
DBM | Generic database interface | Auto-selected (GDBM preferred)
GDBM | GNU Database Manager | Single file, robust locking
SDBM | Simple Database Manager | Portable, size-limited

Opening and Closing Methods

Method | Parameters | Returns | Description
DBM.open(filename, mode=0666, flags=nil) | filename (String), mode (Integer), flags (Integer, e.g. DBM::WRCREAT) | DBM instance | Opens database file with specified permissions
DBM.open(filename, mode, flags) { |db| } | filename (String), mode (Integer), flags (Integer), block | Block result | Opens database and yields to block, auto-closes
#close | None | nil | Closes database and flushes changes
#closed? | None | Boolean | Returns true if database is closed

Data Access Methods

Method | Parameters | Returns | Description
#[](key) | key (String) | String or nil | Retrieves value for key
#[]=(key, value) | key (String), value (String) | String | Stores key-value pair
#fetch(key, default=nil) | key (String), default (Any) | String or default | Retrieves value; returns default if given, otherwise raises IndexError for a missing key
#store(key, value) | key (String), value (String) | String | Stores key-value pair (alias for []=)
#delete(key) | key (String) | String or nil | Removes key-value pair, returning the deleted value
#has_key?(key) | key (String) | Boolean | Tests for key existence
#key?(key) | key (String) | Boolean | Alias for has_key?
#include?(key) | key (String) | Boolean | Alias for has_key?
#member?(key) | key (String) | Boolean | Alias for has_key?

Enumeration Methods

Method | Parameters | Returns | Description
#each { |key, value| } | block | self | Iterates over key-value pairs
#each_pair { |key, value| } | block | self | Alias for each
#each_key { |key| } | block | self | Iterates over keys only
#each_value { |value| } | block | self | Iterates over values only
#keys | None | Array<String> | Returns array of all keys
#values | None | Array<String> | Returns array of all values
#to_a | None | Array<Array<String>> | Returns array of [key, value] pairs
#to_hash | None | Hash<String, String> | Converts to Hash object

Database Management Methods

Method | Parameters | Returns | Description
#clear | None | self | Removes all key-value pairs
#empty? | None | Boolean | Tests if database contains no data
#length | None | Integer | Returns number of key-value pairs
#size | None | Integer | Alias for length
#sync | None | self | Forces data synchronization to disk (GDBM only)
#reorganize | None | self | Reorganizes database to reclaim space (GDBM only)

Open Flags

Flag | Description
DBM::READER | Read-only access to an existing database
DBM::WRITER | Read-write access to an existing database
DBM::WRCREAT | Read-write access, creating the database if it doesn't exist (default)
DBM::NEWDB | Read-write access, always creating a fresh, empty database

Exception Hierarchy

Exception | Parent | Description
DBMError | StandardError | Base class for DBM-related errors
Errno::EACCES | SystemCallError | Permission denied accessing database file
Errno::ENOENT | SystemCallError | Database file not found
Errno::ENOSPC | SystemCallError | No space left on device
Errno::EROFS | SystemCallError | Read-only file system

Implementation Differences

Feature | GDBM | SDBM | NDBM
File Structure | Single file | Multiple files | Multiple files (.dir, .pag)
Key/Value Size Limit | No fixed limit | ~1KB per key-value pair | Implementation-dependent (often ~4KB per pair)
Locking | Built-in | Minimal | Implementation-dependent
Reorganization | Supported | Not available | Not available
Portability | Good | Excellent | Variable

Performance Characteristics

Operation | Time Complexity | Notes
Key lookup | O(1) average | Hash-based indexing
Insertion | O(1) average | May trigger internal reorganization
Deletion | O(1) average | May leave dead space in the file
Iteration | O(n) | Sequential file access
Database size | Variable | Depends on deleted record handling