
store_if_absent

Overview

Ruby provides several approaches for storing values only when keys are absent, ranging from built-in Hash methods to thread-safe concurrent data structures. The "store if absent" pattern ensures that values are only written when keys don't exist, preventing unnecessary overwrites and maintaining data integrity in both single-threaded and concurrent environments.

Standard Ruby Hash objects offer basic conditional storage through methods like fetch with default values, the ||= operator, and explicit key existence checks. These work well for single-threaded applications but lack atomicity guarantees needed for concurrent code.

The concurrent-ruby gem extends this functionality with thread-safe methods like put_if_absent and compute_if_absent on Concurrent::Map objects. These methods provide atomic operations that prevent race conditions when multiple threads attempt to store values simultaneously.

# Standard Hash approach
hash = {}
hash[:key] ||= "default value"

# Concurrent approach (requires the concurrent-ruby gem)
map = Concurrent::Map.new
map.put_if_absent(:key, "default value")

# Custom implementation
def store_if_absent(hash, key, value)
  hash[key] = value unless hash.key?(key)
end

Basic Usage

Ruby's standard Hash class provides several ways to implement conditional storage. The most common approach uses the ||= operator, which assigns a value only if the current value is falsy. This works well for initializing Hash values but has limitations when dealing with explicit false or nil values.

user_preferences = {}
user_preferences[:theme] ||= "light"
user_preferences[:notifications] ||= true

puts user_preferences[:theme] # => "light"
puts user_preferences[:notifications] # => true

# Subsequent assignments won't overwrite
user_preferences[:theme] ||= "dark"
puts user_preferences[:theme] # => "light"

The fetch method provides more precise control over defaults. Unlike ||=, fetch returns the stored value whenever the key exists — even when that value is false or nil — and falls back to the supplied default only when the key is truly absent.

settings = { auto_save: false }

# Using ||= incorrectly overwrites false
settings[:auto_save] ||= true
puts settings[:auto_save] # => true (incorrect!)

# Using fetch preserves false values
settings = { auto_save: false }
settings[:auto_save] = settings.fetch(:auto_save, true)
puts settings[:auto_save] # => false (correct!)
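
Hash#fetch also accepts a block, which runs only when the key is absent — useful when computing the default is expensive. A minimal sketch (load_default_timeout is a hypothetical stand-in for a costly lookup):

settings = {}

def load_default_timeout
  30 # hypothetical expensive lookup
end

# Block executes only on a missing key
settings[:timeout] = settings.fetch(:timeout) { load_default_timeout }
puts settings[:timeout] # => 30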

For explicit key existence checking, Ruby provides the key? method and Hash bracket notation. This approach gives complete control over the storage logic.

cache = {}

def store_if_absent(hash, key, value)
  hash[key] = value unless hash.key?(key)
  hash[key]
end

result = store_if_absent(cache, :user_data, { name: "Alice", role: "admin" })
puts cache[:user_data] # => {:name=>"Alice", :role=>"admin"}

# Won't overwrite existing data
store_if_absent(cache, :user_data, { name: "Bob", role: "user" })
puts cache[:user_data] # => {:name=>"Alice", :role=>"admin"}

The concurrent-ruby gem provides thread-safe alternatives through Concurrent::Map. The put_if_absent method atomically inserts values only when keys are missing.

require 'concurrent'   # provided by the concurrent-ruby gem
require 'securerandom'

shared_cache = Concurrent::Map.new

# Thread-safe conditional storage
result = shared_cache.put_if_absent(:session_id, SecureRandom.uuid)
puts result # => nil (key was absent, value stored)

# Subsequent calls return existing value
existing = shared_cache.put_if_absent(:session_id, SecureRandom.uuid)
puts existing # => original UUID (key exists, new value ignored)

Thread Safety & Concurrency

Thread safety becomes critical when multiple threads attempt to store values simultaneously. Standard Hash operations are not atomic, leading to race conditions where threads can interfere with each other's operations.

# Unsafe concurrent access
standard_hash = {}
threads = []

10.times do |i|
  threads << Thread.new do
    # Race condition: multiple threads may see key as absent
    unless standard_hash.key?(:counter)
      standard_hash[:counter] = 0
    end
    standard_hash[:counter] += 1
  end
end

threads.each(&:join)
puts standard_hash[:counter] # => Unpredictable result, likely less than 10

The concurrent-ruby gem solves this with atomic operations. Concurrent::Map#put_if_absent ensures that only one thread can successfully store a value for a new key.

require 'concurrent'

concurrent_map = Concurrent::Map.new
threads = []
results = []
mutex = Mutex.new

10.times do |i|
  threads << Thread.new do
    result = concurrent_map.put_if_absent(:counter, 0)
    mutex.synchronize { results << result }
    
    # Safe increment using atomic operations
    concurrent_map.compute(:counter) { |old_value| old_value + 1 }
  end
end

threads.each(&:join)
puts concurrent_map[:counter] # => 10 (predictable result)
puts results.count(nil) # => 1 (only one thread stored initial value)

Concurrent::Map#compute_if_absent provides more flexibility by accepting a block for value computation. This is useful when the stored value requires expensive calculation that should only happen if storage is necessary.

expensive_cache = Concurrent::Map.new

def expensive_computation(key)
  sleep(1) # Simulate expensive operation
  "computed_value_for_#{key}"
end

# Multiple threads requesting same key
threads = 5.times.map do |i|
  Thread.new do
    result = expensive_cache.compute_if_absent(:shared_key) do
      expensive_computation(:shared_key)
    end
    puts "Thread #{i}: #{result}"
  end
end

threads.each(&:join)
# Only one thread performs expensive computation
# All threads receive same result

For scenarios requiring custom synchronization, Mutex objects provide explicit thread safety around standard Hash operations.

class ThreadSafeCache
  def initialize
    @cache = {}
    @mutex = Mutex.new
  end

  def store_if_absent(key, value)
    @mutex.synchronize do
      return @cache[key] if @cache.key?(key)
      @cache[key] = value
    end
  end

  def get(key)
    @mutex.synchronize { @cache[key] }
  end
end

cache = ThreadSafeCache.new
threads = 10.times.map do |i|
  Thread.new do
    cache.store_if_absent(:shared_data, "initial_value")
  end
end

threads.each(&:join)
puts cache.get(:shared_data) # => "initial_value"
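
Because store_if_absent returns the existing entry when the key is already present, callers can rely on the return value directly:

value = cache.store_if_absent(:shared_data, "replacement")
puts value # => "initial_value" (existing value returned, nothing overwritten)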

Advanced Usage

Advanced conditional storage patterns combine multiple techniques for complex scenarios. Nested conditional storage handles hierarchical data structures where intermediate keys may need creation.

def deep_store_if_absent(hash, keys, value)
  keys[0..-2].inject(hash) do |h, key|
    h[key] ||= {}
  end.tap do |target|
    final_key = keys.last
    target[final_key] = value unless target.key?(final_key)
  end
end

config = {}
deep_store_if_absent(config, [:database, :primary, :host], "localhost")
deep_store_if_absent(config, [:database, :primary, :port], 5432)
deep_store_if_absent(config, [:database, :replica, :host], "replica.db")

puts config
# => {:database=>{:primary=>{:host=>"localhost", :port=>5432}, 
#                :replica=>{:host=>"replica.db"}}}
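
As with the flat version, repeated calls preserve existing leaf values:

# Existing leaves are not overwritten
deep_store_if_absent(config, [:database, :primary, :host], "db.internal")
puts config[:database][:primary][:host] # => "localhost"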

Concurrent nested storage requires careful synchronization to prevent partial updates during concurrent access.

class ConcurrentNestedMap
  def initialize
    @maps = Concurrent::Map.new
  end

  def store_if_absent(namespace, key, value)
    namespace_map = @maps.compute_if_absent(namespace) do
      Concurrent::Map.new
    end
    namespace_map.put_if_absent(key, value)
  end

  def get(namespace, key)
    namespace_map = @maps[namespace]
    namespace_map&.get(key)
  end
end

nested_cache = ConcurrentNestedMap.new

# Concurrent access to different namespaces
threads = []
[:users, :sessions, :preferences].each do |namespace|
  10.times do |i|
    threads << Thread.new do
      nested_cache.store_if_absent(namespace, "key_#{i}", "value_#{i}")
    end
  end
end

threads.each(&:join)
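
A quick read-back confirms the stored values and the nil-safe lookup for missing namespaces:

puts nested_cache.get(:users, "key_3")   # => "value_3"
puts nested_cache.get(:missing, "key_0") # => nil (namespace absent)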

Time-based conditional storage implements expiring cache entries where values should only be stored if they haven't expired.

class ExpiringCache
  def initialize(default_ttl: 3600)
    @cache = Concurrent::Map.new
    @default_ttl = default_ttl
  end

  def store_if_absent(key, value, ttl: nil)
    ttl ||= @default_ttl
    expires_at = Time.now + ttl

    entry = @cache.compute_if_absent(key) do
      { value: value, expires_at: expires_at }
    end

    # Refresh the entry if the stored one has already expired
    if entry[:expires_at] < Time.now
      entry = @cache.compute(key) do
        { value: value, expires_at: expires_at }
      end
    end

    entry[:value]
  end

  def get(key)
    entry = @cache[key]
    return nil unless entry
    return nil if entry[:expires_at] < Time.now
    entry[:value]
  end

  def cleanup_expired
    @cache.each_pair do |key, entry|
      @cache.delete(key) if entry[:expires_at] < Time.now
    end
  end
end
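
A short usage sketch with a one-second TTL (timings are illustrative):

cache = ExpiringCache.new(default_ttl: 1)
cache.store_if_absent(:token, "abc123")
puts cache.get(:token) # => "abc123"

sleep(1.1)
puts cache.get(:token) # => nil (entry expired)
cache.store_if_absent(:token, "def456")
puts cache.get(:token) # => "def456" (expired entry replaced)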

Conditional storage with validation ensures that only valid values are stored, preventing corruption of data structures.

class ValidatingStore
  def initialize
    @store = Concurrent::Map.new
    @validators = {}
  end

  def add_validator(key_pattern, &validator)
    @validators[key_pattern] = validator
  end

  def store_if_absent(key, value)
    validator = @validators.find { |pattern, _| key.match?(pattern) }&.last
    
    if validator && !validator.call(value)
      raise ArgumentError, "Value #{value} failed validation for key #{key}"
    end

    @store.put_if_absent(key, value)
  end
end

store = ValidatingStore.new
store.add_validator(/email/) { |value| value.include?("@") }
store.add_validator(/age/) { |value| value.is_a?(Integer) && value >= 0 }

store.store_if_absent(:user_email, "user@example.com") # Valid
store.store_if_absent(:user_age, 25) # Valid

begin
  store.store_if_absent(:admin_email, "invalid-email") # Raises error
rescue ArgumentError => e
  puts e.message
end

Performance & Memory

Performance characteristics vary significantly between different conditional storage approaches. Standard Hash operations with ||= provide the fastest access for single-threaded scenarios but offer no thread safety guarantees.

require 'benchmark'
require 'concurrent'

# Performance comparison
hash = {}
concurrent_map = Concurrent::Map.new

Benchmark.bm(20) do |x|
  x.report("Hash ||=") do
    100_000.times { |i| hash[i] ||= "value_#{i}" }
  end
  
  x.report("Concurrent::Map") do
    100_000.times { |i| concurrent_map.put_if_absent(i, "value_#{i}") }
  end
end

# Typical results show Hash ||= several times faster for single-threaded access
# Concurrent::Map provides safety at a performance cost

Memory usage patterns differ based on internal implementation. Concurrent::Map uses more memory per entry due to thread safety structures, but provides better performance under high concurrency.

# Counts allocated objects as a rough proxy for memory footprint
def measure_memory
  GC.start
  memory_before = GC.stat[:total_allocated_objects]
  yield
  GC.start
  memory_after = GC.stat[:total_allocated_objects]
  memory_after - memory_before
end

hash_memory = measure_memory do
  hash = {}
  10_000.times { |i| hash[i] ||= "value_#{i}" }
end

concurrent_map_memory = measure_memory do
  map = Concurrent::Map.new
  10_000.times { |i| map.put_if_absent(i, "value_#{i}") }
end

puts "Hash memory: #{hash_memory} objects"
puts "Concurrent::Map memory: #{concurrent_map_memory} objects"
# Concurrent::Map typically uses 20-30% more memory

Bulk operations can be optimized by amortizing synchronization overhead. Instead of interleaving individual put_if_absent calls with other writers, a batch method performs the whole group behind one coarse lock, keeping related updates together and reducing per-call locking cost.

class BatchingCache
  def initialize
    @cache = Concurrent::Map.new
    @batch_mutex = Mutex.new
  end

  def batch_store_if_absent(pairs)
    results = {}
    @batch_mutex.synchronize do
      pairs.each do |key, value|
        existing = @cache.put_if_absent(key, value)
        results[key] = existing || value
      end
    end
    results
  end
end

cache = BatchingCache.new
data_to_store = (1..1000).map { |i| [i, "value_#{i}"] }

# Batch operation is more efficient than individual calls
results = cache.batch_store_if_absent(data_to_store)

Large-scale concurrent access patterns reveal performance differences between approaches. Concurrent::Map scales better under heavy concurrent load despite higher per-operation overhead.

def concurrent_benchmark(storage, operation_count, thread_count, mutex = nil)
  threads = thread_count.times.map do |t|
    Thread.new do
      start_time = Time.now
      (operation_count / thread_count).times do |i|
        key = (t * 1000) + i
        case storage
        when Hash
          # A bare Hash needs external synchronization
          mutex.synchronize { storage[key] ||= "value_#{key}" }
        when Concurrent::Map
          storage.put_if_absent(key, "value_#{key}")
        end
      end
      Time.now - start_time
    end
  end

  times = threads.map(&:value)
  times.sum / times.length
end

hash_with_mutex = {}
mutex = Mutex.new
concurrent_map = Concurrent::Map.new

# Test with increasing thread counts
[1, 2, 4, 8].each do |thread_count|
  hash_time = concurrent_benchmark(hash_with_mutex, 10_000, thread_count, mutex)
  map_time = concurrent_benchmark(concurrent_map, 10_000, thread_count)
  
  puts "#{thread_count} threads - Hash: #{hash_time}s, Map: #{map_time}s"
end

Common Pitfalls

Race conditions represent the most common pitfall when implementing conditional storage in concurrent environments. The check-then-act pattern creates windows where multiple threads can simultaneously observe absent keys.

# INCORRECT: Race condition
cache = {}

threads = 10.times.map do |i|
  Thread.new do
    # Thread A and B both see key as absent
    unless cache.key?(:shared_data)
      sleep(0.01) # Simulate processing time
      # Both threads may reach this line
      cache[:shared_data] = "thread_#{i}_data"
    end
  end
end

threads.each(&:join)
puts cache[:shared_data] # => Unpredictable result

# CORRECT: Atomic operation
concurrent_map = Concurrent::Map.new

threads = 10.times.map do |i|
  Thread.new do
    # Only one thread succeeds in storing
    result = concurrent_map.put_if_absent(:shared_data, "thread_#{i}_data")
    puts "Thread #{i}: #{result.nil? ? 'stored' : 'found existing'}"
  end
end

threads.each(&:join)

The ||= operator can produce unexpected behavior when dealing with falsy values. It assigns new values not just when keys are absent, but also when existing values are false or nil.

user_settings = { 
  notifications_enabled: false,
  dark_mode: nil,
  auto_save: true 
}

# INCORRECT: Overwrites false and nil values
user_settings[:notifications_enabled] ||= true
user_settings[:dark_mode] ||= false

puts user_settings[:notifications_enabled] # => true (should be false!)
puts user_settings[:dark_mode] # => false (should be nil!)

# CORRECT: Check key existence explicitly
def safe_default(hash, key, default)
  hash.key?(key) ? hash[key] : (hash[key] = default)
end

user_settings = { 
  notifications_enabled: false,
  dark_mode: nil 
}

safe_default(user_settings, :notifications_enabled, true)
safe_default(user_settings, :dark_mode, false)

puts user_settings[:notifications_enabled] # => false (preserved!)
puts user_settings[:dark_mode] # => nil (preserved!)

Deadlock situations can occur when using multiple synchronized data structures or when callback blocks attempt to access the same concurrent structure.

# INCORRECT: Potential deadlock
map = Concurrent::Map.new

# Callback tries to access the same map
map.compute_if_absent(:key1) do
  # This can cause deadlock in some implementations
  map.put_if_absent(:key2, "value2")
  "value1"
end

# CORRECT: Avoid nested access to same structure
map = Concurrent::Map.new

# Compute values independently (placeholder helpers, not defined here)
value1 = compute_value_for_key1
value2 = compute_value_for_key2

# Store without nested calls
map.put_if_absent(:key1, value1)
map.put_if_absent(:key2, value2)

Memory leaks can result from improper cleanup in long-running applications that continuously store values. Without periodic cleanup, caches grow indefinitely.

# PROBLEMATIC: Unbounded growth
class LeakyCache
  def initialize
    @cache = Concurrent::Map.new
  end

  def get_or_compute(key, &computation)
    @cache.compute_if_absent(key, &computation)
  end
end

# Each request creates a new entry that is never evicted
# (expensive_user_data is a placeholder for a costly lookup)
cache = LeakyCache.new
loop do
  user_id = SecureRandom.uuid # Always unique
  cache.get_or_compute(user_id) { expensive_user_data(user_id) }
end

# BETTER: Implement size limits and cleanup
class BoundedCache
  def initialize(max_size: 1000)
    @cache = Concurrent::Map.new
    @max_size = max_size
    @access_times = Concurrent::Map.new
  end

  def get_or_compute(key, &computation)
    cleanup_if_needed
    @access_times[key] = Time.now
    @cache.compute_if_absent(key, &computation)
  end

  private

  def cleanup_if_needed
    return unless @cache.size > @max_size

    # Collect [key, time] pairs via each_pair, then sort oldest-first
    entries = []
    @access_times.each_pair { |key, time| entries << [key, time] }
    sorted_keys = entries.sort_by { |_, time| time }.map(&:first)

    # Remove the least recently accessed entries, with extra headroom
    keys_to_remove = sorted_keys.first(@cache.size - @max_size + 100)

    keys_to_remove.each do |key|
      @cache.delete(key)
      @access_times.delete(key)
    end
  end
end
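
A brief sketch of the bound in action; exact counts vary because cleanup removes a batch of old entries at a time:

cache = BoundedCache.new(max_size: 500)
2_000.times do |i|
  cache.get_or_compute("user_#{i}") { "data_#{i}" }
end
# The cache stays near max_size entries instead of growing to 2,000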

Exception safety requires careful consideration when computation blocks can raise exceptions. Failed computations should not leave data structures in inconsistent states.

# PROBLEMATIC: The marker is visible to other readers, and the
# check-then-act sequence is still not atomic without a lock
cache = {}

def risky_store_if_absent(cache, key, &computation)
  return cache[key] if cache.key?(key)

  cache[key] = :computing # Marker value other callers can observe
  value = computation.call
  cache[key] = value
rescue => e
  cache.delete(key) # Clean up marker on failure
  raise e
end

# BETTER: Use concurrent structures with proper exception handling
class SafeComputeCache
  def initialize
    @cache = Concurrent::Map.new
    @computing = Concurrent::Set.new
  end

  def get_or_compute(key, &computation)
    return @cache[key] if @cache.key?(key)
    
    # Prevent multiple threads from computing same key
    return wait_for_computation(key) unless @computing.add?(key)
    
    begin
      value = @cache.compute_if_absent(key) do
        computation.call
      end
      value
    ensure
      @computing.delete(key)
    end
  end

  private

  def wait_for_computation(key)
    # Wait for other thread to complete computation
    sleep(0.01) while @computing.include?(key)
    @cache[key]
  end
end
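
A brief usage sketch: five threads request the same key, but the computation block runs once and every thread receives the same object:

cache = SafeComputeCache.new
threads = 5.times.map do
  Thread.new do
    cache.get_or_compute(:settings) do
      sleep(0.1) # Simulate slow work
      { theme: "light" }
    end
  end
end

puts threads.map(&:value).uniq(&:object_id).length # => 1 (one shared result)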

Reference

Standard Hash Methods

Method | Parameters | Returns | Description
#[]=(key, value) | key (Object), value (Object) | Object | Unconditionally stores value for key
#fetch(key, default) | key (Object), default (Object, optional) | Object | Returns the value for key, or default if absent; raises KeyError when the key is absent and no default or block is given
#key?(key) | key (Object) | Boolean | Tests if key exists in hash
#has_key?(key) | key (Object) | Boolean | Alias for key?
#store(key, value) | key (Object), value (Object) | Object | Alias for []=

Concurrent::Map Methods

Method | Parameters | Returns | Description
#put_if_absent(key, value) | key (Object), value (Object) | Object or nil | Atomically stores value if key is absent; returns the existing value, or nil when the store succeeds
#compute_if_absent(key, &block) | key (Object), block | Object | Computes and stores the block result if key is absent; returns the stored value
#compute_if_present(key, &block) | key (Object), block | Object or nil | Computes a new value only if key is present
#compute(key, &block) | key (Object), block | Object | Computes a new value unconditionally
#get_and_set(key, value) | key (Object), value (Object) | Object or nil | Atomically returns the old value and sets the new one
#replace_if_exists(key, value) | key (Object), value (Object) | Object or nil | Replaces the value only if key exists

Common Patterns

Pattern | Usage | Thread Safe | Performance
hash[key] ||= value | Simple default assignment | No | High
hash.fetch(key, default) | Safe default retrieval | No | High
hash[key] = value unless hash.key?(key) | Explicit conditional store | No | Medium
map.put_if_absent(key, value) | Thread-safe conditional store | Yes | Medium
map.compute_if_absent(key) { value } | Thread-safe lazy computation | Yes | Medium
mutex.synchronize { hash[key] ||= value } | Manual synchronization | Yes | Low

Error Conditions

Condition | Standard Hash | Concurrent::Map | Mitigation
Key collision under concurrency | Race condition | Atomic operation | Use concurrent structures
Falsy value overwrite with ||= | Overwrites false/nil | N/A | Use explicit key checks
Exception during computation | Inconsistent state | No partial store | Implement proper cleanup
Memory exhaustion | Unbounded growth | Unbounded growth | Implement size limits
Deadlock in callbacks | N/A | Possible with self-access | Avoid nested structure access

Performance Characteristics

Operation | Time Complexity | Space Complexity | Concurrency
Hash ||= | O(1) | O(1) | None
Hash with Mutex | O(1) plus sync overhead | O(1) | Full serialization
Concurrent::Map put_if_absent | O(1) amortized | O(1) plus safety overhead | Minimized locking
Concurrent::Map compute_if_absent | O(1) plus block execution | O(1) plus safety overhead | Minimized locking
Batch operations | O(n) | O(n) | Reduced contention