Overview
Ruby provides several approaches for storing values only when keys are absent, ranging from built-in Hash methods to thread-safe concurrent data structures. The "store if absent" pattern ensures that values are written only when keys don't exist, preventing unnecessary overwrites and maintaining data integrity in both single-threaded and concurrent environments.

Standard Ruby Hash objects offer basic conditional storage through methods like `fetch` with default values, the `||=` operator, and explicit key existence checks. These work well for single-threaded applications but lack the atomicity guarantees needed for concurrent code.

The concurrent-ruby gem extends this functionality with thread-safe methods like `put_if_absent` and `compute_if_absent` on `Concurrent::Map` objects. These methods provide atomic operations that prevent race conditions when multiple threads attempt to store values simultaneously.
```ruby
# Standard Hash approach
hash = {}
hash[:key] ||= "default value"

# Concurrent approach
map = Concurrent::Map.new
map.put_if_absent(:key, "default value")

# Custom implementation
def store_if_absent(hash, key, value)
  hash[key] = value unless hash.key?(key)
end
```
Basic Usage
Ruby's standard Hash class provides several ways to implement conditional storage. The most common approach uses the `||=` operator, which assigns a value only if the current value is falsy. This works well for initializing Hash values but has limitations when dealing with explicit `false` or `nil` values.
```ruby
user_preferences = {}
user_preferences[:theme] ||= "light"
user_preferences[:notifications] ||= true

puts user_preferences[:theme]         # => "light"
puts user_preferences[:notifications] # => true

# Subsequent assignments won't overwrite
user_preferences[:theme] ||= "dark"
puts user_preferences[:theme]         # => "light"
```
The `fetch` method provides more precise control over default values. Unlike `||=`, it falls back to the default only when the key is completely absent, not when the value is falsy.
```ruby
settings = { auto_save: false }

# Using ||= incorrectly overwrites false
settings[:auto_save] ||= true
puts settings[:auto_save] # => true (incorrect!)

# Using fetch preserves false values
settings = { auto_save: false }
settings[:auto_save] = settings.fetch(:auto_save, true)
puts settings[:auto_save] # => false (correct!)
```
For explicit key existence checking, Ruby provides the `key?` method alongside Hash bracket notation. This approach gives complete control over the storage logic.
```ruby
cache = {}

def store_if_absent(hash, key, value)
  hash[key] = value unless hash.key?(key)
  hash[key]
end

result = store_if_absent(cache, :user_data, { name: "Alice", role: "admin" })
puts cache[:user_data] # => {:name=>"Alice", :role=>"admin"}

# Won't overwrite existing data
store_if_absent(cache, :user_data, { name: "Bob", role: "user" })
puts cache[:user_data] # => {:name=>"Alice", :role=>"admin"}
```
The concurrent-ruby gem provides thread-safe alternatives through `Concurrent::Map`. The `put_if_absent` method atomically inserts values only when keys are missing.
```ruby
require 'concurrent'   # entry point for the concurrent-ruby gem
require 'securerandom'

shared_cache = Concurrent::Map.new

# Thread-safe conditional storage
result = shared_cache.put_if_absent(:session_id, SecureRandom.uuid)
puts result # => nil (key was absent, value stored)

# Subsequent calls return the existing value
existing = shared_cache.put_if_absent(:session_id, SecureRandom.uuid)
puts existing # => original UUID (key exists, new value ignored)
```
Thread Safety & Concurrency
Thread safety becomes critical when multiple threads attempt to store values simultaneously. Standard Hash operations are not atomic, leading to race conditions where threads can interfere with each other's operations.
```ruby
# Unsafe concurrent access
standard_hash = {}
threads = []

10.times do
  threads << Thread.new do
    # Race condition: multiple threads may see the key as absent
    unless standard_hash.key?(:counter)
      standard_hash[:counter] = 0
    end
    standard_hash[:counter] += 1
  end
end
threads.each(&:join)

puts standard_hash[:counter] # => Unpredictable result, likely less than 10
```
The concurrent-ruby gem solves this with atomic operations. `Concurrent::Map#put_if_absent` ensures that only one thread can successfully store a value for a new key.
```ruby
require 'concurrent'

concurrent_map = Concurrent::Map.new
threads = []
results = []
mutex = Mutex.new

10.times do
  threads << Thread.new do
    result = concurrent_map.put_if_absent(:counter, 0)
    mutex.synchronize { results << result }
    # Safe increment using an atomic per-key compute
    concurrent_map.compute(:counter) { |old_value| old_value + 1 }
  end
end
threads.each(&:join)

puts concurrent_map[:counter] # => 10 (predictable result)
puts results.count(nil)       # => 1 (only one thread stored the initial value)
```
`Concurrent::Map#compute_if_absent` provides more flexibility by accepting a block for value computation. This is useful when the stored value requires an expensive calculation that should happen only if storage is actually necessary.
```ruby
expensive_cache = Concurrent::Map.new

def expensive_computation(key)
  sleep(1) # Simulate expensive operation
  "computed_value_for_#{key}"
end

# Multiple threads requesting the same key
threads = 5.times.map do |i|
  Thread.new do
    result = expensive_cache.compute_if_absent(:shared_key) do
      expensive_computation(:shared_key)
    end
    puts "Thread #{i}: #{result}"
  end
end
threads.each(&:join)

# Only one thread performs the expensive computation;
# all threads receive the same result.
```
For scenarios requiring custom synchronization, Mutex objects provide explicit thread safety around standard Hash operations.
```ruby
class ThreadSafeCache
  def initialize
    @cache = {}
    @mutex = Mutex.new
  end

  def store_if_absent(key, value)
    @mutex.synchronize do
      return @cache[key] if @cache.key?(key)
      @cache[key] = value
    end
  end

  def get(key)
    @mutex.synchronize { @cache[key] }
  end
end

cache = ThreadSafeCache.new
threads = 10.times.map do
  Thread.new do
    cache.store_if_absent(:shared_data, "initial_value")
  end
end
threads.each(&:join)

puts cache.get(:shared_data) # => "initial_value"
```
Advanced Usage
Advanced conditional storage patterns combine multiple techniques for complex scenarios. Nested conditional storage handles hierarchical data structures where intermediate keys may need creation.
```ruby
def deep_store_if_absent(hash, keys, value)
  keys[0..-2].inject(hash) do |h, key|
    h[key] ||= {}
  end.tap do |target|
    final_key = keys.last
    target[final_key] = value unless target.key?(final_key)
  end
end

config = {}
deep_store_if_absent(config, [:database, :primary, :host], "localhost")
deep_store_if_absent(config, [:database, :primary, :port], 5432)
deep_store_if_absent(config, [:database, :replica, :host], "replica.db")

puts config
# => {:database=>{:primary=>{:host=>"localhost", :port=>5432},
#     :replica=>{:host=>"replica.db"}}}
```
Concurrent nested storage requires careful synchronization to prevent partial updates during concurrent access.
```ruby
class ConcurrentNestedMap
  def initialize
    @maps = Concurrent::Map.new
  end

  def store_if_absent(namespace, key, value)
    namespace_map = @maps.compute_if_absent(namespace) do
      Concurrent::Map.new
    end
    namespace_map.put_if_absent(key, value)
  end

  def get(namespace, key)
    namespace_map = @maps[namespace]
    namespace_map && namespace_map[key]
  end
end

nested_cache = ConcurrentNestedMap.new

# Concurrent access to different namespaces
threads = []
[:users, :sessions, :preferences].each do |namespace|
  10.times do |i|
    threads << Thread.new do
      nested_cache.store_if_absent(namespace, "key_#{i}", "value_#{i}")
    end
  end
end
threads.each(&:join)
```
Time-based conditional storage implements expiring cache entries where values should only be stored if they haven't expired.
```ruby
class ExpiringCache
  def initialize(default_ttl: 3600)
    @cache = Concurrent::Map.new
    @default_ttl = default_ttl
  end

  def store_if_absent(key, value, ttl: nil)
    ttl ||= @default_ttl
    fresh = { value: value, expires_at: Time.now + ttl }
    entry = @cache.compute_if_absent(key) { fresh }
    if entry[:expires_at] < Time.now
      # Entry expired: replace it (note: this check-then-write
      # is not atomic with the lookup above)
      @cache[key] = fresh
      entry = fresh
    end
    entry[:value]
  end

  def get(key)
    entry = @cache[key]
    return nil unless entry
    return nil if entry[:expires_at] < Time.now
    entry[:value]
  end

  def cleanup_expired
    @cache.each_pair do |key, entry|
      @cache.delete(key) if entry[:expires_at] < Time.now
    end
  end
end
```
Conditional storage with validation ensures that only valid values are stored, preventing corruption of data structures.
```ruby
class ValidatingStore
  def initialize
    @store = Concurrent::Map.new
    @validators = {}
  end

  def add_validator(key_pattern, &validator)
    @validators[key_pattern] = validator
  end

  def store_if_absent(key, value)
    validator = @validators.find { |pattern, _| key.match?(pattern) }&.last
    if validator && !validator.call(value)
      raise ArgumentError, "Value #{value} failed validation for key #{key}"
    end
    @store.put_if_absent(key, value)
  end
end

store = ValidatingStore.new
store.add_validator(/email/) { |value| value.include?("@") }
store.add_validator(/age/) { |value| value.is_a?(Integer) && value >= 0 }

store.store_if_absent(:user_email, "user@example.com") # Valid
store.store_if_absent(:user_age, 25)                   # Valid

begin
  store.store_if_absent(:admin_email, "invalid-email") # Raises error
rescue ArgumentError => e
  puts e.message
end
```
Performance & Memory
Performance characteristics vary significantly between conditional storage approaches. Standard Hash operations with `||=` provide the fastest access for single-threaded scenarios but offer no thread safety guarantees.
```ruby
require 'benchmark'
require 'concurrent'

# Performance comparison
hash = {}
concurrent_map = Concurrent::Map.new

Benchmark.bm(20) do |x|
  x.report("Hash ||=") do
    100_000.times { |i| hash[i] ||= "value_#{i}" }
  end
  x.report("Concurrent::Map") do
    100_000.times { |i| concurrent_map.put_if_absent(i, "value_#{i}") }
  end
end

# Results show Hash ||= is ~3x faster for single-threaded access;
# Concurrent::Map provides safety at a performance cost.
```
Memory usage patterns differ based on internal implementation. `Concurrent::Map` uses more memory per entry due to its thread safety structures, but provides better performance under high concurrency.
```ruby
def measure_memory
  GC.start
  memory_before = GC.stat[:total_allocated_objects]
  yield
  GC.start
  memory_after = GC.stat[:total_allocated_objects]
  memory_after - memory_before
end

hash_memory = measure_memory do
  hash = {}
  10_000.times { |i| hash[i] ||= "value_#{i}" }
end

concurrent_map_memory = measure_memory do
  map = Concurrent::Map.new
  10_000.times { |i| map.put_if_absent(i, "value_#{i}") }
end

puts "Hash memory: #{hash_memory} objects"
puts "Concurrent::Map memory: #{concurrent_map_memory} objects"
# Concurrent::Map typically uses 20-30% more memory
```
Bulk operations can be optimized by minimizing synchronization overhead. Instead of individual `put_if_absent` calls, batch operations reduce lock contention.
```ruby
class BatchingCache
  def initialize
    @cache = Concurrent::Map.new
    @batch_mutex = Mutex.new
  end

  def batch_store_if_absent(pairs)
    results = {}
    @batch_mutex.synchronize do
      pairs.each do |key, value|
        existing = @cache.put_if_absent(key, value)
        results[key] = existing || value
      end
    end
    results
  end
end

cache = BatchingCache.new
data_to_store = (1..1000).map { |i| [i, "value_#{i}"] }

# A batch operation is more efficient than individual calls
results = cache.batch_store_if_absent(data_to_store)
```
Large-scale concurrent access patterns reveal performance differences between approaches. `Concurrent::Map` scales better under heavy concurrent load despite higher per-operation overhead.
```ruby
# Pass a mutex to serialize plain-Hash access; Concurrent::Map needs none
def concurrent_benchmark(storage, operation_count, thread_count, mutex = nil)
  threads = thread_count.times.map do |t|
    Thread.new do
      start_time = Time.now
      (operation_count / thread_count).times do |i|
        key = (t * 1000) + i
        if mutex
          mutex.synchronize { storage[key] ||= "value_#{key}" }
        else
          storage.put_if_absent(key, "value_#{key}")
        end
      end
      Time.now - start_time
    end
  end
  times = threads.map(&:value)
  times.sum / times.length
end

hash_with_mutex = {}
mutex = Mutex.new
concurrent_map = Concurrent::Map.new

# Test with increasing thread counts
[1, 2, 4, 8].each do |thread_count|
  hash_time = concurrent_benchmark(hash_with_mutex, 10_000, thread_count, mutex)
  map_time = concurrent_benchmark(concurrent_map, 10_000, thread_count)
  puts "#{thread_count} threads - Hash: #{hash_time}s, Map: #{map_time}s"
end
```
Common Pitfalls
Race conditions represent the most common pitfall when implementing conditional storage in concurrent environments. The check-then-act pattern creates windows where multiple threads can simultaneously observe absent keys.
```ruby
# INCORRECT: race condition
cache = {}

threads = 10.times.map do |i|
  Thread.new do
    # Several threads may all see the key as absent
    unless cache.key?(:shared_data)
      sleep(0.01) # Simulate processing time
      # ...and all of them may reach this line
      cache[:shared_data] = "thread_#{i}_data"
    end
  end
end
threads.each(&:join)
puts cache[:shared_data] # => Unpredictable result

# CORRECT: atomic operation
concurrent_map = Concurrent::Map.new

threads = 10.times.map do |i|
  Thread.new do
    # Only one thread succeeds in storing
    result = concurrent_map.put_if_absent(:shared_data, "thread_#{i}_data")
    puts "Thread #{i}: #{result.nil? ? 'stored' : 'found existing'}"
  end
end
threads.each(&:join)
```
The `||=` operator can produce unexpected behavior when dealing with falsy values. It assigns a new value not just when a key is absent, but also when the existing value is `false` or `nil`.
```ruby
user_settings = {
  notifications_enabled: false,
  dark_mode: nil,
  auto_save: true
}

# INCORRECT: overwrites false and nil values
user_settings[:notifications_enabled] ||= true
user_settings[:dark_mode] ||= false
puts user_settings[:notifications_enabled] # => true (should be false!)
puts user_settings[:dark_mode]             # => false (should be nil!)

# CORRECT: check key existence explicitly
def safe_default(hash, key, default)
  hash[key] = default unless hash.key?(key)
  hash[key]
end

user_settings = {
  notifications_enabled: false,
  dark_mode: nil
}

safe_default(user_settings, :notifications_enabled, true)
safe_default(user_settings, :dark_mode, false)
puts user_settings[:notifications_enabled] # => false (preserved!)
puts user_settings[:dark_mode]             # => nil (preserved!)
```
Deadlock situations can occur when using multiple synchronized data structures or when callback blocks attempt to access the same concurrent structure.
```ruby
# INCORRECT: potential deadlock
map = Concurrent::Map.new

# Callback tries to access the same map
map.compute_if_absent(:key1) do
  # Nested access to the same structure can deadlock in some implementations
  map.put_if_absent(:key2, "value2")
  "value1"
end

# CORRECT: avoid nested access to the same structure
map = Concurrent::Map.new

# Compute values independently (placeholders for real computations)
value1 = compute_value_for_key1
value2 = compute_value_for_key2

# Store without nested calls
map.put_if_absent(:key1, value1)
map.put_if_absent(:key2, value2)
```
Memory leaks can result from improper cleanup in long-running applications that continuously store values. Without periodic cleanup, caches grow indefinitely.
```ruby
# PROBLEMATIC: unbounded growth
class LeakyCache
  def initialize
    @cache = Concurrent::Map.new
  end

  def get_or_compute(key, &computation)
    @cache.compute_if_absent(key, &computation)
  end
end

# Each request creates a new cache entry (runs forever; illustration only)
cache = LeakyCache.new
loop do
  user_id = SecureRandom.uuid # Always unique
  cache.get_or_compute(user_id) { expensive_user_data(user_id) }
end

# BETTER: implement size limits and cleanup
class BoundedCache
  def initialize(max_size: 1000)
    @cache = Concurrent::Map.new
    @max_size = max_size
    @access_times = Concurrent::Map.new
  end

  def get_or_compute(key, &computation)
    cleanup_if_needed
    @access_times[key] = Time.now
    @cache.compute_if_absent(key, &computation)
  end

  private

  def cleanup_if_needed
    return unless @cache.size > @max_size
    # Snapshot access times, then evict the least recently used entries
    entries = []
    @access_times.each_pair { |key, time| entries << [key, time] }
    sorted_keys = entries.sort_by { |_, time| time }.map(&:first)
    keys_to_remove = sorted_keys.first(@cache.size - @max_size + 100)
    keys_to_remove.each do |key|
      @cache.delete(key)
      @access_times.delete(key)
    end
  end
end
```
Exception safety requires careful consideration when computation blocks can raise exceptions. Failed computations should not leave data structures in inconsistent states.
```ruby
# PROBLEMATIC: the marker value is visible to other readers,
# and an exception requires manual cleanup
cache = {}

def risky_store_if_absent(cache, key, &computation)
  return cache[key] if cache.key?(key)
  cache[key] = :computing # Marker value
  value = computation.call
  cache[key] = value
rescue => e
  cache.delete(key) # Clean up the marker
  raise e
end

# BETTER: use concurrent structures with proper exception handling
require 'concurrent'

class SafeComputeCache
  def initialize
    @cache = Concurrent::Map.new
    @computing = Concurrent::Set.new
  end

  def get_or_compute(key, &computation)
    return @cache[key] if @cache.key?(key)
    # Prevent multiple threads from computing the same key.
    # (compute_if_absent already runs the block at most once per key;
    # the set additionally lets waiters poll instead of blocking in the map.)
    return wait_for_computation(key) unless @computing.add?(key)
    begin
      @cache.compute_if_absent(key) { computation.call }
    ensure
      @computing.delete(key)
    end
  end

  private

  def wait_for_computation(key)
    # Busy-wait for the other thread to finish computing
    sleep(0.01) while @computing.include?(key)
    @cache[key]
  end
end
```
Reference
Standard Hash Methods
| Method | Parameters | Returns | Description |
|---|---|---|---|
| `#[]=(key, value)` | `key` (Object), `value` (Object) | Object | Unconditionally stores value for key |
| `#fetch(key, default)` | `key` (Object), `default` (Object, optional) | Object | Returns value for key, the default if absent; raises `KeyError` when absent and no default given |
| `#key?(key)` | `key` (Object) | Boolean | Tests if key exists in hash |
| `#has_key?(key)` | `key` (Object) | Boolean | Alias for `key?` |
| `#store(key, value)` | `key` (Object), `value` (Object) | Object | Alias for `[]=` |
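One nuance the table compresses: without a second argument or block, `fetch` raises `KeyError` for absent keys rather than returning `nil`. A quick sketch of the three forms:

```ruby
h = { name: "Alice" }

h.fetch(:name)                   # => "Alice"
h.fetch(:role, "guest")          # => "guest" (static default)
h.fetch(:role) { |k| "no_#{k}" } # => "no_role" (lazy default)

begin
  h.fetch(:missing)
rescue KeyError => e
  puts e.class # => KeyError
end
```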
Concurrent::Map Methods
| Method | Parameters | Returns | Description |
|---|---|---|---|
| `#put_if_absent(key, value)` | `key` (Object), `value` (Object) | Object or nil | Atomically stores value if key absent |
| `#compute_if_absent(key, &block)` | `key` (Object), block | Object | Computes and stores if key absent |
| `#compute_if_present(key, &block)` | `key` (Object), block | Object or nil | Computes new value if key present |
| `#compute(key, &block)` | `key` (Object), block | Object | Computes new value unconditionally |
| `#get_and_set(key, value)` | `key` (Object), `value` (Object) | Object or nil | Atomically gets old value and sets new |
| `#replace_if_exists(key, value)` | `key` (Object), `value` (Object) | Object or nil | Replaces value only if key exists |
Common Patterns
| Pattern | Usage | Thread Safe | Performance |
|---|---|---|---|
| `hash[key] \|\|= value` | Simple default assignment | No | High |
| `hash.fetch(key, default)` | Safe default retrieval | No | High |
| `hash[key] = value unless hash.key?(key)` | Explicit conditional store | No | Medium |
| `map.put_if_absent(key, value)` | Thread-safe conditional store | Yes | Medium |
| `map.compute_if_absent(key) { value }` | Thread-safe lazy computation | Yes | Medium |
| `mutex.synchronize { hash[key] \|\|= value }` | Manual synchronization | Yes | Low |
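The table's last row, manual synchronization, deserves a concrete sketch: a plain Hash plus a `Mutex` makes the check-and-store atomic, at the cost of serializing every access.

```ruby
hash = {}
mutex = Mutex.new

threads = 10.times.map do |i|
  Thread.new do
    # The entire check-and-store happens under the lock
    mutex.synchronize { hash[:winner] ||= "thread_#{i}" }
  end
end
threads.each(&:join)

puts hash.size # => 1 (exactly one thread's value was stored)
```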
Error Conditions
| Condition | Standard Hash | Concurrent::Map | Mitigation |
|---|---|---|---|
| Key collision under concurrency | Race condition | Atomic operation | Use concurrent structures |
| Falsy value overwrite with `\|\|=` | Overwrites false/nil | N/A | Use explicit key checks |
| Exception during computation | Inconsistent state | Atomic rollback | Implement proper cleanup |
| Memory exhaustion | Unlimited growth | Unlimited growth | Implement size limits |
| Deadlock in callbacks | N/A | Possible with self-access | Avoid nested structure access |
Performance Characteristics
| Operation | Time Complexity | Space Complexity | Concurrency |
|---|---|---|---|
| Hash `\|\|=` | O(1) | O(1) | None |
| Hash with Mutex | O(1) + sync overhead | O(1) | Full serialization |
| Concurrent::Map `put_if_absent` | O(1) amortized | O(1) + safety overhead | Lock-free/minimized |
| Concurrent::Map `compute_if_absent` | O(1) + block execution | O(1) + safety overhead | Lock-free/minimized |
| Batch operations | O(n) | O(n) | Reduced contention |