CrackedRuby logo

CrackedRuby

Set and SortedSet

Complete guide to Ruby's Set and SortedSet classes for managing collections of unique elements.

Standard Library Data Structures
4.1.1

Overview

Ruby provides two classes for managing collections of unique elements: Set and SortedSet. Both classes ensure that each element appears exactly once in the collection, but they differ in their internal organization and performance characteristics.

Set stores elements in an unordered fashion using a hash-based implementation, providing O(1) average-case performance for insertion, deletion, and membership testing. SortedSet maintains elements in sorted order using a red-black tree structure, resulting in O(log n) performance for these operations but guaranteeing ordered iteration.

require 'set'

# Basic Set usage
numbers = Set.new([1, 2, 3, 2, 1])
numbers.to_a  # => [1, 2, 3]

# Basic SortedSet usage  
sorted_numbers = SortedSet.new([3, 1, 2, 1])
sorted_numbers.to_a  # => [1, 2, 3]

Both classes inherit from Enumerable, providing access to methods like map, select, and reject. They implement set theory operations including union, intersection, and difference through operators like +, &, and -.

The primary difference lies in ordering guarantees: Set makes no promises about element order during iteration, while SortedSet always iterates in sorted order according to the natural comparison of elements or a custom comparison block.

# Set iteration order is undefined
Set.new([3, 1, 2]).each { |n| print n }  # Could output: 132, 213, etc.

# SortedSet iteration order is always sorted
SortedSet.new([3, 1, 2]).each { |n| print n }  # Always outputs: 123

Both classes require elements to implement appropriate comparison methods. For Set, elements must implement hash and eql?. For SortedSet, elements must implement the spaceship operator <=> for ordering comparisons.

Basic Usage

Creating sets requires loading the set library, as these classes are not part of Ruby's core classes. The most common initialization patterns involve passing an enumerable object to the constructor or using class methods.

require 'set'

# Creating empty sets
empty_set = Set.new
empty_sorted = SortedSet.new

# Creating from arrays
names = Set.new(['Alice', 'Bob', 'Charlie', 'Alice'])
names.size  # => 3

# Creating from other enumerables
range_set = Set.new(1..5)
range_set.to_a  # => [1, 2, 3, 4, 5]

# Using Set[] class method
colors = Set['red', 'green', 'blue']

Adding and removing elements uses familiar methods that mirror array operations. The key difference is that duplicate additions have no effect, and the methods return the set itself for chaining.

fruits = Set.new
fruits.add('apple')
fruits << 'banana'  # Alias for add
fruits.add('apple')  # No effect, already present
fruits.size  # => 2

# Chaining operations
fruits.add('cherry').add('date').delete('banana')
fruits.to_a  # => ['apple', 'cherry', 'date']

# Bulk operations
fruits.merge(['elderberry', 'fig', 'apple'])
fruits.subtract(['date', 'grape'])  # No error if 'grape' absent

Set theory operations form the core of set functionality. Ruby implements these through intuitive operators and method names that correspond to mathematical notation.

set1 = Set[1, 2, 3, 4]
set2 = Set[3, 4, 5, 6]

# Union (combines all elements)
union = set1 | set2      # => Set[1, 2, 3, 4, 5, 6]
union = set1.union(set2) # Same result

# Intersection (common elements only)
intersection = set1 & set2              # => Set[3, 4]  
intersection = set1.intersection(set2)  # Same result

# Difference (elements in first but not second)
difference = set1 - set2           # => Set[1, 2]
difference = set1.difference(set2) # Same result

# Symmetric difference (elements in either but not both)
symmetric = set1 ^ set2  # => Set[1, 2, 5, 6]

Membership testing and subset operations provide boolean results for set relationships. These operations are fundamental for set-based algorithms and filtering operations.

languages = Set['ruby', 'python', 'javascript']

# Membership testing
languages.include?('ruby')     # => true
languages.member?('python')    # => true (alias)
languages.include?('golang')   # => false

# Subset relationships
small_set = Set['ruby', 'python']
languages.superset?(small_set)  # => true
small_set.subset?(languages)    # => true

# Proper subset (strict subset)
languages.proper_superset?(small_set)  # => true
small_set.proper_subset?(languages)    # => true

# Disjoint sets (no common elements)
web_langs = Set['javascript', 'typescript']
system_langs = Set['c', 'rust']
web_langs.disjoint?(system_langs)  # => true

Performance & Memory

The performance characteristics of Set and SortedSet differ significantly due to their underlying implementations. Understanding these differences helps choose the appropriate class for specific use cases and performance requirements.

Set uses a hash table implementation that provides O(1) average-case performance for insertion, deletion, and lookup operations. This makes it ideal for scenarios requiring fast membership testing or frequent modifications to the collection.

require 'benchmark'
require 'set'

# Benchmark Set operations
large_array = (1..100_000).to_a
set = Set.new

Benchmark.bm(15) do |x|
  x.report('Set creation:') do
    Set.new(large_array)
  end
  
  x.report('Set insertion:') do
    large_array.each { |n| set.add(n) }
  end
  
  x.report('Set lookup:') do
    10_000.times { set.include?(rand(100_000)) }
  end
end

SortedSet maintains elements in sorted order using a red-black tree, resulting in O(log n) performance for basic operations. While slower than Set for individual operations, it provides guaranteed ordering and efficient range operations.

# Memory usage comparison
array = (1..10_000).to_a.shuffle
set = Set.new(array)
sorted_set = SortedSet.new(array)

# Memory footprint (approximate)
# Array: ~40KB (plus object overhead)  
# Set: ~80KB (hash table overhead)
# SortedSet: ~120KB (tree node overhead)

The performance gap becomes more pronounced with larger datasets. For collections with frequent insertions and deletions, Set significantly outperforms SortedSet. However, if ordered iteration is required, the cost of sorting a Set often exceeds the overhead of maintaining a SortedSet.

# Scenario: Building a leaderboard
scores = []
1000.times { scores << rand(1000) }

# Using Set (requires sorting for display)
score_set = Set.new
scores.each { |score| score_set.add(score) }
sorted_scores = score_set.to_a.sort  # O(n log n) sort

# Using SortedSet (always sorted)
score_sorted_set = SortedSet.new
scores.each { |score| score_sorted_set.add(score) }
sorted_scores = score_sorted_set.to_a  # Already sorted

Memory usage patterns also differ between the implementations. Set uses less memory per element but may have hash table overhead and potential bucket collisions. SortedSet uses more memory per element due to tree node pointers but provides predictable memory access patterns.

For applications processing large datasets, consider the access patterns when choosing between implementations. If the application primarily needs membership testing with occasional ordered output, maintain a Set and sort when needed. If the application requires frequent ordered iteration or range queries, SortedSet provides better overall performance.

# Memory-conscious approach for large datasets
class EfficientUniqueProcessor
  def initialize(need_ordering: false)
    @container = need_ordering ? SortedSet.new : Set.new
    @stats = { insertions: 0, lookups: 0 }
  end
  
  def add_item(item)
    @stats[:insertions] += 1
    @container.add(item)
  end
  
  def include?(item)
    @stats[:lookups] += 1
    @container.include?(item)
  end
  
  def ordered_items
    @container.respond_to?(:to_a) ? @container.to_a : @container.to_a.sort
  end
end

Common Pitfalls

Set implementations in Ruby contain several subtle behaviors that can lead to unexpected results. Understanding these pitfalls prevents common bugs and helps write more robust code using sets.

Equality semantics represent the most frequent source of confusion. Sets determine element uniqueness using eql? and hash methods, not the == operator. This can lead to surprising behavior when working with custom objects or numeric types.

# Numeric type pitfall
mixed_numbers = Set.new
mixed_numbers.add(1)      # Integer
mixed_numbers.add(1.0)    # Float  
mixed_numbers.add(1/1r)   # Rational

# All three are present because they have different hash values
mixed_numbers.size  # => 3
mixed_numbers.to_a  # => [1, 1.0, 1/1]

# But they compare as equal
1 == 1.0    # => true
1 == 1/1r   # => true

Custom objects must implement both hash and eql? consistently. Objects that are equal according to eql? must return the same hash value, but objects with the same hash value need not be equal.

class Person
  attr_reader :name, :age
  
  def initialize(name, age)
    @name = name
    @age = age  
  end
  
  # Incorrect implementation - only implements ==
  def ==(other)
    other.is_a?(Person) && name == other.name && age == other.age
  end
end

people = Set.new
people.add(Person.new('Alice', 30))
people.add(Person.new('Alice', 30))  # Adds duplicate!
people.size  # => 2 (unexpected)

# Correct implementation
class PersonFixed
  attr_reader :name, :age
  
  def initialize(name, age)
    @name = name
    @age = age
  end
  
  def eql?(other)
    other.is_a?(PersonFixed) && name == other.name && age == other.age
  end
  
  def hash
    [name, age].hash
  end
end

Mutability of set elements creates another common trap. Modifying objects after adding them to a set can break the set's internal consistency, leading to elements that cannot be found or removed.

# Dangerous pattern - modifying elements after insertion
users = Set.new
user = { name: 'Bob', role: 'admin' }
users.add(user)

# Modifying the object breaks set consistency
user[:role] = 'user'  # Changes hash value
users.include?(user)  # => false (can't find it!)
users.delete(user)    # => nil (can't remove it!)

# The set is now corrupted
users.size           # => 1 (element still there)
users.to_a.first     # => { name: 'Bob', role: 'user' }

SortedSet introduces additional complexity around comparison consistency. Elements must implement <=> and maintain consistent ordering throughout their lifetime. Changing an object's sort key after insertion corrupts the tree structure.

class Score
  attr_accessor :points
  
  def initialize(points)
    @points = points
  end
  
  def <=>(other)
    points <=> other.points
  end
  
  def eql?(other)
    other.is_a?(Score) && points == other.points
  end
  
  def hash
    points.hash
  end
end

scores = SortedSet.new
score1 = Score.new(100)
score2 = Score.new(200)

scores.add(score1)
scores.add(score2)
scores.to_a.map(&:points)  # => [100, 200]

# Dangerous - modifying sort key corrupts the tree
score1.points = 300
scores.to_a.map(&:points)  # => [300, 200] (wrong order!)

# Tree structure is now inconsistent
scores.include?(Score.new(300))  # => false (can't find equivalent)

Set operations can produce unexpected results when working with sets of different types or when assuming commutativity. While mathematical set operations are commutative, Ruby's implementation may preserve the type of the receiver.

set1 = Set[1, 2, 3]
set2 = SortedSet[2, 3, 4]

result1 = set1 | set2    # Returns Set
result2 = set2 | set1    # Returns SortedSet

# Both contain the same elements but have different types
result1.class  # => Set
result2.class  # => SortedSet

# This affects subsequent operations and iteration order
result1.to_a   # => [1, 2, 3, 4] (undefined order)
result2.to_a   # => [1, 2, 3, 4] (sorted order)

Reference

Set Class Methods

Method Parameters Returns Description
Set.new(enum = nil) enum (Enumerable) Set Creates new set from enumerable or empty set
Set[](*elements) Variable arguments Set Creates set from argument list

Set Instance Methods

Method Parameters Returns Description
#add(element) element (Object) Set Adds element to set, returns self
#<<(element) element (Object) Set Alias for add
#delete(element) element (Object) Set Removes element from set, returns self
#delete?(element) element (Object) Object or nil Removes and returns element, nil if absent
#include?(element) element (Object) Boolean Tests membership
#member?(element) element (Object) Boolean Alias for include?
#size None Integer Returns number of elements
#length None Integer Alias for size
#empty? None Boolean Returns true if set contains no elements
#clear None Set Removes all elements, returns self

Set Theory Operations

Method Parameters Returns Description
#union(set) set (Set) Set Returns union of sets
#|(set) set (Set) Set Operator alias for union
#+(set) set (Set) Set Alias for union
#intersection(set) set (Set) Set Returns intersection of sets
#&(set) set (Set) Set Operator alias for intersection
#difference(set) set (Set) Set Returns difference (elements in receiver not in argument)
#-(set) set (Set) Set Operator alias for difference
#^(set) set (Set) Set Returns symmetric difference

Set Relationship Methods

Method Parameters Returns Description
#subset?(set) set (Set) Boolean True if receiver is subset of argument
#superset?(set) set (Set) Boolean True if receiver is superset of argument
#proper_subset?(set) set (Set) Boolean True if receiver is proper subset of argument
#proper_superset?(set) set (Set) Boolean True if receiver is proper superset of argument
#disjoint?(set) set (Set) Boolean True if sets share no common elements

Set Modification Methods

Method Parameters Returns Description
#merge(enum) enum (Enumerable) Set Adds all elements from enumerable, returns self
#subtract(enum) enum (Enumerable) Set Removes all elements in enumerable, returns self
#replace(enum) enum (Enumerable) Set Replaces contents with elements from enumerable
#keep_if Block Set Removes elements where block returns falsy
#select! Block Set or nil Removes elements where block returns falsy, returns nil if unchanged
#delete_if Block Set Removes elements where block returns truthy
#reject! Block Set or nil Removes elements where block returns truthy, returns nil if unchanged

SortedSet Class Methods

Method Parameters Returns Description
SortedSet.new(enum = nil, &block) enum (Enumerable), block (Proc) SortedSet Creates sorted set with optional comparison block
SortedSet[](*elements) Variable arguments SortedSet Creates sorted set from argument list

SortedSet-Specific Methods

Method Parameters Returns Description
#min None Object Returns minimum element
#max None Object Returns maximum element

Common Enumerable Methods

Both Set and SortedSet include Enumerable, providing access to these methods:

Method Parameters Returns Description
#each Block Set Iterates over elements
#map Block Array Returns array of transformed elements
#select Block Array Returns array of elements where block returns truthy
#reject Block Array Returns array of elements where block returns falsy
#find Block Object or nil Returns first element where block returns truthy
#to_a None Array Converts set to array

Element Requirements

For Set:

  • Elements must implement hash method returning consistent integer
  • Elements must implement eql? method for equality testing
  • Objects that are eql? must return identical hash values

For SortedSet:

  • Elements must implement <=> method returning -1, 0, or 1
  • Elements must implement eql? and hash (inherits Set requirements)
  • Comparison must be consistent throughout object lifetime

Performance Characteristics

Operation Set SortedSet
Insertion O(1) average O(log n)
Deletion O(1) average O(log n)
Lookup O(1) average O(log n)
Iteration O(n) unordered O(n) sorted
Union O(n + m) O(n + m)
Intersection O(min(n, m)) O(n + m)