Data vs Struct Comparison

A comprehensive comparison of Ruby's Data and Struct classes for creating value objects and data containers.

Overview

Ruby provides two primary classes for creating simple data structures: Struct and Data. Both classes generate value objects with named attributes, but they serve different purposes and exhibit distinct behaviors. Struct creates mutable objects with optional method definitions, while Data produces immutable value objects focused on data integrity and functional programming patterns.

Struct has been part of Ruby since early versions, designed as a convenient way to create classes with named attributes and accessor methods. The Struct.new method returns a new class with the specified attributes, supporting both positional and keyword arguments for initialization.

Person = Struct.new(:name, :age)
person = Person.new("Alice", 30)
person.name = "Bob"  # Mutable
# => "Bob"

Data was introduced in Ruby 3.2 as an immutable alternative. Data objects cannot be modified after creation, making them suitable for functional programming patterns and situations requiring data integrity guarantees.

Person = Data.define(:name, :age)
person = Person.new(name: "Alice", age: 30)
person.with(name: "Bob")  # Returns new instance
# => #<data Person name="Bob", age=30>

The fundamental difference lies in mutability. Struct instances can be modified after creation, while Data instances are frozen at creation and expose no setter methods. This affects thread safety and programming patterns more than it affects memory layout. Both classes support pattern matching. Data is stricter at initialization: it raises ArgumentError when a member is missing, whereas Struct fills missing members with nil.

Both classes automatically generate reader methods, equality comparisons, and hash methods. They differ in what else they expose: Struct adds setters and collection-style methods such as to_a, each, and [], while Data keeps a deliberately small interface. Both Struct.new and Data.define accept a block for defining custom methods.
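
As a minimal sketch of the generated equality and hash behavior, the snippet below compares instances by value. The class names StructPoint and DataPoint are hypothetical and exist only for this illustration; Data requires Ruby 3.2 or later.

```ruby
# Hypothetical classes for illustration only.
StructPoint = Struct.new(:x, :y)
DataPoint = Data.define(:x, :y)

s1 = StructPoint.new(1, 2)
s2 = StructPoint.new(1, 2)
d1 = DataPoint.new(x: 1, y: 2)
d2 = DataPoint.new(x: 1, y: 2)

# Equality compares the class and the member values, not object identity
struct_equal = (s1 == s2)     # true
data_equal = (d1 == d2)       # true
same_object = s1.equal?(s2)   # false

# Equal objects have equal hash codes, so they collapse into one Hash key
lookup = { d1 => "first" }
lookup[d2] = "second"
key_count = lookup.size       # 1

# Instances of different classes never compare equal
cross_equal = (s1 == d1)      # false
```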

Basic Usage

Struct creation supports multiple initialization patterns. The most common approach defines attributes as symbols, creating a new class with accessor methods for each attribute.

# Basic struct definition
Point = Struct.new(:x, :y)
point = Point.new(10, 20)
point.x  # => 10
point.y = 30  # Modifies existing instance

# Keyword arguments
Person = Struct.new(:name, :age, keyword_init: true)
person = Person.new(name: "Carol", age: 25)
person.age = 26  # Direct modification

Data uses a different creation syntax. The Data.define method creates an immutable class with the specified attributes. Its new method accepts either keyword or positional arguments and requires a value for every member unless a custom initialize supplies defaults.

# Basic data definition
Point = Data.define(:x, :y)
point = Point.new(x: 10, y: 20)
point.x  # => 10
# point.y = 30  # Raises NoMethodError: Data defines no setters

# Creating modified copies
new_point = point.with(y: 30)
# => #<data Point x=10, y=30>
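
The following sketch shows the construction forms Data.new accepts and how a missing member is reported. Coord is a hypothetical class used only for this example.

```ruby
# Hypothetical class for illustration only.
Coord = Data.define(:lat, :lng)

by_keyword = Coord.new(lat: 52.5, lng: 13.4)
by_position = Coord.new(52.5, 13.4)   # positional values map to members in order
same = (by_keyword == by_position)    # true

as_hash = by_keyword.to_h             # member names mapped to values
names = Coord.members                 # [:lat, :lng]

# A missing member is rejected at construction time
missing_error =
  begin
    Coord.new(lat: 52.5)
  rescue ArgumentError => e
    e.message
  end
```

Because every member must be supplied, a partially initialized Data object cannot exist, which is the main practical difference from Struct's nil-filling behavior.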

Both structures support pattern matching: each implements deconstruct_keys, so hash-style patterns such as Point(x:, y:) work with either class. Struct additionally behaves like a collection (to_a, each, index access), so it can be destructured into local variables with multiple assignment. Data deliberately omits to_a.

# Struct destructuring
Point = Struct.new(:x, :y)
point = Point.new(5, 15)
x, y = point.to_a
# => [5, 15]

# Data pattern matching
Point = Data.define(:x, :y)
point = Point.new(x: 5, y: 15)
case point
in Point(x: 0, y:)
  puts "On Y axis: #{y}"
in Point(x:, y: 0)
  puts "On X axis: #{x}"
in Point(x:, y:)
  puts "Point at #{x}, #{y}"
end
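
For comparison, here is a small sketch of the pattern matching hooks on a Struct. Pixel is a hypothetical class; the point is that Struct answers both deconstruct_keys (hash patterns) and deconstruct (array patterns).

```ruby
# Hypothetical class for illustration only.
Pixel = Struct.new(:x, :y)
pixel = Pixel.new(0, 7)

keys = pixel.deconstruct_keys([:x, :y])   # {x: 0, y: 7}
values = pixel.deconstruct                # [0, 7]

axis =
  case pixel
  in { x: 0, y: }
    "on Y axis at #{y}"
  in [x, 0]
    "on X axis at #{x}"
  else
    "elsewhere"
  end
```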

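Default values work differently between the two classes. Struct fills omitted members with nil, so defaults are usually supplied by overriding initialize. Data raises ArgumentError for omitted members, so defaults come either from an initialize override with keyword defaults or from a factory method such as the default method below.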

# Struct with defaults
Config = Struct.new(:host, :port, :timeout) do
  def initialize(host: "localhost", port: 8080, timeout: 30)
    super(host, port, timeout)
  end
end

# Data with defaults
Config = Data.define(:host, :port, :timeout) do
  def self.default
    new(host: "localhost", port: 8080, timeout: 30)
  end
end
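
An alternative to a factory method is to override initialize with keyword defaults. The sketch below assumes a hypothetical ServerConfig class; it relies on Data.new converting positional arguments to keywords before calling initialize.

```ruby
# Hypothetical class for illustration only.
ServerConfig = Data.define(:host, :port, :timeout) do
  # Keyword defaults fill in any member the caller leaves out
  def initialize(host: "localhost", port: 8080, timeout: 30)
    super(host: host, port: port, timeout: timeout)
  end
end

defaults = ServerConfig.new
custom = ServerConfig.new(port: 9090)
positional = ServerConfig.new("example.org")   # maps to host:, other members use defaults
```

With this approach callers use the ordinary constructor, and partial overrides do not need a separate with call.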

Advanced Usage

Both Struct and Data support method definition, but with different philosophies. Struct encourages adding behavior directly to the generated class, while Data promotes composition and functional patterns.

# Struct with custom methods
class Rectangle < Struct.new(:width, :height)
  def area
    width * height
  end
  
  def resize!(factor)
    self.width *= factor
    self.height *= factor
    self
  end
  
  def perimeter
    2 * (width + height)
  end
end

rect = Rectangle.new(10, 5)
rect.resize!(2)  # Modifies in place
rect.area  # => 200 (20 * 10 after resizing)

Data classes focus on immutable transformations and functional composition. Method definitions typically return new instances rather than modifying existing ones.

Rectangle = Data.define(:width, :height) do
  def area
    width * height
  end
  
  def resize(factor)
    with(width: width * factor, height: height * factor)
  end
  
  def perimeter
    2 * (width + height)
  end
  
  def scale_to_area(target_area)
    factor = Math.sqrt(target_area.to_f / area)
    resize(factor)
  end
end

rect = Rectangle.new(width: 10, height: 5)
bigger_rect = rect.resize(2)  # Returns new instance
scaled_rect = rect.scale_to_area(200)

Inheritance patterns differ significantly. A Struct subclass can add instance variables and behavior on top of the inherited members. A Data class fixes its member list at definition time, so a variant that needs extra attributes is defined as its own class that lists every member.

# Struct inheritance
Animal = Struct.new(:name, :species)
class Dog < Animal
  def initialize(name, breed)
    super(name, "dog")
    @breed = breed
  end
  
  attr_reader :breed
  
  def bark
    "#{name} says woof!"
  end
end

# Data with an extra attribute: the member list is fixed, so define a separate class
Animal = Data.define(:name, :species)
Dog = Data.define(:name, :species, :breed) do
  def initialize(name:, breed:, species: "dog")
    super(name: name, species: species, breed: breed)
  end
  
  def bark
    "#{name} says woof!"
  end
end

Complex initialization and validation logic requires different approaches. Struct can call super first and then normalize attributes through its setters, while Data must normalize values before passing them to super, because the instance is frozen once initialization completes.

# Struct with validation
class EmailContact < Struct.new(:email, :name)
  def initialize(email, name = nil)
    raise ArgumentError, "Invalid email" unless email.include?("@")
    super
    normalize_email!
  end
  
  private
  
  def normalize_email!
    self.email = email.downcase.strip
  end
end

# Data with validation
EmailContact = Data.define(:email, :name) do
  def initialize(email:, name: nil)
    raise ArgumentError, "Invalid email" unless email.include?("@")
    super(email: email.downcase.strip, name: name)
  end
  
  def update_email(new_email)
    self.class.new(email: new_email, name: name)
  end
end

Performance & Memory

Memory usage patterns differ between Struct and Data mainly because of mutability rather than object size. In CRuby both are backed by the same struct representation, so instances with the same members should occupy similar space. Struct updates happen in place without allocation, while every Data#with call allocates a new object. Data instances can be shared freely instead of copied, which offsets that cost in read-heavy code.

# Requires the benchmark-memory gem
require 'benchmark/memory'

# Define the classes once, outside the measured blocks
StructPoint = Struct.new(:x, :y)
DataPoint = Data.define(:x, :y)

# Memory comparison for creation
Benchmark.memory do |x|
  x.report("Struct creation") do
    1000.times { StructPoint.new(rand(100), rand(100)) }
  end
  
  x.report("Data creation") do
    1000.times { DataPoint.new(x: rand(100), y: rand(100)) }
  end
  
  x.compare!
end

Performance characteristics vary based on usage patterns. Struct suits in-place modification and frequent updates, since a setter call allocates nothing. Data suits code where values are created once and passed around widely, since no defensive copies are needed. Treat the benchmarks here as a starting point and measure on your own workload.

require 'benchmark'

# Performance comparison for modifications
Benchmark.bm do |x|
  struct_point = Struct.new(:x, :y).new(0, 0)
  data_point = Data.define(:x, :y).new(x: 0, y: 0)
  
  x.report("Struct mutation") do
    point = struct_point.dup
    1000.times do |i|
      point.x = i
      point.y = i * 2
    end
  end
  
  x.report("Data transformation") do
    point = data_point
    1000.times do |i|
      point = point.with(x: i, y: i * 2)
    end
  end
end

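Hash and equality operations work the same way for both classes: #hash and #== are derived from the class and the current member values. The practical difference is stability. A Data instance's hash does not change after creation as long as its members are themselves immutable, while a Struct's hash changes whenever a member is reassigned, which matters when instances serve as Hash keys. The classes are defined once, outside the measured blocks, so that every instance shares a class.

require 'benchmark'

# Hash performance comparison
StructPoint = Struct.new(:x, :y)
DataPoint = Data.define(:x, :y)

struct_points = Array.new(1000) { StructPoint.new(rand(100), rand(100)) }
data_points = Array.new(1000) { DataPoint.new(x: rand(100), y: rand(100)) }

Benchmark.bm do |x|
  x.report("Struct hash operations") do
    hash = {}
    struct_points.each { |point| hash[point] = true }
  end
  
  x.report("Data hash operations") do
    hash = {}
    data_points.each { |point| hash[point] = true }
  end
end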
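
Beyond raw speed, hash stability matters when instances are used as Hash keys. The sketch below, using the hypothetical classes Cell and FrozenCell, shows how mutating a Struct after insertion makes the key unreachable until the Hash is rehashed, and why a Data key cannot run into that problem.

```ruby
# Hypothetical classes for illustration only.
Cell = Struct.new(:row, :col)
FrozenCell = Data.define(:row, :col)

cell = Cell.new(1, 1)
visited = { cell => true }

cell.row = 2   # the key's hash code changes after insertion
found_after_mutation = visited.key?(cell)            # false: stored under the old hash code
found_by_old_value = visited.key?(Cell.new(1, 1))    # false: the stored key no longer equals [1, 1]

visited.rehash # re-index keys under their current hash codes
found_after_rehash = visited.key?(cell)              # true

frozen_cell = FrozenCell.new(row: 1, col: 1)
seen = { frozen_cell => true }
moved = frozen_cell.with(row: 2)                     # new object; the stored key is untouched
still_found = seen.key?(frozen_cell)                 # true
```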

Memory sharing scenarios favor Data objects. Since they're immutable, multiple references to the same Data instance don't risk unexpected mutations. Struct instances require defensive copying in shared contexts.

# Memory sharing example
shared_config = Data.define(:host, :port, :ssl).new(
  host: "api.example.com",
  port: 443,
  ssl: true
)

# Safe to share across threads and contexts
clients = Array.new(10) do |i|
  # Each client can safely reference shared config
  { id: i, config: shared_config }
end

# Struct requires defensive copying
StructConfig = Struct.new(:host, :port, :ssl)
base_config = StructConfig.new("api.example.com", 443, true)

clients = Array.new(10) do |i|
  # Must duplicate to prevent accidental mutations
  { id: i, config: base_config.dup }
end
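
One caveat is that Data immutability is shallow: the instance is frozen, but the objects its members reference are not. The sketch below uses hypothetical Playlist and SafePlaylist classes; freezing copies of the members in initialize is one possible mitigation, not the only one.

```ruby
# Hypothetical classes for illustration only.
Playlist = Data.define(:name, :tracks)

playlist = Playlist.new(name: "Focus", tracks: ["Intro"])

frozen = playlist.frozen?                # true
inner_frozen = playlist.tracks.frozen?   # false: the array itself is still mutable

playlist.tracks << "Outro"               # mutates state visible to every holder of the reference
leaked = playlist.tracks                 # ["Intro", "Outro"]

# One way to close the gap: store frozen copies of the members
SafePlaylist = Data.define(:name, :tracks) do
  def initialize(name:, tracks:)
    super(name: name.dup.freeze, tracks: tracks.map { |t| t.dup.freeze }.freeze)
  end
end

safe = SafePlaylist.new(name: "Focus", tracks: ["Intro"])
append_error =
  begin
    safe.tracks << "Outro"
  rescue FrozenError => e
    e.class
  end
```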

Common Pitfalls

Mutability assumptions cause frequent errors when switching between Struct and Data. Code expecting mutable behavior fails with Data objects, while functional code may not account for Struct mutations.

# Dangerous assumption with Data
def update_coordinates(point, x, y)
  point.x = x  # NoMethodError with Data objects (no setters are defined)
  point.y = y
  point
end

# Correct approach for both
def update_coordinates(point, x, y)
  if point.respond_to?(:with)
    point.with(x: x, y: y)  # Data
  else
    point.dup.tap { |p| p.x = x; p.y = y }  # Struct
  end
end
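
The exact exception matters when rescuing errors. As a sketch with hypothetical Vec and Pair classes: a setter call on a Data object fails because the method does not exist, while FrozenError appears for lower-level mutation and for explicitly frozen Struct instances.

```ruby
# Hypothetical classes for illustration only.
Vec = Data.define(:x, :y)
vec = Vec.new(x: 1, y: 2)

has_setter = vec.respond_to?(:x=)   # false

setter_error =
  begin
    vec.x = 10
  rescue NoMethodError => e
    e.class
  end

ivar_error =
  begin
    vec.instance_variable_set(:@x, 10)
  rescue FrozenError => e
    e.class
  end

# A Struct keeps its setters, so freezing it produces FrozenError instead
Pair = Struct.new(:x, :y)
pair = Pair.new(1, 2).freeze
struct_error =
  begin
    pair.x = 10
  rescue FrozenError => e
    e.class
  end
```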

Initialization syntax differences create subtle bugs. Struct accepts positional or keyword arguments depending on configuration and silently fills missing members with nil. Data accepts both forms as well but raises ArgumentError when a member is missing or unknown.

# Struct flexibility can hide bugs (behavior shown for Ruby 3.2+)
Person = Struct.new(:name, :age)
person1 = Person.new("Alice", 30)      # Positional
person2 = Person.new(age: 25)          # Keywords only: name is nil
person3 = Person.new("Bob", age: 40)   # Mixed - dangerous! age becomes {age: 40}

# Data consistency
Person = Data.define(:name, :age)
person1 = Person.new(name: "Alice", age: 30)  # Keywords
person2 = Person.new("Bob", 25)               # Positional also works
# person3 = Person.new(name: "Carol")         # ArgumentError: missing keyword: :age

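Pattern matching behaves almost identically for the two classes. Both implement deconstruct_keys, so hash patterns work with either. The more common mistake is a case/in expression without an else branch, which raises NoMatchingPatternError when nothing matches.

# Pattern matching pitfall
def process_point(point)
  case point
  in { x: 0, y: }  # Hash pattern: works with both Struct and Data
    "On Y axis"
  in [0, y]        # Array pattern: relies on #deconstruct, which Struct defines
    "On Y axis"
  end              # No else branch: any other point raises NoMatchingPatternError
end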

# Robust pattern matching
def process_point(point)
  case point
  when ->(p) { p.x == 0 }
    "On Y axis: #{point.y}"
  when ->(p) { p.y == 0 }
    "On X axis: #{point.x}"
  else
    "Point at #{point.x}, #{point.y}"
  end
end
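
Hash patterns offer another way to write a single method that serves both classes, since each implements deconstruct_keys. A minimal sketch, with SPoint, DPoint, and describe_point as hypothetical names:

```ruby
# Hypothetical classes and method for illustration only.
SPoint = Struct.new(:x, :y)
DPoint = Data.define(:x, :y)

def describe_point(point)
  case point
  in { x: 0, y: 0 }
    "origin"
  in { x: 0, y: }
    "on Y axis at #{y}"
  in { x:, y: 0 }
    "on X axis at #{x}"
  in { x:, y: }
    "point at #{x}, #{y}"
  end
end

results = [
  describe_point(SPoint.new(0, 4)),
  describe_point(DPoint.new(x: 3, y: 0)),
  describe_point(DPoint.new(x: 2, y: 5))
]
```

The final pattern matches any object that exposes x and y, so this version needs no else branch for point-like inputs.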

Thread safety misconceptions occur frequently. While Data objects are immutable and thread-safe, Struct instances require synchronization for shared access.

# Thread safety pitfall with Struct
counter = Struct.new(:value).new(0)

threads = 10.times.map do
  Thread.new do
    1000.times { counter.value += 1 }  # Race condition
  end
end
threads.each(&:join)
# counter.value is unpredictable

# Data approach requires different pattern
Counter = Data.define(:value)
counter = Counter.new(value: 0)
mutex = Mutex.new

threads = 10.times.map do
  Thread.new do
    1000.times do
      mutex.synchronize do
        counter = counter.with(value: counter.value + 1)
      end
    end
  end
end
threads.each(&:join)
# counter.value => 10000

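Serialization needs more care than the similar APIs suggest. Classes must be assigned to constants, because an anonymous class cannot be looked up by name on load. YAML.load is safe by default since Psych 4 and rejects custom classes unless they are permitted. YAML support for Data depends on the installed Psych version, so verify round-tripping on your Ruby before relying on it.

require 'yaml'

# Serialization behavior
StructPoint = Struct.new(:x, :y)
DataPoint = Data.define(:x, :y)
struct_point = StructPoint.new(10, 20)
data_point = DataPoint.new(x: 10, y: 20)

struct_yaml = YAML.dump(struct_point)
data_yaml = YAML.dump(data_point)  # Output depends on Psych's Data support

# unsafe_load permits arbitrary classes; use it only with trusted input
restored_struct = YAML.unsafe_load(struct_yaml)
restored_data = YAML.unsafe_load(data_yaml)

restored_struct.x = 30  # Works
# restored_data.x = 30  # NoMethodError: Data defines no setters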

Production Patterns

Web application contexts often require different approaches for Struct and Data usage. Data objects work well for configuration, request/response objects, and functional pipelines, while Struct fits mutable model attributes and builder patterns.

# API response modeling with Data
APIResponse = Data.define(:status, :data, :errors) do
  def success?
    status == 200 && errors.empty?
  end
  
  def with_error(error)
    with(errors: errors + [error])
  end
  
  def transform_data(&block)
    return self unless success?
    with(data: block.call(data))
  end
end

# Usage in Rails controller
class UsersController < ApplicationController
  def show
    user = User.find(params[:id])
    # Named api_response to avoid shadowing the controller's own response object
    api_response = APIResponse.new(
      status: 200,
      data: user.as_json,
      errors: []
    )
    
    enriched = api_response
      .transform_data { |data| data.merge(preferences: user.preferences) }
      .transform_data { |data| data.merge(avatar_url: avatar_service.url_for(user)) }
    
    render json: enriched.data
  rescue ActiveRecord::RecordNotFound => e
    error_response = APIResponse.new(status: 404, data: nil, errors: [e.message])
    render json: error_response.to_h, status: 404
  end
end

Database integration patterns highlight the differences in approach. Struct objects can represent mutable active record attributes, while Data objects work better for value objects and immutable domain models.

# Struct for mutable database representations
class UserProfile < Struct.new(:user_id, :bio, :website, :location, keyword_init: true)
  def self.from_database(row)
    new(
      user_id: row['user_id'],
      bio: row['bio'],
      website: row['website'],
      location: row['location']
    )
  end
  
  def update_from_params(params)
    self.bio = params[:bio] if params.key?(:bio)
    self.website = params[:website] if params.key?(:website)
    self.location = params[:location] if params.key?(:location)
  end
  
  def to_database_hash
    { user_id: user_id, bio: bio, website: website, location: location }
  end
end

# Data for immutable domain models
Address = Data.define(:street, :city, :state, :zip_code) do
  def self.from_string(address_string)
    parts = address_string.split(', ')
    new(
      street: parts[0],
      city: parts[1],
      state: parts[2]&.split(' ')&.first,
      zip_code: parts[2]&.split(' ')&.last
    )
  end
  
  def formatted
    "#{street}, #{city}, #{state} #{zip_code}"
  end
  
  def in_state?(target_state)
    state.downcase == target_state.downcase
  end
end

Background job processing shows clear distinctions in usage patterns. Data objects excel as immutable job parameters, while Struct objects work well for mutable job state tracking.

# Data for immutable job parameters
EmailJob = Data.define(:recipient, :subject, :template, :variables) do
  def perform
    EmailService.send_email(
      to: recipient,
      subject: subject,
      body: TemplateRenderer.render(template, variables)
    )
  end
  
  def retry_with_delay(delay_seconds)
    with(variables: variables.merge(retry_delay: delay_seconds))
  end
end

# Struct for mutable job tracking
class JobStatus < Struct.new(:job_id, :status, :progress, :started_at, :completed_at, keyword_init: true)
  def start!
    self.status = 'running'
    self.started_at = Time.current
    save_to_redis
  end
  
  def update_progress!(percent)
    self.progress = percent
    save_to_redis
  end
  
  def complete!
    self.status = 'completed'
    self.progress = 100
    self.completed_at = Time.current
    save_to_redis
  end
  
  private
  
  def save_to_redis
    # Redis.current was removed in redis-rb 5; substitute your application's client
    Redis.current.setex("job:#{job_id}", 3600, to_h.to_json)
  end
end

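Caching strategies require different approaches. A Data object's value does not change after creation (as long as its members are immutable), which makes cached copies safe to share, while Struct objects need explicit invalidation when they change. Avoid building shared cache keys from #hash: Ruby seeds string hashing per process, so the value differs between processes.

require 'digest'

# Data objects as cache keys
UserPreferences = Data.define(:theme, :language, :timezone, :notifications) do
  def cache_key
    # Digest the values: #hash is not stable across processes
    "preferences:#{Digest::SHA256.hexdigest(to_h.to_json)}"
  end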
  
  def self.cached_for_user(user_id)
    cache_key = "user_preferences:#{user_id}"
    Rails.cache.fetch(cache_key, expires_in: 1.hour) do
      # Load from database and return Data object
      row = Database.query("SELECT * FROM user_preferences WHERE user_id = ?", user_id).first
      new(
        theme: row['theme'],
        language: row['language'], 
        timezone: row['timezone'],
        notifications: JSON.parse(row['notifications'])
      )
    end
  end
end

# Struct requires cache invalidation management
class MutableUserPreferences < Struct.new(:user_id, :theme, :language, :timezone, :notifications, keyword_init: true)
  def save!
    Database.query("UPDATE user_preferences SET ... WHERE user_id = ?", user_id)
    invalidate_cache
  end
  
  def update_theme(new_theme)
    self.theme = new_theme
    save!
  end
  
  private
  
  def invalidate_cache
    Rails.cache.delete("user_preferences:#{user_id}")
  end
end

Reference

Class Creation Methods

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| Struct.new(*attrs, keyword_init: nil, &block) | attrs (Symbols), keyword_init (true, false, or nil) | Class | Creates new Struct class with specified attributes |
| Data.define(*attrs, &block) | attrs (Symbols) | Class | Creates new Data class with specified attributes |

Instance Creation

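| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| StructClass.new(*values) | values (Array) | StructInstance | Creates struct instance with positional arguments |
| StructClass.new(**kwargs) | kwargs (Hash) | StructInstance | Creates struct instance with keyword arguments (unless keyword_init: false) |
| DataClass.new(*values) | values (Array) | DataInstance | Creates data instance with positional arguments, mapped to members in order |
| DataClass.new(**kwargs) | kwargs (Hash) | DataInstance | Creates data instance with keyword arguments; every member is required |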

Instance Methods - Common

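| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| #members | None | Array | Returns array of attribute names |
| #to_h | None | Hash | Returns hash of attribute name/value pairs |
| #== | other (Object) | Boolean | Compares objects by class and attribute values |
| #eql? | other (Object) | Boolean | Strict equality comparison |
| #hash | None | Integer | Returns hash value for object |
| #inspect | None | String | Returns string representation |
| #deconstruct_keys | keys (Array or nil) | Hash | Returns hash for pattern matching |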

Instance Methods - Struct Only

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| #[](name_or_index) | name_or_index (Symbol/Integer) | Object | Gets attribute value by name or index |
| #[]=(name_or_index, value) | name_or_index (Symbol/Integer), value (Object) | Object | Sets attribute value by name or index |
| #to_a | None | Array | Returns array of attribute values |
| #each | &block | Enumerator/self | Iterates over attribute values |
| #each_pair | &block | Enumerator/self | Iterates over attribute name/value pairs |
| #length | None | Integer | Returns number of attributes |
| #size | None | Integer | Alias for length |

Instance Methods - Data Only

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| #with(**kwargs) | kwargs (Hash) | DataInstance | Returns new instance with updated attributes |
| #deconstruct | None | Array | Returns array for pattern matching |
| #deconstruct_keys | keys (Array or nil) | Hash | Returns hash for pattern matching (Struct defines this method as well) |

Mutability Characteristics

| Feature | Struct | Data |
| --- | --- | --- |
| Attribute Modification | Mutable via attr= methods | Immutable, frozen at creation |
| In-place Updates | Supported | Not supported; no setters are defined |
| Thread Safety | Requires synchronization | Safe to read concurrently when members are also immutable |
| Memory Sharing | Requires defensive copying | Safe to share references |
| Functional Patterns | Requires explicit copying | Built-in via #with method |

Initialization Patterns

| Pattern | Struct | Data |
| --- | --- | --- |
| Positional Args | Point.new(x, y) | Point.new(x, y) |
| Keyword Args | Point.new(x: 1, y: 2) (unless keyword_init: false) | Point.new(x: 1, y: 2) |
| Mixed Args | Accepted but error-prone | Not supported |
| Partial Init | Fills missing with nil | Requires all attributes |
| Default Values | Via custom initialize | Via custom initialize or factory methods |

Performance Characteristics

| Operation | Struct | Data |
| --- | --- | --- |
| Creation | Baseline | Typically somewhat slower (arguments are routed through keywords) |
| Modification | In-place, no allocation | Creates new instance via #with |
| Hash Operations | Computed from member values | Computed from member values |
| Equality Checks | Member-by-member comparison | Member-by-member comparison |
| Memory Usage | Similar per instance | Similar per instance, safe to share |
| GC Pressure | Lower for mutations | Higher for transformations |

Pattern Matching Support

| Feature | Struct | Data |
| --- | --- | --- |
| Array Patterns | in [x, y] | in [x, y] |
| Hash Patterns | in {x:, y:} | in {x:, y:} |
| Deconstruction | Via #deconstruct and #deconstruct_keys (also to_a) | Via #deconstruct and #deconstruct_keys |
| Variable Binding | Automatic via pattern matching | Automatic via pattern matching |
| Guard Clauses | Supported (in ... if condition) | Supported (in ... if condition) |

Common Error Conditions

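| Error | Struct | Data |
| --- | --- | --- |
| FrozenError | Only if explicitly frozen | Raised on low-level mutation such as instance_variable_set |
| ArgumentError | Too many arguments (struct size differs) | Missing or unknown keywords, too many positional arguments |
| NoMethodError | Invalid attribute names | Invalid attribute names and any setter call |
| TypeError | Type mismatches in custom logic | Type mismatches in custom logic |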