CrackedRuby - Data Partitioning

Overview

Data partitioning divides a large database table into smaller, more manageable pieces called partitions while maintaining the logical structure of a single table. Each partition stores a subset of the data based on defined criteria, allowing database systems to query only relevant partitions rather than scanning entire tables. This technique addresses performance degradation that occurs as tables grow to millions or billions of rows.

Partitioning operates transparently to applications in most implementations. Queries against a partitioned table function identically to queries against a non-partitioned table, with the database optimizer automatically determining which partitions to access. This abstraction allows adding partitioning to existing systems without modifying application code.

The technique originated in large-scale data warehouse systems where analytical queries scanned massive fact tables. Early implementations appeared in enterprise database systems like Oracle and DB2 in the 1990s. PostgreSQL added native table partitioning in version 10 (2017), while MySQL has supported partitioning since version 5.1 (2008). Modern distributed databases like Cassandra and MongoDB implement partitioning as a fundamental architecture component.

# Logical view - single table interface
Customer.where(region: 'US').count

# Physical storage - data distributed across multiple partitions
# customers_us (partition 1)
# customers_eu (partition 2)
# customers_asia (partition 3)

Partitioning differs from sharding in scope and purpose. Partitioning divides a table within a single database instance, maintaining ACID guarantees and foreign key relationships. Sharding distributes data across multiple database servers, requiring application-level coordination and sacrificing some transactional guarantees for horizontal scalability.

The primary benefits include improved query performance through partition pruning, where the database scans only relevant partitions; faster maintenance operations by working on individual partitions; and better resource management by archiving or compressing old partitions. Systems with time-series data, multi-tenant architectures, or geographical data distribution gain substantial advantages from partitioning.

Key Principles

Data partitioning operates on the principle of divide-and-conquer. The partition key determines how rows distribute across partitions. This key can be a single column or multiple columns, and the database uses it to route data to specific partitions during inserts and to identify relevant partitions during queries.

Partition Types

Range partitioning divides data based on value ranges. A table partitioned by date might have monthly partitions, with all January 2024 records in one partition and February 2024 records in another. Range partitioning suits time-series data, sequential identifiers, and any naturally ordered data where queries typically filter by ranges.

# Range partitioning by date
# Partition 1: created_at >= '2024-01-01' AND created_at < '2024-02-01'
# Partition 2: created_at >= '2024-02-01' AND created_at < '2024-03-01'
# Partition 3: created_at >= '2024-03-01' AND created_at < '2024-04-01'

List partitioning assigns specific discrete values to each partition. A multi-tenant application might partition by tenant_id, with specific tenants grouped into partitions. List partitioning works well for categorical data with known, finite values like country codes, product categories, or status codes.

Hash partitioning applies a hash function to the partition key, distributing data evenly across a fixed number of partitions. This method achieves balanced data distribution when no natural partitioning scheme exists. Hash partitioning prevents hotspots but limits query optimization since the database cannot skip partitions based on value ranges.

# Hash partitioning distributes data evenly
# hash(user_id) % 4 determines partition:
# Partition 0: user_ids where hash(user_id) % 4 == 0
# Partition 1: user_ids where hash(user_id) % 4 == 1
# Partition 2: user_ids where hash(user_id) % 4 == 2
# Partition 3: user_ids where hash(user_id) % 4 == 3

Composite partitioning combines multiple methods. A table might use range partitioning by date and sub-partition with hash on user_id, creating a hierarchy where each date range splits into multiple hash-based partitions. This approach addresses complex distribution requirements.

Partition Pruning

Partition pruning represents the core optimization that makes partitioning valuable. When executing a query with a WHERE clause that references the partition key, the database examines the partition definitions and eliminates irrelevant partitions from consideration. A query for records from March 2024 on a date-partitioned table accesses only the March partition, ignoring all others.

The effectiveness of partition pruning depends on query patterns. Queries filtering by the partition key achieve optimal performance. Queries without partition key filters perform full table scans across all partitions, negating partitioning benefits. Analyzing query workloads determines appropriate partition keys.

Partition Management

Partitions require ongoing maintenance. Creating new partitions for upcoming time periods, dropping old partitions to archive data, and rebalancing partitions as data distribution changes constitute regular operations. Automated partition management through scripts or database features reduces operational burden.

Each partition maintains its own indexes, statistics, and physical storage. Maintenance operations like VACUUM in PostgreSQL or OPTIMIZE TABLE in MySQL operate on individual partitions, completing faster than operating on the entire table. This granularity allows maintenance during business hours without locking the entire table.

Implementation Approaches

Horizontal vs Vertical Partitioning

Horizontal partitioning divides rows across partitions, with each partition containing a subset of rows but all columns. This standard partitioning approach addresses the most common scaling challenge: too many rows. Vertical partitioning divides columns across partitions, with each partition containing all rows but only specific columns. Vertical partitioning optimizes for tables with many columns where queries typically access only subsets of columns.

# Horizontal partitioning - subset of rows
# Partition 1: rows where year = 2023
# Partition 2: rows where year = 2024

# Vertical partitioning - subset of columns
# Partition 1: id, name, email
# Partition 2: id, large_text_field, blob_data

Partition Strategy Selection

Range partitioning excels for time-series data. Event logs, transaction records, sensor readings, and audit trails naturally partition by timestamp. Queries typically filter by time ranges, accessing recent data more frequently than historical data. Old partitions can be archived or compressed without affecting current operations.

List partitioning suits data with clear categorical boundaries. Multi-tenant SaaS applications partition by tenant, isolating tenant data and enabling per-tenant operations. Geographic applications partition by region or country. The key requirement is a relatively small number of discrete values with predictable distribution.

Hash partitioning distributes data evenly when no natural partitioning boundary exists. User tables often use hash partitioning on user_id to achieve balanced distribution. Hash partitioning prevents hotspots but limits partition pruning to equality comparisons. Range queries on hash-partitioned tables scan all partitions.

Partition Key Selection

The partition key determines partitioning effectiveness. Ideal partition keys appear in most query WHERE clauses, distribute data evenly across partitions, and remain immutable. Changing a row's partition key requires moving it between partitions, a costly operation.

Time-based keys work well for append-mostly data where updates occur on recent records. User or tenant IDs work for multi-tenant systems with tenant-isolated queries. Composite keys combining time and category handle complex access patterns.

# Good partition key - appears in most queries, evenly distributed
class Event < ApplicationRecord
  # Partitioned by event_date
  # Most queries: Event.where(event_date: date_range)
end

# Poor partition key - skewed distribution
class Order < ApplicationRecord
  # Partitioned by status (90% are 'completed')
  # Creates massive 'completed' partition
end

Partition Sizing

Partition size balances partition pruning benefits against partition management overhead. Too many small partitions increase metadata overhead and complicate maintenance. Too few large partitions reduce partition pruning effectiveness.

Common sizing strategies include monthly partitions for high-volume tables, weekly partitions for extremely high-volume tables, and yearly partitions for moderate-volume tables. The target size depends on database capabilities, query patterns, and maintenance requirements. PostgreSQL handles hundreds of partitions efficiently; thousands of partitions may impact performance.

Partition Creation Strategies

Pre-creating future partitions prevents insertion failures. Scripts or database jobs create partitions for upcoming time periods. A common pattern creates partitions three months in advance and drops partitions older than two years.

Dynamic partition creation automatically generates partitions as needed. Some databases support automatic partition creation when inserting data without a matching partition. This approach reduces operational overhead but requires error handling for partition creation failures.

Ruby Implementation

Ruby applications interact with partitioned tables through ActiveRecord with database-specific extensions or specialized gems. PostgreSQL offers the most mature partitioning support with native features and robust gem ecosystems.

PostgreSQL Partitioning with ActiveRecord

PostgreSQL 10+ supports declarative partitioning. Creating partitioned tables requires SQL since ActiveRecord migrations lack native partitioning syntax. The table definition specifies the partition method and key, with child partitions created separately.

# Migration creating range-partitioned table
class CreatePartitionedEvents < ActiveRecord::Migration[7.0]
  def up
    execute <<-SQL
      CREATE TABLE events (
        id BIGSERIAL,
        user_id BIGINT NOT NULL,
        event_type VARCHAR(50) NOT NULL,
        event_date DATE NOT NULL,
        payload JSONB,
        created_at TIMESTAMP NOT NULL
      ) PARTITION BY RANGE (event_date);
    SQL
    
    # Create partitions for each month
    execute <<-SQL
      CREATE TABLE events_2024_01 PARTITION OF events
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
      
      CREATE TABLE events_2024_02 PARTITION OF events
        FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
      
      CREATE TABLE events_2024_03 PARTITION OF events
        FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');
    SQL
    
    # Indexes created on parent table apply to all partitions
    execute "CREATE INDEX idx_events_user_id ON events (user_id);"
    execute "CREATE INDEX idx_events_type ON events (event_type);"
  end
  
  def down
    execute "DROP TABLE IF EXISTS events CASCADE;"
  end
end

ActiveRecord models interact with partitioned tables transparently. The application references the parent table, and PostgreSQL routes queries to appropriate partitions.

class Event < ApplicationRecord
  belongs_to :user
  
  scope :for_date_range, ->(start_date, end_date) {
    where(event_date: start_date..end_date)
  }
  
  scope :recent, -> { where('event_date >= ?', 30.days.ago) }
end

# Queries automatically benefit from partition pruning
Event.where(event_date: Date.new(2024, 1, 15))
# Only scans events_2024_01 partition

Event.for_date_range(Date.new(2024, 1, 15), Date.new(2024, 2, 15))
# Only scans events_2024_01 and events_2024_02 partitions

Automated Partition Management

Managing partitions manually becomes tedious. Ruby scripts automate partition creation and deletion based on retention policies.

# Partition manager for time-based partitions
class PartitionManager
  def initialize(table_name, partition_key, interval: :monthly)
    @table_name = table_name
    @partition_key = partition_key
    @interval = interval
  end
  
  def create_future_partitions(months_ahead: 3)
    (0...months_ahead).each do |offset|
      date = Date.today.beginning_of_month + offset.months
      create_partition(date)
    end
  end
  
  def drop_old_partitions(retention_months: 24)
    cutoff_date = Date.today.beginning_of_month - retention_months.months
    
    partition_names = fetch_partition_names
    partition_names.each do |partition_name|
      partition_date = extract_date_from_partition_name(partition_name)
      next if partition_date >= cutoff_date
      
      drop_partition(partition_name)
    end
  end
  
  private
  
  def create_partition(date)
    partition_name = "#{@table_name}_#{date.strftime('%Y_%m')}"
    start_date = date.beginning_of_month
    end_date = date.end_of_month + 1.day
    
    sql = <<-SQL
      CREATE TABLE IF NOT EXISTS #{partition_name}
        PARTITION OF #{@table_name}
        FOR VALUES FROM ('#{start_date}') TO ('#{end_date}');
    SQL
    
    ActiveRecord::Base.connection.execute(sql)
    Rails.logger.info("Created partition: #{partition_name}")
  rescue ActiveRecord::StatementInvalid => e
    Rails.logger.error("Failed to create partition #{partition_name}: #{e.message}")
  end
  
  def drop_partition(partition_name)
    sql = "DROP TABLE IF EXISTS #{partition_name};"
    ActiveRecord::Base.connection.execute(sql)
    Rails.logger.info("Dropped partition: #{partition_name}")
  end
  
  def fetch_partition_names
    sql = <<-SQL
      SELECT tablename FROM pg_tables
      WHERE schemaname = 'public'
        AND tablename LIKE '#{@table_name}_%'
      ORDER BY tablename;
    SQL
    
    result = ActiveRecord::Base.connection.execute(sql)
    result.map { |row| row['tablename'] }
  end
  
  def extract_date_from_partition_name(name)
    # Extract date from pattern: table_name_2024_01
    match = name.match(/(\d{4})_(\d{2})$/)
    return nil unless match
    
    Date.new(match[1].to_i, match[2].to_i, 1)
  end
end

# Usage in a scheduled job
class PartitionMaintenanceJob < ApplicationJob
  queue_as :default
  
  def perform
    manager = PartitionManager.new('events', 'event_date')
    manager.create_future_partitions(months_ahead: 6)
    manager.drop_old_partitions(retention_months: 24)
  end
end

PgParty Gem

The pg_party gem provides ActiveRecord DSL for partition management. It abstracts partition creation syntax and adds migration helpers.

# Gemfile
gem 'pg_party'

# Migration using pg_party
class CreatePartitionedOrders < ActiveRecord::Migration[7.0]
  def up
    create_range_partition :orders, partition_key: :order_date do |t|
      t.bigint :customer_id, null: false
      t.decimal :total_amount, precision: 10, scale: 2
      t.string :status
      t.date :order_date, null: false
      t.timestamps
    end
    
    create_range_partition_of :orders,
      name: :orders_2024_q1,
      start_range: '2024-01-01',
      end_range: '2024-04-01'
    
    create_range_partition_of :orders,
      name: :orders_2024_q2,
      start_range: '2024-04-01',
      end_range: '2024-07-01'
  end
  
  def down
    drop_table :orders
  end
end

# Model with pg_party
class Order < ApplicationRecord
  range_partition_by :order_date
  
  belongs_to :customer
end

# Creating partitions programmatically
Order.create_partition(
  name: :orders_2024_q3,
  start_range: '2024-07-01',
  end_range: '2024-10-01'
)

# List existing partitions
Order.partitions
# => [#<PgParty::Partition name=orders_2024_q1...>, ...]

Hash Partitioning Implementation

Hash partitioning distributes data evenly across a fixed number of partitions using a hash function on the partition key.

# Migration for hash-partitioned users table
class CreatePartitionedUsers < ActiveRecord::Migration[7.0]
  def up
    execute <<-SQL
      CREATE TABLE users (
        id BIGSERIAL,
        email VARCHAR(255) NOT NULL,
        username VARCHAR(100) NOT NULL,
        created_at TIMESTAMP NOT NULL
      ) PARTITION BY HASH (id);
    SQL
    
    # Create 4 hash partitions
    (0..3).each do |i|
      execute <<-SQL
        CREATE TABLE users_hash_#{i} PARTITION OF users
          FOR VALUES WITH (MODULUS 4, REMAINDER #{i});
      SQL
    end
    
    execute "CREATE INDEX idx_users_email ON users (email);"
  end
  
  def down
    execute "DROP TABLE IF EXISTS users CASCADE;"
  end
end

# Usage is transparent
User.create(email: 'user@example.com', username: 'user1')
# Automatically routed to appropriate partition based on hash(id)

User.where(id: 12345)
# Direct partition lookup using hash

List Partitioning for Multi-Tenancy

List partitioning isolates tenant data in multi-tenant applications.

# Migration for tenant-partitioned accounts
class CreatePartitionedAccounts < ActiveRecord::Migration[7.0]
  def up
    execute <<-SQL
      CREATE TABLE accounts (
        id BIGSERIAL,
        tenant_id INTEGER NOT NULL,
        name VARCHAR(255) NOT NULL,
        balance DECIMAL(15, 2),
        created_at TIMESTAMP NOT NULL
      ) PARTITION BY LIST (tenant_id);
    SQL
    
    # Create partition per tenant or tenant group
    execute <<-SQL
      CREATE TABLE accounts_tenant_1 PARTITION OF accounts
        FOR VALUES IN (1);
      
      CREATE TABLE accounts_tenant_2 PARTITION OF accounts
        FOR VALUES IN (2);
      
      CREATE TABLE accounts_tenant_group_a PARTITION OF accounts
        FOR VALUES IN (3, 4, 5);
    SQL
  end
end

# Model with tenant scoping
class Account < ApplicationRecord
  belongs_to :tenant
  
  default_scope { where(tenant_id: Current.tenant_id) }
end

# Queries automatically scoped to tenant partition
Current.tenant_id = 1
Account.all  # Only queries accounts_tenant_1 partition

Performance Considerations

Partitioning impacts query performance, maintenance operations, and storage efficiency. The benefits depend on partition strategy, query patterns, and table size.

Query Performance

Partition pruning delivers the primary performance benefit. Queries filtering by the partition key scan only relevant partitions. A query for January 2024 data on a monthly-partitioned table scans only the January partition, regardless of total table size. This optimization reduces I/O proportionally to the number of partitions pruned.

# Explain output shows partition pruning
Event.where(event_date: Date.new(2024, 1, 15)).explain

# Output (PostgreSQL):
# Seq Scan on events_2024_01
# Filter: (event_date = '2024-01-15')
# Partitions pruned: 11 of 12

Queries without partition key filters scan all partitions, performing worse than non-partitioned tables due to partition metadata overhead. Mixed workloads where some queries benefit from partition pruning while others scan all partitions may not gain net performance improvements.

Index performance improves on partitioned tables. Each partition maintains separate indexes, reducing index size proportionally. Smaller indexes fit in memory more readily and support faster lookups. Index maintenance operations complete faster on individual partitions.

Insert and Update Performance

Insert operations require partition routing overhead. The database evaluates partition key values and directs rows to appropriate partitions. This overhead remains minimal compared to overall insert cost. Batch inserts amortize routing overhead across multiple rows.

Updates changing partition key values require moving rows between partitions. The database deletes the row from the current partition and inserts it into the target partition. This operation costs significantly more than standard updates. Applications should minimize partition key updates.

# Expensive operation on partitioned table
event = Event.find(1)
event.update(event_date: Date.new(2025, 1, 1))
# Moves row from events_2024_01 to events_2025_01 partition

# Efficient operation
event.update(event_type: 'click')
# Updates in place within same partition

Maintenance Performance

Maintenance operations benefit substantially from partitioning. VACUUM, ANALYZE, and index rebuilds operate on individual partitions, completing faster and allowing granular scheduling.

# Vacuum specific partition during business hours
ActiveRecord::Base.connection.execute(
  "VACUUM ANALYZE events_2023_12"
)

# Full table vacuum would lock table longer
ActiveRecord::Base.connection.execute(
  "VACUUM ANALYZE events"  # Vacuums all partitions
)

Partition dropping provides instant archival. Dropping a partition removes data immediately without scanning rows or triggering delete cascades. This operation completes in milliseconds regardless of partition size.

# Archive old data by detaching partition
ActiveRecord::Base.connection.execute(<<-SQL
  ALTER TABLE events DETACH PARTITION events_2022_01;
SQL)

# Move to archive schema
ActiveRecord::Base.connection.execute(<<-SQL
  ALTER TABLE events_2022_01 SET SCHEMA archive;
SQL)

Partition Size Impact

Optimal partition size balances partition pruning benefits against partition metadata overhead. PostgreSQL query planner overhead increases with partition count. Tables with thousands of partitions may experience slower query planning.

Memory usage scales with partition count. Each partition requires catalog entries, statistics, and plan cache entries. Extremely large partition counts (10,000+) consume significant memory.

Monitoring partition access patterns identifies unused partitions and informs partition strategy adjustments. Partitions never accessed in queries waste resources and should be consolidated.

Parallel Query Execution

Modern databases execute queries across multiple partitions in parallel. PostgreSQL parallel sequential scans distribute partition scanning across worker processes, improving performance on multi-core systems.

# Query spanning multiple partitions executed in parallel
Event.where(event_date: Date.new(2024, 1, 1)..Date.new(2024, 3, 31))
  .group(:event_type)
  .count

# PostgreSQL may execute with parallel workers:
# Parallel workers: 4
# Partition 1 scanned by worker 1
# Partition 2 scanned by worker 2
# Partition 3 scanned by worker 3

Design Considerations

Partitioning introduces complexity and constraints that affect system design. The decision to partition requires analyzing workload patterns, growth projections, and operational requirements.

When to Partition

Tables exceeding 100GB in size become partition candidates. Query performance degrades as table size grows, particularly for range scans and aggregations. Maintenance operations take hours on multi-terabyte tables but minutes on individual partitions.

Write-heavy workloads with time-series data benefit most from partitioning. Log tables, event streams, transaction histories, and sensor data naturally partition by timestamp. Recent data receives most queries and updates while historical data remains read-only.

Multi-tenant applications with tenant-isolated queries achieve performance and isolation benefits through list partitioning by tenant. Tenant data isolation simplifies compliance requirements and enables per-tenant data retention policies.

Tables with skewed access patterns where queries typically access recent data gain partition pruning benefits. If 90% of queries access the most recent 10% of data, partitioning keeps that subset in memory.

When Not to Partition

Small tables under 10GB rarely benefit from partitioning. Partition metadata overhead and query planning complexity outweigh benefits. These tables fit in memory entirely, rendering partition pruning unnecessary.

Tables without clear partition keys prove difficult to partition effectively. If no column appears consistently in query WHERE clauses, partition pruning provides minimal benefit. Hash partitioning distributes data but offers limited query optimization.

Workloads requiring frequent partition key updates suffer performance penalties. Moving rows between partitions costs significantly more than standard updates. Tables where the partition key changes regularly perform poorly when partitioned.

Tables with extensive foreign key relationships to and from other tables complicate partitioning. Foreign keys from partitioned tables to non-partitioned tables function normally. Foreign keys from non-partitioned tables to partitioned tables require referencing the parent table, not individual partitions. Complex referential integrity constraints may preclude partitioning.

Partition Strategy Trade-offs

Range partitioning provides optimal query performance for time-series data but requires ongoing partition management. New partitions must be created for future time periods, and old partitions archived. Automated partition management reduces operational burden.

Hash partitioning requires no ongoing management after initial setup but offers limited query optimization. All queries scan all partitions unless filtering by exact partition key value. Hash partitioning suits workloads prioritizing even data distribution over query performance.

List partitioning provides good query performance for categorical data but requires knowing all possible values in advance. Adding new values requires creating new partitions. List partitioning works well for stable categorical dimensions like geographic regions.

Partition Granularity

Partition size represents a critical design decision. Monthly partitions suit most time-series applications, balancing partition count against partition size. High-volume applications may require daily or weekly partitions. Lower-volume applications may use quarterly or yearly partitions.

# Decision framework for partition granularity
case daily_row_count
when 0..100_000
  :yearly  # ~36M rows per partition
when 100_000..1_000_000
  :quarterly  # ~90M rows per partition
when 1_000_000..10_000_000
  :monthly  # ~300M rows per partition
when 10_000_000..Float::INFINITY
  :weekly  # ~70M rows per partition
end

Partition count impacts query planning overhead. PostgreSQL handles hundreds of partitions efficiently; thousands of partitions may slow query planning. Monitoring query plan time helps determine if partition count affects performance.

Schema Evolution

Adding columns to partitioned tables requires altering the parent table and all child partitions. Most databases propagate DDL changes automatically, but the operation takes longer than on non-partitioned tables.

Changing partition strategies requires rebuilding the table. Migrating from range to hash partitioning or changing partition keys involves creating a new partitioned table and migrating data. This operation requires careful planning and may necessitate downtime.

Backup and Recovery

Partitioning simplifies backup strategies. Individual partitions can be backed up independently. Historical partitions remain static after the time period closes, requiring backups only once. Recent active partitions require regular backups.

Point-in-time recovery operates on the entire table, not individual partitions. Partial recovery of specific partitions may require exporting and importing data.

Tools & Ecosystem

PostgreSQL Native Features

PostgreSQL 10+ provides declarative partitioning with range, list, and hash methods. PostgreSQL 11 added hash partitioning and improved query planner performance. PostgreSQL 12 enabled BEFORE INSERT triggers on partitioned tables. PostgreSQL 13 improved partition pruning performance.

The pg_partman extension automates partition management for PostgreSQL versions 9.6+. It provides background worker processes for creating and dropping partitions based on retention policies.

# Install pg_partman extension
ActiveRecord::Base.connection.execute("CREATE EXTENSION pg_partman;")

# Configure partition maintenance
ActiveRecord::Base.connection.execute(<<-SQL
  SELECT partman.create_parent(
    'public.events',
    'event_date',
    'native',
    'monthly',
    p_premake := 3,
    p_start_partition := '2024-01-01'
  );
SQL)

# Schedule maintenance
ActiveRecord::Base.connection.execute(<<-SQL
  INSERT INTO partman.part_config (
    parent_table,
    retention,
    retention_keep_table
  ) VALUES (
    'public.events',
    '24 months',
    true
  );
SQL)

Ruby Gems

The pg_party gem provides ActiveRecord integration for PostgreSQL partitioning. It adds migration DSL, model methods, and partition management utilities.

The activerecord-pg-partitions gem offers similar functionality with different API design. It focuses on migration helpers and SQL abstraction.

# activerecord-pg-partitions
gem 'activerecord-pg-partitions'

# In migration
class CreatePartitionedLogs < ActiveRecord::Migration[7.0]
  def change
    create_table :logs, partition_by: :range, partition_key: :created_at do |t|
      t.string :message
      t.string :level
      t.datetime :created_at
    end
    
    create_table_partition :logs,
      name: :logs_2024_01,
      values: {
        from: '2024-01-01',
        to: '2024-02-01'
      }
  end
end

MySQL Partitioning

MySQL supports partitioning in InnoDB storage engine. The syntax differs from PostgreSQL, using PARTITION BY clause in CREATE TABLE.

# MySQL partition creation
class CreatePartitionedEvents < ActiveRecord::Migration[7.0]
  def up
    execute <<-SQL
      CREATE TABLE events (
        id BIGINT AUTO_INCREMENT PRIMARY KEY,
        user_id BIGINT NOT NULL,
        event_date DATE NOT NULL,
        data JSON
      )
      PARTITION BY RANGE (YEAR(event_date)) (
        PARTITION p2023 VALUES LESS THAN (2024),
        PARTITION p2024 VALUES LESS THAN (2025),
        PARTITION p2025 VALUES LESS THAN (2026)
      );
    SQL
  end
end

MySQL partitioning has limitations compared to PostgreSQL. Foreign keys from partitioned tables to non-partitioned tables are not supported. All columns in the partition key must be part of every unique index.

Monitoring Tools

Database monitoring tools report partition-level metrics. PgHero provides partition size, query performance, and maintenance status for PostgreSQL applications.

# Gemfile
gem 'pghero'

# Check partition sizes
PgHero.database.table_stats
# => [
#   {name: "events_2024_01", rows: 5000000, size: "1.2 GB"},
#   {name: "events_2024_02", rows: 4800000, size: "1.1 GB"}
# ]

# Identify slow queries by partition
PgHero.database.slow_queries

Custom monitoring tracks partition creation lag, partition size growth, and query distribution across partitions.

class PartitionMonitor
  def check_partition_lag
    last_partition_date = fetch_last_partition_date('events')
    required_date = Date.today.end_of_month
    
    if last_partition_date < required_date
      alert("Partition lag detected for events table")
    end
  end
  
  def check_partition_sizes
    partition_sizes = fetch_partition_sizes('events')
    
    partition_sizes.each do |partition|
      if partition[:size_bytes] > 50.gigabytes
        alert("Partition #{partition[:name]} exceeds size threshold")
      end
    end
  end
  
  private
  
  def fetch_last_partition_date(table_name)
    sql = <<-SQL
      SELECT tablename FROM pg_tables
      WHERE schemaname = 'public'
        AND tablename LIKE '#{table_name}_%'
      ORDER BY tablename DESC
      LIMIT 1;
    SQL
    
    result = ActiveRecord::Base.connection.execute(sql)
    return nil if result.none?
    
    partition_name = result.first['tablename']
    extract_date_from_partition_name(partition_name)
  end
  
  def fetch_partition_sizes(table_name)
    sql = <<-SQL
      SELECT
        tablename as name,
        pg_total_relation_size(schemaname || '.' || tablename) as size_bytes
      FROM pg_tables
      WHERE schemaname = 'public'
        AND tablename LIKE '#{table_name}_%'
      ORDER BY tablename;
    SQL
    
    result = ActiveRecord::Base.connection.execute(sql)
    result.map { |row| row.symbolize_keys }
  end
end

Reference

Partition Type Comparison

Partition Type	Use Case	Key Distribution	Query Optimization	Management Overhead
Range	Time-series data, sequential IDs	Continuous ranges	Excellent for range queries	Requires periodic partition creation
List	Categorical data, multi-tenancy	Discrete values	Good for equality queries	Requires predefined value sets
Hash	Even distribution, no natural key	Uniform across partitions	Limited to equality queries	No ongoing management
Composite	Complex requirements	Hierarchical	Flexible optimization	Highest complexity

PostgreSQL Partition DDL Reference

Operation	SQL Command	Notes
Create partitioned table	CREATE TABLE ... PARTITION BY RANGE/LIST/HASH	Defines parent table structure
Create partition	CREATE TABLE ... PARTITION OF parent	Defines child partition bounds
Attach partition	ALTER TABLE parent ATTACH PARTITION child	Adds existing table as partition
Detach partition	ALTER TABLE parent DETACH PARTITION child	Removes partition from parent
Drop partition	DROP TABLE partition_name	Permanently deletes partition data

Partition Key Selection Criteria

Criterion	Importance	Evaluation
Query frequency	Critical	Column appears in most WHERE clauses
Data distribution	High	Values distribute evenly across partitions
Immutability	High	Values rarely or never change
Cardinality	Medium	Sufficient distinct values for partition count
Data lifecycle	Medium	Aligns with archival and retention policies

Ruby ActiveRecord Partition Queries

Query Pattern	Code Example	Partition Behavior
Date range filter	where(date: range)	Scans only partitions overlapping range
Equality filter	where(date: value)	Scans single partition containing value
No partition key	where(other_column: value)	Scans all partitions
Insert	create(attributes)	Routes to appropriate partition
Update partition key	update(date: new_date)	Moves row between partitions

Partition Maintenance Schedule

Task	Frequency	Purpose
Create future partitions	Weekly	Ensure partitions exist for incoming data
Drop old partitions	Monthly	Enforce retention policies
Vacuum partitions	Daily	Reclaim space, update statistics
Analyze partitions	After bulk loads	Update query planner statistics
Monitor partition sizes	Daily	Detect unusual growth patterns
Reindex partitions	Quarterly	Reduce index bloat

Common Partition Configurations

Workload Type	Recommended Strategy	Partition Key	Partition Interval
Event logs	Range	timestamp	Monthly
User data	Hash	user_id	8-16 partitions
Multi-tenant SaaS	List	tenant_id	Per tenant or tenant group
Financial transactions	Range	transaction_date	Monthly or quarterly
IoT sensor data	Range + Hash	timestamp + device_id	Daily + hash sub-partition
Geographic data	List	country_code or region	Per country or region group

Performance Impact Factors

Factor	Impact on Query	Impact on Writes	Impact on Maintenance
Partition count	High with many partitions	Low	Proportional to count
Partition size	Low until memory limit	Low	Proportional to size
Partition key in query	Major improvement	None	None
Partition key updates	None	Major overhead	None
Index per partition	Faster lookups	Slower inserts	Faster rebuilds

Partition Troubleshooting

Issue	Symptom	Solution
Slow queries	All partitions scanned	Add partition key to WHERE clause
Insert failures	No matching partition	Create partition or enable default partition
High disk usage	Old partitions retained	Implement partition dropping policy
Slow partition creation	Table locked during creation	Use CONCURRENTLY or create during low traffic
Uneven partition sizes	Some partitions much larger	Adjust partition boundaries or switch to hash
Foreign key errors	Cannot create FK to partition	Reference parent table instead

Data Partitioning