Overview
Data partitioning divides a large database table into smaller, more manageable pieces called partitions while maintaining the logical structure of a single table. Each partition stores a subset of the data based on defined criteria, allowing database systems to query only relevant partitions rather than scanning entire tables. This technique addresses performance degradation that occurs as tables grow to millions or billions of rows.
Partitioning operates transparently to applications in most implementations. Queries against a partitioned table function identically to queries against a non-partitioned table, with the database optimizer automatically determining which partitions to access. This abstraction allows adding partitioning to existing systems without modifying application code.
The technique originated in large-scale data warehouse systems where analytical queries scanned massive fact tables. Early implementations appeared in enterprise database systems like Oracle and DB2 in the 1990s. PostgreSQL added native table partitioning in version 10 (2017), while MySQL has supported partitioning since version 5.1 (2008). Modern distributed databases like Cassandra and MongoDB implement partitioning as a fundamental architecture component.
# Logical view - single table interface
Customer.where(region: 'US').count
# Physical storage - data distributed across multiple partitions
# customers_us (partition 1)
# customers_eu (partition 2)
# customers_asia (partition 3)
Partitioning differs from sharding in scope and purpose. Partitioning divides a table within a single database instance, maintaining ACID guarantees and foreign key relationships. Sharding distributes data across multiple database servers, requiring application-level coordination and sacrificing some transactional guarantees for horizontal scalability.
The primary benefits include improved query performance through partition pruning, where the database scans only relevant partitions; faster maintenance operations by working on individual partitions; and better resource management by archiving or compressing old partitions. Systems with time-series data, multi-tenant architectures, or geographical data distribution gain substantial advantages from partitioning.
Key Principles
Data partitioning operates on the principle of divide-and-conquer. The partition key determines how rows distribute across partitions. This key can be a single column or multiple columns, and the database uses it to route data to specific partitions during inserts and to identify relevant partitions during queries.
Partition Types
Range partitioning divides data based on value ranges. A table partitioned by date might have monthly partitions, with all January 2024 records in one partition and February 2024 records in another. Range partitioning suits time-series data, sequential identifiers, and any naturally ordered data where queries typically filter by ranges.
# Range partitioning by date
# Partition 1: created_at >= '2024-01-01' AND created_at < '2024-02-01'
# Partition 2: created_at >= '2024-02-01' AND created_at < '2024-03-01'
# Partition 3: created_at >= '2024-03-01' AND created_at < '2024-04-01'
List partitioning assigns specific discrete values to each partition. A multi-tenant application might partition by tenant_id, with specific tenants grouped into partitions. List partitioning works well for categorical data with known, finite values like country codes, product categories, or status codes.
Hash partitioning applies a hash function to the partition key, distributing data evenly across a fixed number of partitions. This method achieves balanced data distribution when no natural partitioning scheme exists. Hash partitioning prevents hotspots but limits query optimization since the database cannot skip partitions based on value ranges.
# Hash partitioning distributes data evenly
# hash(user_id) % 4 determines partition:
# Partition 0: user_ids where hash(user_id) % 4 == 0
# Partition 1: user_ids where hash(user_id) % 4 == 1
# Partition 2: user_ids where hash(user_id) % 4 == 2
# Partition 3: user_ids where hash(user_id) % 4 == 3
Composite partitioning combines multiple methods. A table might use range partitioning by date and sub-partition with hash on user_id, creating a hierarchy where each date range splits into multiple hash-based partitions. This approach addresses complex distribution requirements.
Partition Pruning
Partition pruning represents the core optimization that makes partitioning valuable. When executing a query with a WHERE clause that references the partition key, the database examines the partition definitions and eliminates irrelevant partitions from consideration. A query for records from March 2024 on a date-partitioned table accesses only the March partition, ignoring all others.
The effectiveness of partition pruning depends on query patterns. Queries filtering by the partition key achieve optimal performance. Queries without partition key filters perform full table scans across all partitions, negating partitioning benefits. Analyzing query workloads determines appropriate partition keys.
Partition Management
Partitions require ongoing maintenance. Creating new partitions for upcoming time periods, dropping old partitions to archive data, and rebalancing partitions as data distribution changes constitute regular operations. Automated partition management through scripts or database features reduces operational burden.
Each partition maintains its own indexes, statistics, and physical storage. Maintenance operations like VACUUM in PostgreSQL or OPTIMIZE TABLE in MySQL operate on individual partitions, completing faster than operating on the entire table. This granularity allows maintenance during business hours without locking the entire table.
Implementation Approaches
Horizontal vs Vertical Partitioning
Horizontal partitioning divides rows across partitions, with each partition containing a subset of rows but all columns. This standard partitioning approach addresses the most common scaling challenge: too many rows. Vertical partitioning divides columns across partitions, with each partition containing all rows but only specific columns. Vertical partitioning optimizes for tables with many columns where queries typically access only subsets of columns.
# Horizontal partitioning - subset of rows
# Partition 1: rows where year = 2023
# Partition 2: rows where year = 2024
# Vertical partitioning - subset of columns
# Partition 1: id, name, email
# Partition 2: id, large_text_field, blob_data
Partition Strategy Selection
Range partitioning excels for time-series data. Event logs, transaction records, sensor readings, and audit trails naturally partition by timestamp. Queries typically filter by time ranges, accessing recent data more frequently than historical data. Old partitions can be archived or compressed without affecting current operations.
List partitioning suits data with clear categorical boundaries. Multi-tenant SaaS applications partition by tenant, isolating tenant data and enabling per-tenant operations. Geographic applications partition by region or country. The key requirement is a relatively small number of discrete values with predictable distribution.
Hash partitioning distributes data evenly when no natural partitioning boundary exists. User tables often use hash partitioning on user_id to achieve balanced distribution. Hash partitioning prevents hotspots but limits partition pruning to equality comparisons. Range queries on hash-partitioned tables scan all partitions.
Partition Key Selection
The partition key determines partitioning effectiveness. Ideal partition keys appear in most query WHERE clauses, distribute data evenly across partitions, and remain immutable. Changing a row's partition key requires moving it between partitions, a costly operation.
Time-based keys work well for append-mostly data where updates occur on recent records. User or tenant IDs work for multi-tenant systems with tenant-isolated queries. Composite keys combining time and category handle complex access patterns.
# Good partition key - appears in most queries, evenly distributed
class Event < ApplicationRecord
# Partitioned by event_date
# Most queries: Event.where(event_date: date_range)
end
# Poor partition key - skewed distribution
class Order < ApplicationRecord
# Partitioned by status (90% are 'completed')
# Creates massive 'completed' partition
end
Partition Sizing
Partition size balances partition pruning benefits against partition management overhead. Too many small partitions increase metadata overhead and complicate maintenance. Too few large partitions reduce partition pruning effectiveness.
Common sizing strategies include monthly partitions for high-volume tables, weekly partitions for extremely high-volume tables, and yearly partitions for moderate-volume tables. The target size depends on database capabilities, query patterns, and maintenance requirements. PostgreSQL handles hundreds of partitions efficiently; thousands of partitions may impact performance.
Partition Creation Strategies
Pre-creating future partitions prevents insertion failures. Scripts or database jobs create partitions for upcoming time periods. A common pattern creates partitions three months in advance and drops partitions older than two years.
Dynamic partition creation automatically generates partitions as needed. Some databases support automatic partition creation when inserting data without a matching partition. This approach reduces operational overhead but requires error handling for partition creation failures.
Ruby Implementation
Ruby applications interact with partitioned tables through ActiveRecord with database-specific extensions or specialized gems. PostgreSQL offers the most mature partitioning support with native features and robust gem ecosystems.
PostgreSQL Partitioning with ActiveRecord
PostgreSQL 10+ supports declarative partitioning. Creating partitioned tables requires SQL since ActiveRecord migrations lack native partitioning syntax. The table definition specifies the partition method and key, with child partitions created separately.
# Migration creating range-partitioned table
class CreatePartitionedEvents < ActiveRecord::Migration[7.0]
def up
execute <<-SQL
CREATE TABLE events (
id BIGSERIAL,
user_id BIGINT NOT NULL,
event_type VARCHAR(50) NOT NULL,
event_date DATE NOT NULL,
payload JSONB,
created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (event_date);
SQL
# Create partitions for each month
execute <<-SQL
CREATE TABLE events_2024_01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE events_2024_02 PARTITION OF events
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
CREATE TABLE events_2024_03 PARTITION OF events
FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');
SQL
# Indexes created on parent table apply to all partitions
execute "CREATE INDEX idx_events_user_id ON events (user_id);"
execute "CREATE INDEX idx_events_type ON events (event_type);"
end
def down
execute "DROP TABLE IF EXISTS events CASCADE;"
end
end
ActiveRecord models interact with partitioned tables transparently. The application references the parent table, and PostgreSQL routes queries to appropriate partitions.
class Event < ApplicationRecord
belongs_to :user
scope :for_date_range, ->(start_date, end_date) {
where(event_date: start_date..end_date)
}
scope :recent, -> { where('event_date >= ?', 30.days.ago) }
end
# Queries automatically benefit from partition pruning
Event.where(event_date: Date.new(2024, 1, 15))
# Only scans events_2024_01 partition
Event.for_date_range(Date.new(2024, 1, 15), Date.new(2024, 2, 15))
# Only scans events_2024_01 and events_2024_02 partitions
Automated Partition Management
Managing partitions manually becomes tedious. Ruby scripts automate partition creation and deletion based on retention policies.
# Partition manager for time-based partitions
class PartitionManager
def initialize(table_name, partition_key, interval: :monthly)
@table_name = table_name
@partition_key = partition_key
@interval = interval
end
def create_future_partitions(months_ahead: 3)
(0...months_ahead).each do |offset|
date = Date.today.beginning_of_month + offset.months
create_partition(date)
end
end
def drop_old_partitions(retention_months: 24)
cutoff_date = Date.today.beginning_of_month - retention_months.months
partition_names = fetch_partition_names
partition_names.each do |partition_name|
partition_date = extract_date_from_partition_name(partition_name)
next if partition_date >= cutoff_date
drop_partition(partition_name)
end
end
private
def create_partition(date)
partition_name = "#{@table_name}_#{date.strftime('%Y_%m')}"
start_date = date.beginning_of_month
end_date = date.end_of_month + 1.day
sql = <<-SQL
CREATE TABLE IF NOT EXISTS #{partition_name}
PARTITION OF #{@table_name}
FOR VALUES FROM ('#{start_date}') TO ('#{end_date}');
SQL
ActiveRecord::Base.connection.execute(sql)
Rails.logger.info("Created partition: #{partition_name}")
rescue ActiveRecord::StatementInvalid => e
Rails.logger.error("Failed to create partition #{partition_name}: #{e.message}")
end
def drop_partition(partition_name)
sql = "DROP TABLE IF EXISTS #{partition_name};"
ActiveRecord::Base.connection.execute(sql)
Rails.logger.info("Dropped partition: #{partition_name}")
end
def fetch_partition_names
sql = <<-SQL
SELECT tablename FROM pg_tables
WHERE schemaname = 'public'
AND tablename LIKE '#{@table_name}_%'
ORDER BY tablename;
SQL
result = ActiveRecord::Base.connection.execute(sql)
result.map { |row| row['tablename'] }
end
def extract_date_from_partition_name(name)
# Extract date from pattern: table_name_2024_01
match = name.match(/(\d{4})_(\d{2})$/)
return nil unless match
Date.new(match[1].to_i, match[2].to_i, 1)
end
end
# Usage in a scheduled job
class PartitionMaintenanceJob < ApplicationJob
queue_as :default
def perform
manager = PartitionManager.new('events', 'event_date')
manager.create_future_partitions(months_ahead: 6)
manager.drop_old_partitions(retention_months: 24)
end
end
PgParty Gem
The pg_party gem provides ActiveRecord DSL for partition management. It abstracts partition creation syntax and adds migration helpers.
# Gemfile
gem 'pg_party'
# Migration using pg_party
class CreatePartitionedOrders < ActiveRecord::Migration[7.0]
def up
create_range_partition :orders, partition_key: :order_date do |t|
t.bigint :customer_id, null: false
t.decimal :total_amount, precision: 10, scale: 2
t.string :status
t.date :order_date, null: false
t.timestamps
end
create_range_partition_of :orders,
name: :orders_2024_q1,
start_range: '2024-01-01',
end_range: '2024-04-01'
create_range_partition_of :orders,
name: :orders_2024_q2,
start_range: '2024-04-01',
end_range: '2024-07-01'
end
def down
drop_table :orders
end
end
# Model with pg_party
class Order < ApplicationRecord
range_partition_by :order_date
belongs_to :customer
end
# Creating partitions programmatically
Order.create_partition(
name: :orders_2024_q3,
start_range: '2024-07-01',
end_range: '2024-10-01'
)
# List existing partitions
Order.partitions
# => [#<PgParty::Partition name=orders_2024_q1...>, ...]
Hash Partitioning Implementation
Hash partitioning distributes data evenly across a fixed number of partitions using a hash function on the partition key.
# Migration for hash-partitioned users table
class CreatePartitionedUsers < ActiveRecord::Migration[7.0]
def up
execute <<-SQL
CREATE TABLE users (
id BIGSERIAL,
email VARCHAR(255) NOT NULL,
username VARCHAR(100) NOT NULL,
created_at TIMESTAMP NOT NULL
) PARTITION BY HASH (id);
SQL
# Create 4 hash partitions
(0..3).each do |i|
execute <<-SQL
CREATE TABLE users_hash_#{i} PARTITION OF users
FOR VALUES WITH (MODULUS 4, REMAINDER #{i});
SQL
end
execute "CREATE INDEX idx_users_email ON users (email);"
end
def down
execute "DROP TABLE IF EXISTS users CASCADE;"
end
end
# Usage is transparent
User.create(email: 'user@example.com', username: 'user1')
# Automatically routed to appropriate partition based on hash(id)
User.where(id: 12345)
# Direct partition lookup using hash
List Partitioning for Multi-Tenancy
List partitioning isolates tenant data in multi-tenant applications.
# Migration for tenant-partitioned accounts
class CreatePartitionedAccounts < ActiveRecord::Migration[7.0]
def up
execute <<-SQL
CREATE TABLE accounts (
id BIGSERIAL,
tenant_id INTEGER NOT NULL,
name VARCHAR(255) NOT NULL,
balance DECIMAL(15, 2),
created_at TIMESTAMP NOT NULL
) PARTITION BY LIST (tenant_id);
SQL
# Create partition per tenant or tenant group
execute <<-SQL
CREATE TABLE accounts_tenant_1 PARTITION OF accounts
FOR VALUES IN (1);
CREATE TABLE accounts_tenant_2 PARTITION OF accounts
FOR VALUES IN (2);
CREATE TABLE accounts_tenant_group_a PARTITION OF accounts
FOR VALUES IN (3, 4, 5);
SQL
end
end
# Model with tenant scoping
class Account < ApplicationRecord
belongs_to :tenant
default_scope { where(tenant_id: Current.tenant_id) }
end
# Queries automatically scoped to tenant partition
Current.tenant_id = 1
Account.all # Only queries accounts_tenant_1 partition
Performance Considerations
Partitioning impacts query performance, maintenance operations, and storage efficiency. The benefits depend on partition strategy, query patterns, and table size.
Query Performance
Partition pruning delivers the primary performance benefit. Queries filtering by the partition key scan only relevant partitions. A query for January 2024 data on a monthly-partitioned table scans only the January partition, regardless of total table size. This optimization reduces I/O proportionally to the number of partitions pruned.
# Explain output shows partition pruning
Event.where(event_date: Date.new(2024, 1, 15)).explain
# Output (PostgreSQL):
# Seq Scan on events_2024_01
# Filter: (event_date = '2024-01-15')
# Partitions pruned: 11 of 12
Queries without partition key filters scan all partitions, performing worse than non-partitioned tables due to partition metadata overhead. Mixed workloads where some queries benefit from partition pruning while others scan all partitions may not gain net performance improvements.
Index performance improves on partitioned tables. Each partition maintains separate indexes, reducing index size proportionally. Smaller indexes fit in memory more readily and support faster lookups. Index maintenance operations complete faster on individual partitions.
Insert and Update Performance
Insert operations require partition routing overhead. The database evaluates partition key values and directs rows to appropriate partitions. This overhead remains minimal compared to overall insert cost. Batch inserts amortize routing overhead across multiple rows.
Updates changing partition key values require moving rows between partitions. The database deletes the row from the current partition and inserts it into the target partition. This operation costs significantly more than standard updates. Applications should minimize partition key updates.
# Expensive operation on partitioned table
event = Event.find(1)
event.update(event_date: Date.new(2025, 1, 1))
# Moves row from events_2024_01 to events_2025_01 partition
# Efficient operation
event.update(event_type: 'click')
# Updates in place within same partition
Maintenance Performance
Maintenance operations benefit substantially from partitioning. VACUUM, ANALYZE, and index rebuilds operate on individual partitions, completing faster and allowing granular scheduling.
# Vacuum specific partition during business hours
ActiveRecord::Base.connection.execute(
"VACUUM ANALYZE events_2023_12"
)
# Full table vacuum would lock table longer
ActiveRecord::Base.connection.execute(
"VACUUM ANALYZE events" # Vacuums all partitions
)
Partition dropping provides instant archival. Dropping a partition removes data immediately without scanning rows or triggering delete cascades. This operation completes in milliseconds regardless of partition size.
# Archive old data by detaching partition
ActiveRecord::Base.connection.execute(<<-SQL
ALTER TABLE events DETACH PARTITION events_2022_01;
SQL)
# Move to archive schema
ActiveRecord::Base.connection.execute(<<-SQL
ALTER TABLE events_2022_01 SET SCHEMA archive;
SQL)
Partition Size Impact
Optimal partition size balances partition pruning benefits against partition metadata overhead. PostgreSQL query planner overhead increases with partition count. Tables with thousands of partitions may experience slower query planning.
Memory usage scales with partition count. Each partition requires catalog entries, statistics, and plan cache entries. Extremely large partition counts (10,000+) consume significant memory.
Monitoring partition access patterns identifies unused partitions and informs partition strategy adjustments. Partitions never accessed in queries waste resources and should be consolidated.
Parallel Query Execution
Modern databases execute queries across multiple partitions in parallel. PostgreSQL parallel sequential scans distribute partition scanning across worker processes, improving performance on multi-core systems.
# Query spanning multiple partitions executed in parallel
Event.where(event_date: Date.new(2024, 1, 1)..Date.new(2024, 3, 31))
.group(:event_type)
.count
# PostgreSQL may execute with parallel workers:
# Parallel workers: 4
# Partition 1 scanned by worker 1
# Partition 2 scanned by worker 2
# Partition 3 scanned by worker 3
Design Considerations
Partitioning introduces complexity and constraints that affect system design. The decision to partition requires analyzing workload patterns, growth projections, and operational requirements.
When to Partition
Tables exceeding 100GB in size become partition candidates. Query performance degrades as table size grows, particularly for range scans and aggregations. Maintenance operations take hours on multi-terabyte tables but minutes on individual partitions.
Write-heavy workloads with time-series data benefit most from partitioning. Log tables, event streams, transaction histories, and sensor data naturally partition by timestamp. Recent data receives most queries and updates while historical data remains read-only.
Multi-tenant applications with tenant-isolated queries achieve performance and isolation benefits through list partitioning by tenant. Tenant data isolation simplifies compliance requirements and enables per-tenant data retention policies.
Tables with skewed access patterns where queries typically access recent data gain partition pruning benefits. If 90% of queries access the most recent 10% of data, partitioning keeps that subset in memory.
When Not to Partition
Small tables under 10GB rarely benefit from partitioning. Partition metadata overhead and query planning complexity outweigh benefits. These tables fit in memory entirely, rendering partition pruning unnecessary.
Tables without clear partition keys prove difficult to partition effectively. If no column appears consistently in query WHERE clauses, partition pruning provides minimal benefit. Hash partitioning distributes data but offers limited query optimization.
Workloads requiring frequent partition key updates suffer performance penalties. Moving rows between partitions costs significantly more than standard updates. Tables where the partition key changes regularly perform poorly when partitioned.
Tables with extensive foreign key relationships to and from other tables complicate partitioning. Foreign keys from partitioned tables to non-partitioned tables function normally. Foreign keys from non-partitioned tables to partitioned tables require referencing the parent table, not individual partitions. Complex referential integrity constraints may preclude partitioning.
Partition Strategy Trade-offs
Range partitioning provides optimal query performance for time-series data but requires ongoing partition management. New partitions must be created for future time periods, and old partitions archived. Automated partition management reduces operational burden.
Hash partitioning requires no ongoing management after initial setup but offers limited query optimization. All queries scan all partitions unless filtering by exact partition key value. Hash partitioning suits workloads prioritizing even data distribution over query performance.
List partitioning provides good query performance for categorical data but requires knowing all possible values in advance. Adding new values requires creating new partitions. List partitioning works well for stable categorical dimensions like geographic regions.
Partition Granularity
Partition size represents a critical design decision. Monthly partitions suit most time-series applications, balancing partition count against partition size. High-volume applications may require daily or weekly partitions. Lower-volume applications may use quarterly or yearly partitions.
# Decision framework for partition granularity
case daily_row_count
when 0..100_000
:yearly # ~36M rows per partition
when 100_000..1_000_000
:quarterly # ~90M rows per partition
when 1_000_000..10_000_000
:monthly # ~300M rows per partition
when 10_000_000..Float::INFINITY
:weekly # ~70M rows per partition
end
Partition count impacts query planning overhead. PostgreSQL handles hundreds of partitions efficiently; thousands of partitions may slow query planning. Monitoring query plan time helps determine if partition count affects performance.
Schema Evolution
Adding columns to partitioned tables requires altering the parent table and all child partitions. Most databases propagate DDL changes automatically, but the operation takes longer than on non-partitioned tables.
Changing partition strategies requires rebuilding the table. Migrating from range to hash partitioning or changing partition keys involves creating a new partitioned table and migrating data. This operation requires careful planning and may necessitate downtime.
Backup and Recovery
Partitioning simplifies backup strategies. Individual partitions can be backed up independently. Historical partitions remain static after the time period closes, requiring backups only once. Recent active partitions require regular backups.
Point-in-time recovery operates on the entire table, not individual partitions. Partial recovery of specific partitions may require exporting and importing data.
Tools & Ecosystem
PostgreSQL Native Features
PostgreSQL 10+ provides declarative partitioning with range, list, and hash methods. PostgreSQL 11 added hash partitioning and improved query planner performance. PostgreSQL 12 enabled BEFORE INSERT triggers on partitioned tables. PostgreSQL 13 improved partition pruning performance.
The pg_partman extension automates partition management for PostgreSQL versions 9.6+. It provides background worker processes for creating and dropping partitions based on retention policies.
# Install pg_partman extension
ActiveRecord::Base.connection.execute("CREATE EXTENSION pg_partman;")
# Configure partition maintenance
ActiveRecord::Base.connection.execute(<<-SQL
SELECT partman.create_parent(
'public.events',
'event_date',
'native',
'monthly',
p_premake := 3,
p_start_partition := '2024-01-01'
);
SQL)
# Schedule maintenance
ActiveRecord::Base.connection.execute(<<-SQL
INSERT INTO partman.part_config (
parent_table,
retention,
retention_keep_table
) VALUES (
'public.events',
'24 months',
true
);
SQL)
Ruby Gems
The pg_party gem provides ActiveRecord integration for PostgreSQL partitioning. It adds migration DSL, model methods, and partition management utilities.
The activerecord-pg-partitions gem offers similar functionality with different API design. It focuses on migration helpers and SQL abstraction.
# activerecord-pg-partitions
gem 'activerecord-pg-partitions'
# In migration
class CreatePartitionedLogs < ActiveRecord::Migration[7.0]
def change
create_table :logs, partition_by: :range, partition_key: :created_at do |t|
t.string :message
t.string :level
t.datetime :created_at
end
create_table_partition :logs,
name: :logs_2024_01,
values: {
from: '2024-01-01',
to: '2024-02-01'
}
end
end
MySQL Partitioning
MySQL supports partitioning in InnoDB storage engine. The syntax differs from PostgreSQL, using PARTITION BY clause in CREATE TABLE.
# MySQL partition creation
class CreatePartitionedEvents < ActiveRecord::Migration[7.0]
def up
execute <<-SQL
CREATE TABLE events (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
user_id BIGINT NOT NULL,
event_date DATE NOT NULL,
data JSON
)
PARTITION BY RANGE (YEAR(event_date)) (
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p2025 VALUES LESS THAN (2026)
);
SQL
end
end
MySQL partitioning has limitations compared to PostgreSQL. Foreign keys from partitioned tables to non-partitioned tables are not supported. All columns in the partition key must be part of every unique index.
Monitoring Tools
Database monitoring tools report partition-level metrics. PgHero provides partition size, query performance, and maintenance status for PostgreSQL applications.
# Gemfile
gem 'pghero'
# Check partition sizes
PgHero.database.table_stats
# => [
# {name: "events_2024_01", rows: 5000000, size: "1.2 GB"},
# {name: "events_2024_02", rows: 4800000, size: "1.1 GB"}
# ]
# Identify slow queries by partition
PgHero.database.slow_queries
Custom monitoring tracks partition creation lag, partition size growth, and query distribution across partitions.
class PartitionMonitor
def check_partition_lag
last_partition_date = fetch_last_partition_date('events')
required_date = Date.today.end_of_month
if last_partition_date < required_date
alert("Partition lag detected for events table")
end
end
def check_partition_sizes
partition_sizes = fetch_partition_sizes('events')
partition_sizes.each do |partition|
if partition[:size_bytes] > 50.gigabytes
alert("Partition #{partition[:name]} exceeds size threshold")
end
end
end
private
def fetch_last_partition_date(table_name)
sql = <<-SQL
SELECT tablename FROM pg_tables
WHERE schemaname = 'public'
AND tablename LIKE '#{table_name}_%'
ORDER BY tablename DESC
LIMIT 1;
SQL
result = ActiveRecord::Base.connection.execute(sql)
return nil if result.none?
partition_name = result.first['tablename']
extract_date_from_partition_name(partition_name)
end
def fetch_partition_sizes(table_name)
sql = <<-SQL
SELECT
tablename as name,
pg_total_relation_size(schemaname || '.' || tablename) as size_bytes
FROM pg_tables
WHERE schemaname = 'public'
AND tablename LIKE '#{table_name}_%'
ORDER BY tablename;
SQL
result = ActiveRecord::Base.connection.execute(sql)
result.map { |row| row.symbolize_keys }
end
end
Reference
Partition Type Comparison
| Partition Type | Use Case | Key Distribution | Query Optimization | Management Overhead |
|---|---|---|---|---|
| Range | Time-series data, sequential IDs | Continuous ranges | Excellent for range queries | Requires periodic partition creation |
| List | Categorical data, multi-tenancy | Discrete values | Good for equality queries | Requires predefined value sets |
| Hash | Even distribution, no natural key | Uniform across partitions | Limited to equality queries | No ongoing management |
| Composite | Complex requirements | Hierarchical | Flexible optimization | Highest complexity |
PostgreSQL Partition DDL Reference
| Operation | SQL Command | Notes |
|---|---|---|
| Create partitioned table | CREATE TABLE ... PARTITION BY RANGE/LIST/HASH | Defines parent table structure |
| Create partition | CREATE TABLE ... PARTITION OF parent | Defines child partition bounds |
| Attach partition | ALTER TABLE parent ATTACH PARTITION child | Adds existing table as partition |
| Detach partition | ALTER TABLE parent DETACH PARTITION child | Removes partition from parent |
| Drop partition | DROP TABLE partition_name | Permanently deletes partition data |
Partition Key Selection Criteria
| Criterion | Importance | Evaluation |
|---|---|---|
| Query frequency | Critical | Column appears in most WHERE clauses |
| Data distribution | High | Values distribute evenly across partitions |
| Immutability | High | Values rarely or never change |
| Cardinality | Medium | Sufficient distinct values for partition count |
| Data lifecycle | Medium | Aligns with archival and retention policies |
Ruby ActiveRecord Partition Queries
| Query Pattern | Code Example | Partition Behavior |
|---|---|---|
| Date range filter | where(date: range) | Scans only partitions overlapping range |
| Equality filter | where(date: value) | Scans single partition containing value |
| No partition key | where(other_column: value) | Scans all partitions |
| Insert | create(attributes) | Routes to appropriate partition |
| Update partition key | update(date: new_date) | Moves row between partitions |
Partition Maintenance Schedule
| Task | Frequency | Purpose |
|---|---|---|
| Create future partitions | Weekly | Ensure partitions exist for incoming data |
| Drop old partitions | Monthly | Enforce retention policies |
| Vacuum partitions | Daily | Reclaim space, update statistics |
| Analyze partitions | After bulk loads | Update query planner statistics |
| Monitor partition sizes | Daily | Detect unusual growth patterns |
| Reindex partitions | Quarterly | Reduce index bloat |
Common Partition Configurations
| Workload Type | Recommended Strategy | Partition Key | Partition Interval |
|---|---|---|---|
| Event logs | Range | timestamp | Monthly |
| User data | Hash | user_id | 8-16 partitions |
| Multi-tenant SaaS | List | tenant_id | Per tenant or tenant group |
| Financial transactions | Range | transaction_date | Monthly or quarterly |
| IoT sensor data | Range + Hash | timestamp + device_id | Daily + hash sub-partition |
| Geographic data | List | country_code or region | Per country or region group |
Performance Impact Factors
| Factor | Impact on Query | Impact on Writes | Impact on Maintenance |
|---|---|---|---|
| Partition count | High with many partitions | Low | Proportional to count |
| Partition size | Low until memory limit | Low | Proportional to size |
| Partition key in query | Major improvement | None | None |
| Partition key updates | None | Major overhead | None |
| Index per partition | Faster lookups | Slower inserts | Faster rebuilds |
Partition Troubleshooting
| Issue | Symptom | Solution |
|---|---|---|
| Slow queries | All partitions scanned | Add partition key to WHERE clause |
| Insert failures | No matching partition | Create partition or enable default partition |
| High disk usage | Old partitions retained | Implement partition dropping policy |
| Slow partition creation | Table locked during creation | Use CONCURRENTLY or create during low traffic |
| Uneven partition sizes | Some partitions much larger | Adjust partition boundaries or switch to hash |
| Foreign key errors | Cannot create FK to partition | Reference parent table instead |