CrackedRuby - Window Functions

Overview

Window functions operate on a set of rows and return a single value for each row from the underlying query. Unlike aggregate functions that collapse rows into groups, window functions maintain the original row structure while performing calculations across a defined window of rows. The window defines which rows to include in the calculation relative to the current row.

A window function call includes the function name, arguments, and an OVER clause that defines the window specification. The OVER clause controls how rows are partitioned, ordered, and framed for the calculation. Window functions appear in the SELECT list or ORDER BY clause but never in WHERE, GROUP BY, or HAVING clauses because they execute after these clauses.

The SQL standard defines three categories of window functions: aggregate functions (SUM, AVG, COUNT), ranking functions (ROW_NUMBER, RANK, DENSE_RANK), and value functions (LAG, LEAD, FIRST_VALUE, LAST_VALUE). Databases execute window functions after joins, filtering, and grouping but before final ordering and limiting.

SELECT 
  employee_id,
  department,
  salary,
  AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary,
  RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees;

This query calculates both the department average salary and the salary rank for each employee without collapsing rows. Each employee row retains its individual data while gaining computed values based on related rows.

Key Principles

The OVER clause defines three components that control window behavior: partitioning, ordering, and framing. Partitioning divides the result set into groups using PARTITION BY, similar to GROUP BY but without collapsing rows. Each partition processes independently with its own window calculations. Without PARTITION BY, the entire result set forms a single partition.

Ordering within a window uses ORDER BY inside the OVER clause to determine row sequence for calculations. This ordering differs from the query's final ORDER BY clause. Ranking functions require ordering while some aggregate functions produce different results based on row order. Functions like SUM calculate cumulative values when rows have a defined sequence.

Frame specifications define the precise row range for calculations within the partition. The frame moves relative to the current row and includes three components: frame units (ROWS or RANGE), frame start, and frame end. ROWS counts physical rows while RANGE considers logical ranges based on ORDER BY values. The default frame specification depends on whether ORDER BY exists in the OVER clause.

Without ORDER BY, the default frame includes all partition rows. With ORDER BY but no explicit frame, the default frame spans from partition start to the current row (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). This default enables cumulative calculations like running totals.

Frame bounds use several keywords: UNBOUNDED PRECEDING (partition start), UNBOUNDED FOLLOWING (partition end), CURRENT ROW, and n PRECEDING/FOLLOWING for relative positions. The frame must start before or at its end position, and both bounds must be valid within partition boundaries.

-- Different frame specifications
SUM(amount) OVER (ORDER BY date 
                  ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)  -- Last 3 rows

SUM(amount) OVER (ORDER BY date 
                  RANGE BETWEEN INTERVAL '7 days' PRECEDING AND CURRENT ROW)  -- Last 7 days

SUM(amount) OVER (ORDER BY date 
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)  -- Running total

The difference between ROWS and RANGE matters when ORDER BY values contain duplicates. ROWS counts physical rows regardless of duplicates. RANGE groups rows with identical ORDER BY values, treating them as a single unit. For tied values, RANGE includes all rows with the same value even if they exceed the specified offset.

Window functions cannot nest. An expression like RANK() OVER (ORDER BY SUM(x) OVER (...)) is invalid. Nesting requires subqueries or CTEs to compute inner window functions first, then reference those results in outer window functions.

Databases optimize window functions by sorting data once when multiple window functions share the same PARTITION BY and ORDER BY specification. Different window specifications require separate sort operations, potentially impacting performance with many distinct window definitions.

Ruby Implementation

Ruby applications access window functions through database adapters and ORMs. ActiveRecord provides window function support through the Arel library, though the API requires understanding Arel's expression building. Sequel offers more direct window function syntax that reads closer to SQL.

ActiveRecord builds window functions using Arel nodes. The Arel::Nodes::Window class represents the OVER clause, combining partition, order, and frame specifications. Window function calls use Arel::Nodes::NamedFunction with a window node.

# ActiveRecord with Arel
employees = Employee.select(
  :id,
  :department,
  :salary,
  Arel::Nodes::NamedFunction.new('AVG', [Employee.arel_table[:salary]])
    .over(Arel::Nodes::Window.new.partition(Employee.arel_table[:department]))
    .as('dept_avg')
)

The Sequel gem provides cleaner window function syntax through method chaining. Sequel treats window functions as first-class citizens with dedicated DSL methods for common operations.

# Sequel
DB[:employees].select(
  :id,
  :department,
  :salary,
  Sequel.function(:avg, :salary)
    .over(partition: :department)
    .as(:dept_avg)
)

For complex window specifications, raw SQL fragments integrate window functions while maintaining query builder benefits for other clauses. The select method accepts SQL strings that databases process directly.

# Raw SQL in ActiveRecord
Employee.select(
  :id,
  :department,
  :salary,
  Sequel.lit('ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rank')
)

Sequel supports frame specifications through the frame option in the over method. Frame specifications use a hash with :type, :start, and :end keys or a string for complex frames.

# Frame specification in Sequel
DB[:transactions].select(
  :date,
  :amount,
  Sequel.function(:sum, :amount).over(
    order: :date,
    frame: { type: :rows, start: 2, end: :current }
  ).as(:moving_sum)
)

For databases requiring vendor-specific window function syntax, connection adapters may need explicit type casting or function wrapping. PostgreSQL supports the full SQL standard window function specification while MySQL has partial support with restrictions on frame specifications.

Ruby code accessing window function results receives calculated values as regular attributes. The database computes window function values during query execution, returning them as columns in the result set.

# Accessing window function results
employees = Employee.select(
  '*',
  Sequel.lit('RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dept_rank')
)

employees.each do |employee|
  puts "#{employee.name}: Rank #{employee.dept_rank} in #{employee.department}"
end

For complex window function queries with multiple specifications, defining named window specifications reduces duplication. Named windows appear in the WINDOW clause and get referenced by name in function calls. Sequel does not directly support named windows, requiring raw SQL for this feature.

# Named windows with raw SQL
sql = <<-SQL
  SELECT 
    id,
    department,
    salary,
    RANK() OVER w AS dept_rank,
    AVG(salary) OVER w AS dept_avg
  FROM employees
  WINDOW w AS (PARTITION BY department ORDER BY salary DESC)
SQL

results = ActiveRecord::Base.connection.execute(sql)

Practical Examples

Ranking employees within departments demonstrates basic window function usage. The RANK function assigns ranks based on salary order, with tied salaries receiving the same rank and leaving gaps in the sequence. DENSE_RANK eliminates gaps while ROW_NUMBER assigns unique sequential numbers even for tied values.

# Ranking with ActiveRecord
Employee.select(
  :name,
  :department,
  :salary,
  Sequel.lit('RANK() OVER (PARTITION BY department ORDER BY salary DESC) as salary_rank'),
  Sequel.lit('DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) as dense_rank'),
  Sequel.lit('ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as row_num')
)

# Result shows ranking differences
# name        department  salary  salary_rank  dense_rank  row_num
# Alice       Engineering 120000  1            1           1
# Bob         Engineering 120000  1            1           2
# Carol       Engineering 110000  3            2           3
# David       Sales       90000   1            1           1

Running totals track cumulative values across ordered rows. The frame specification defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW when ORDER BY exists, creating running totals naturally. Explicit ROWS frame specifications control exactly which rows contribute to the sum.

# Running total in Sequel
DB[:transactions].select(
  :date,
  :amount,
  Sequel.function(:sum, :amount).over(
    order: :date,
    frame: { type: :rows, start: :unbounded, end: :current }
  ).as(:running_total)
).order(:date)

# Generates SQL:
# SELECT 
#   date,
#   amount,
#   SUM(amount) OVER (ORDER BY date 
#                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as running_total
# FROM transactions
# ORDER BY date

Moving averages calculate statistics over sliding windows. A three-period moving average uses ROWS BETWEEN 2 PRECEDING AND CURRENT ROW to include the current row and two preceding rows. The calculation adapts automatically at partition boundaries where fewer preceding rows exist.

# Moving average
DB[:stock_prices].select(
  :date,
  :price,
  Sequel.function(:avg, :price).over(
    order: :date,
    frame: { type: :rows, start: 2, end: :current }
  ).as(:moving_avg_3day)
).order(:date)

LAG and LEAD access values from preceding or following rows without self-joins. These functions take an offset parameter (default 1) and an optional default value for cases where no preceding or following row exists. Common uses include calculating differences between consecutive rows or comparing current values to historical values.

# Price changes using LAG
DB[:stock_prices].select(
  :date,
  :price,
  Sequel.function(:lag, :price, 1).over(order: :date).as(:previous_price),
  Sequel.lit('price - LAG(price, 1) OVER (ORDER BY date) as price_change')
).order(:date)

# Result shows daily changes
# date        price  previous_price  price_change
# 2024-01-01  100.0  NULL            NULL
# 2024-01-02  102.5  100.0           2.5
# 2024-01-03  101.0  102.5           -1.5

Percentile calculations use NTILE to divide ordered rows into roughly equal groups. NTILE(4) creates quartiles, NTILE(100) creates percentiles. Rows distribute as evenly as possible, with earlier groups receiving extra rows when the count doesn't divide evenly.

# Quartile analysis
Employee.select(
  :name,
  :salary,
  Sequel.lit('NTILE(4) OVER (ORDER BY salary) as salary_quartile')
).order(:salary)

FIRST_VALUE and LAST_VALUE retrieve values from the first or last row in the frame. These functions respect the frame specification, so LAST_VALUE with default framing returns the current row value, not the partition's last row. Retrieving the actual last value requires specifying ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

# First and last values
DB[:sales].select(
  :month,
  :revenue,
  Sequel.function(:first_value, :revenue).over(
    partition: :year,
    order: :month
  ).as(:year_first_month),
  Sequel.lit('LAST_VALUE(revenue) OVER (PARTITION BY year ORDER BY month 
              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as year_last_month')
)

Gap and island problems identify continuous sequences in data. Combining ROW_NUMBER with arithmetic creates grouping values that identify islands (continuous sequences). A subquery with the grouping value enables aggregating each island.

# Find continuous employment periods
sql = <<-SQL
  WITH numbered AS (
    SELECT 
      employee_id,
      start_date,
      end_date,
      ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY start_date) -
      ROW_NUMBER() OVER (PARTITION BY employee_id, 
                         DATE_PART('year', start_date) ORDER BY start_date) as island_group
    FROM employment_periods
  )
  SELECT 
    employee_id,
    MIN(start_date) as period_start,
    MAX(end_date) as period_end,
    COUNT(*) as continuous_months
  FROM numbered
  GROUP BY employee_id, island_group
SQL

ActiveRecord::Base.connection.execute(sql)

Common Patterns

Ranking patterns use different functions depending on duplicate handling requirements. ROW_NUMBER generates unique sequential numbers regardless of ties, making it appropriate for pagination or when deterministic ordering of duplicates matters. RANK preserves tied rankings and creates gaps, reflecting actual competitive rankings. DENSE_RANK preserves tied rankings without gaps, useful for categorization or grouping.

# Ranking pattern comparison
DB[:products].select(
  :name,
  :rating,
  Sequel.function(:row_number).over(order: Sequel.desc(:rating)).as(:row_num),
  Sequel.function(:rank).over(order: Sequel.desc(:rating)).as(:rank),
  Sequel.function(:dense_rank).over(order: Sequel.desc(:rating)).as(:dense_rank)
)

# With ratings: A(5.0), B(4.5), C(4.5), D(4.0)
# row_num gives: 1, 2, 3, 4
# rank gives: 1, 2, 2, 4 (gap at 3)
# dense_rank gives: 1, 2, 2, 3 (no gap)

Cumulative aggregation patterns compute running totals, counts, or other accumulating metrics. The default frame with ORDER BY creates cumulative calculations automatically. Partitioning enables independent cumulative calculations per group, such as running totals per customer or department.

# Running totals per account
DB[:transactions].select(
  :account_id,
  :date,
  :amount,
  Sequel.function(:sum, :amount).over(
    partition: :account_id,
    order: :date
  ).as(:account_balance)
).order(:account_id, :date)

Moving window patterns calculate statistics over sliding frames. The frame size determines how many rows contribute to each calculation. ROWS-based frames count physical rows while RANGE-based frames group by value, creating different behaviors with duplicates. Moving averages, sums, and other aggregates smooth short-term fluctuations and reveal trends.

# 7-day moving average
DB[:metrics].select(
  :date,
  :value,
  Sequel.function(:avg, :value).over(
    order: :date,
    frame: { type: :rows, start: 6, end: :current }
  ).as(:avg_7day)
)

Comparison patterns use LAG and LEAD to compare each row against neighboring rows. Period-over-period analysis calculates growth rates, change percentages, or absolute differences without self-joins. The offset parameter accesses non-adjacent rows for year-over-year or quarter-over-quarter comparisons.

# Year-over-year comparison (12 months lag)
DB[:monthly_sales].select(
  :month,
  :revenue,
  Sequel.function(:lag, :revenue, 12).over(order: :month).as(:revenue_yoy),
  Sequel.lit('(revenue - LAG(revenue, 12) OVER (ORDER BY month)) / 
              LAG(revenue, 12) OVER (ORDER BY month) * 100 as pct_change_yoy')
)

Top-N per group patterns filter results to include only the highest or lowest ranked items within each partition. A subquery or CTE performs the ranking, then the outer query filters based on rank values. This pattern replaces complex self-joins for finding top performers per category.

# Top 3 employees per department
sql = <<-SQL
  WITH ranked AS (
    SELECT 
      name,
      department,
      salary,
      RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank
    FROM employees
  )
  SELECT name, department, salary
  FROM ranked
  WHERE rank <= 3
SQL

Employee.find_by_sql(sql)

Percentage of total patterns calculate each row's contribution to partition or global totals. Combining regular SUM with window SUM enables proportional calculations without grouping or joining. The frame specification controls whether percentages calculate against partition totals or running totals.

# Sales as percentage of department total
DB[:sales].select(
  :salesperson,
  :department,
  :amount,
  Sequel.lit('amount / SUM(amount) OVER (PARTITION BY department) * 100 as pct_of_dept'),
  Sequel.lit('amount / SUM(amount) OVER () * 100 as pct_of_total')
)

Performance Considerations

Window function execution requires sorting data by partition and order specifications. Multiple window functions with identical OVER clauses share a single sort operation, reducing computational cost. Different partition or order specifications require separate sorts, multiplying processing overhead. Consolidating window specifications when possible improves performance.

# Efficient - shares sort operation
DB[:employees].select(
  :name,
  :department,
  :salary,
  Sequel.function(:rank).over(partition: :department, order: Sequel.desc(:salary)).as(:rank),
  Sequel.function(:avg, :salary).over(partition: :department, order: Sequel.desc(:salary)).as(:avg)
)

# Less efficient - separate sort operations
DB[:employees].select(
  :name,
  :department,
  :salary,
  Sequel.function(:rank).over(partition: :department, order: Sequel.desc(:salary)).as(:rank),
  Sequel.function(:avg, :salary).over(partition: :department, order: :hire_date).as(:avg)
)

Indexes on partition and order columns accelerate window function execution. Databases can use indexes to avoid explicit sorting when index order matches the window specification. Composite indexes covering both PARTITION BY and ORDER BY columns provide maximum benefit. Index-only scans further improve performance when all accessed columns exist in the index.

Frame specifications affect memory usage and processing patterns. Unbounded frames require materializing entire partitions in memory. Bounded frames with small offsets process incrementally, reducing memory pressure. ROWS frames process more efficiently than RANGE frames because RANGE requires value comparison and duplicate handling.

Query planners may struggle to optimize queries combining window functions with complex joins or subqueries. Materialized CTEs isolate window function execution from other query operations, sometimes improving plan quality. Testing different query structures reveals performance characteristics for specific datasets and database versions.

# Materialized CTE isolating window calculation
sql = <<-SQL
  WITH aggregated AS MATERIALIZED (
    SELECT 
      department,
      COUNT(*) as employee_count,
      AVG(salary) as avg_salary
    FROM employees
    WHERE active = true
    GROUP BY department
  )
  SELECT 
    e.name,
    e.department,
    e.salary,
    a.employee_count,
    RANK() OVER (PARTITION BY e.department ORDER BY e.salary DESC) as dept_rank
  FROM employees e
  JOIN aggregated a ON e.department = a.department
SQL

Large result sets with window functions consume significant memory because databases must buffer window calculations. Limiting result sets before applying window functions reduces memory usage. When calculating top-N per group, filtering in a subquery before the outer window function operates on fewer rows.

Parallel query execution accelerates window function processing on multi-core systems. PostgreSQL parallelizes window functions that can partition work across multiple workers. Configuration parameters like max_parallel_workers_per_gather control parallelization. Checking execution plans reveals whether queries use parallel window function nodes.

Window functions in WHERE or HAVING clauses require subqueries because window functions execute after filtering. Materialized views or cached results avoid repeated window function computation for frequently accessed calculations. For real-time applications, trading query flexibility for precomputed results improves response times.

Common Pitfalls

The default frame specification causes unexpected results when ORDER BY exists in the OVER clause. Without explicit frame specification, the frame extends from partition start to the current row (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). LAST_VALUE with this default frame returns the current row value, not the last value in the partition. Explicitly specifying the full frame (UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) accesses the actual last value.

# Incorrect - returns current row
DB[:sales].select(
  :month,
  :revenue,
  Sequel.function(:last_value, :revenue).over(
    partition: :year,
    order: :month
  ).as(:incorrect_last)
)

# Correct - returns actual last value
DB[:sales].select(
  :month,
  :revenue,
  Sequel.lit('LAST_VALUE(revenue) OVER (PARTITION BY year ORDER BY month 
              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as correct_last')
)

NULL values in ORDER BY columns create non-deterministic ordering without explicit NULLS FIRST or NULLS LAST specifications. Different databases treat NULLs differently (PostgreSQL sorts NULLs high by default, MySQL sorts them low). Rankings become inconsistent across database platforms or versions when NULL ordering remains implicit.

ROWS versus RANGE frame specifications produce different results with duplicate ORDER BY values. ROWS counts physical rows while RANGE treats all rows with identical ORDER BY values as a single unit. When ORDER BY contains ties, RANGE frames include all tied rows even if they exceed specified offsets, causing unexpected frame sizes.

# Different behaviors with duplicate values
# Data: dates (2024-01-01, 2024-01-01, 2024-01-02), amounts (100, 150, 200)

# ROWS counts physical rows - includes exactly 2 preceding rows
Sequel.function(:sum, :amount).over(
  order: :date,
  frame: { type: :rows, start: 1, end: :current }
)
# Row 1: 100, Row 2: 250 (100+150), Row 3: 350 (150+200)

# RANGE groups by value - includes all rows with same ORDER BY value
Sequel.function(:sum, :amount).over(
  order: :date,
  frame: { type: :range, start: 1, end: :current }
)
# Row 1: 250 (100+150, both rows have same date), Row 2: 250, Row 3: 450 (all three rows)

Window functions cannot reference other window function results directly in the same SELECT clause. Attempting to use a window function alias in another expression fails with undefined column errors. Subqueries or CTEs stage window function results for use in subsequent calculations.

# Incorrect - cannot reference window function alias
DB[:employees].select(
  :salary,
  Sequel.function(:avg, :salary).over(partition: :department).as(:dept_avg),
  Sequel.lit('salary - dept_avg as diff')  # Error: dept_avg undefined
)

# Correct - use CTE
sql = <<-SQL
  WITH averages AS (
    SELECT 
      salary,
      AVG(salary) OVER (PARTITION BY department) as dept_avg
    FROM employees
  )
  SELECT salary, dept_avg, salary - dept_avg as diff
  FROM averages
SQL

Performance degrades dramatically with unnecessary window function duplication. Each distinct OVER clause requires separate sort and window calculation operations. Calculating the same window function multiple times wastes resources when a single calculation would suffice.

Partitions with extreme size imbalances cause uneven processing loads. One massive partition forces sequential processing even with parallel execution enabled. Repartitioning strategies or filtering large partitions separately maintain performance. Monitoring partition sizes reveals skewed data distributions requiring remediation.

Mixing aggregate functions with window functions in GROUP BY queries requires careful attention to evaluation order. Databases execute GROUP BY before window functions, so window functions operate on grouped results. Window functions cannot disaggregate data already collapsed by GROUP BY. Nested subqueries separate grouping from window function calculations when both operations need the same columns.

Frame specifications with RANGE and datetime ORDER BY columns require explicit interval specifications in some databases. PostgreSQL supports RANGE with intervals, but other databases may reject or misinterpret these specifications. Testing frame specifications across target database platforms prevents runtime failures.

Calculating percentages with window function SUMs encounters division by zero when partition sums equal zero. Explicit NULL checking or NULLIF prevents errors, though the appropriate handling (NULL result, zero result, or filter the rows) depends on business logic requirements.

Reference

Window Function Categories

Category	Functions	Description
Ranking	ROW_NUMBER, RANK, DENSE_RANK, NTILE	Assign ranks or sequential numbers within partitions
Value	LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTH_VALUE	Access values from other rows without self-joins
Aggregate	SUM, AVG, COUNT, MIN, MAX, STRING_AGG	Calculate aggregates over window frames
Statistical	STDDEV, VARIANCE, PERCENTILE_CONT, PERCENTILE_DISC	Compute statistical measures across frames

Window Specification Components

Component	Syntax	Purpose
Partitioning	PARTITION BY expr	Divide result set into independent groups
Ordering	ORDER BY expr	Define row sequence within partitions
Frame Type	ROWS or RANGE	Specify physical rows or logical value ranges
Frame Start	UNBOUNDED PRECEDING, n PRECEDING, CURRENT ROW	Define frame beginning boundary
Frame End	CURRENT ROW, n FOLLOWING, UNBOUNDED FOLLOWING	Define frame ending boundary

Frame Specification Defaults

Condition	Default Frame
No ORDER BY clause	RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING (entire partition)
ORDER BY without frame	RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (partition start to current)
Explicit frame specified	Specified frame used as written

Common Frame Patterns

Pattern	Frame Specification	Use Case
Entire partition	ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING	Grand totals, first/last values
Running total	ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW	Cumulative sums, balances
Moving average	ROWS BETWEEN n PRECEDING AND CURRENT ROW	Smoothing, trend analysis
Centered window	ROWS BETWEEN n PRECEDING AND n FOLLOWING	Symmetric calculations
Forward looking	ROWS BETWEEN CURRENT ROW AND n FOLLOWING	Lead indicators, future context

Ranking Function Behaviors

Function	Gaps After Ties	Unique Values	Use Case
ROW_NUMBER	No	Yes	Deterministic ordering, pagination
RANK	Yes	No	Competition-style ranking
DENSE_RANK	No	No	Categorical grouping
NTILE(n)	Varies	No	Percentile groups, equal distribution

Ruby ORM Window Function Syntax

ORM	Syntax Pattern	Example
Sequel	function(:name).over(options)	Sequel.function(:rank).over(partition: :dept, order: :salary)
ActiveRecord	NamedFunction.new().over(Window.new)	Arel::Nodes::NamedFunction.new('RANK', []).over(window)
Raw SQL	Sequel.lit or Arel.sql	Sequel.lit('RANK() OVER (PARTITION BY dept ORDER BY salary)')

Performance Optimization Checklist

Optimization	Implementation	Impact
Shared OVER clauses	Use identical partition/order specs	Reduces sort operations
Index on partition columns	CREATE INDEX ON table (partition_col, order_col)	Avoids explicit sorting
Bounded frames	Use small PRECEDING/FOLLOWING offsets	Reduces memory usage
Filter before windows	WHERE clause reduces row count	Decreases processing volume
Materialized CTEs	WITH name AS MATERIALIZED	Isolates complex operations
Parallel execution	Enable parallel query settings	Uses multiple CPU cores

Common Error Patterns and Solutions

Error Pattern	Cause	Solution
LAST_VALUE returns current row	Default frame only includes preceding rows	Add ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
Cannot reference window alias	Window functions evaluated simultaneously	Use CTE or subquery to stage results
Unexpected frame size with ties	RANGE groups equal ORDER BY values	Use ROWS instead of RANGE for exact row counts
NULL ordering inconsistency	Database-dependent NULL sort behavior	Add explicit NULLS FIRST or NULLS LAST
Performance degradation	Multiple distinct OVER clauses	Consolidate to shared window specifications
Division by zero in percentages	Empty partitions or zero sums	Add NULLIF or CASE to handle zero divisors

Window Functions