跳到主要内容

Window Functions

Overview

A window function operates on a group ("window") of related rows. For each input row, a window function returns one output row that depends on the specific row passed to the function and the values of the other rows in the window.

There are two main types of order-sensitive window functions:

  • Rank-related functions: List information based on the "rank" of a row. For example, ranking stores in descending order by profit per year, the store with the most profit will be ranked 1, and the second-most profitable store will be ranked 2, and so on.

  • Window frame functions: Enable you to perform rolling operations, such as calculating a running total or a moving average, on a subset of the rows in the window.

Window Function Categories

Databend supports two main categories of window functions:

1. Dedicated Window Functions

These functions are specifically designed for window operations and provide ranking, navigation, and value analysis capabilities.

FunctionDescriptionExample
RANKReturns rank with gapsRANK() OVER (ORDER BY salary DESC)1, 2, 2, 4, ...
DENSE_RANKReturns rank without gapsDENSE_RANK() OVER (ORDER BY salary DESC)1, 2, 2, 3, ...
ROW_NUMBERReturns sequential row numberROW_NUMBER() OVER (ORDER BY hire_date)1, 2, 3, 4, ...
CUME_DISTReturns cumulative distributionCUME_DIST() OVER (ORDER BY score)0.2, 0.4, 0.8, 1.0, ...
PERCENT_RANKReturns relative rank (0-1)PERCENT_RANK() OVER (ORDER BY score)0.0, 0.25, 0.75, ...
NTILEDivides rows into N groupsNTILE(4) OVER (ORDER BY score)1, 1, 2, 2, 3, 3, 4, 4, ...
FIRST_VALUEReturns first value in windowFIRST_VALUE(product) OVER (PARTITION BY category ORDER BY sales)
LAST_VALUEReturns last value in windowLAST_VALUE(product) OVER (PARTITION BY category ORDER BY sales)
NTH_VALUEReturns Nth value in windowNTH_VALUE(product, 2) OVER (PARTITION BY category ORDER BY sales)
LEADAccess value from subsequent rowLEAD(price, 1) OVER (ORDER BY date) → next day's price
LAGAccess value from previous rowLAG(price, 1) OVER (ORDER BY date) → previous day's price
FIRSTReturns first value (alias)FIRST(product) OVER (PARTITION BY category ORDER BY sales)
LASTReturns last value (alias)LAST(product) OVER (PARTITION BY category ORDER BY sales)

2. Aggregate Functions Used as Window Functions

These are standard aggregate functions that can be used with the OVER clause to perform window operations.

FunctionDescriptionWindow Frame SupportExample
SUMCalculates sum over windowSUM(sales) OVER (PARTITION BY region ORDER BY date)
AVGCalculates average over windowAVG(score) OVER (ORDER BY id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
COUNTCounts rows over windowCOUNT(*) OVER (PARTITION BY department)
MINReturns minimum value in windowMIN(price) OVER (PARTITION BY category)
MAXReturns maximum value in windowMAX(price) OVER (PARTITION BY category)
ARRAY_AGGCollects values into arrayARRAY_AGG(product) OVER (PARTITION BY category)
STDDEV_POPPopulation standard deviationSTDDEV_POP(value) OVER (PARTITION BY group)
STDDEV_SAMPSample standard deviationSTDDEV_SAMP(value) OVER (PARTITION BY group)
MEDIANMedian valueMEDIAN(response_time) OVER (PARTITION BY server)

Conditional Variants

FunctionDescriptionWindow Frame SupportExample
COUNT_IFConditional countCOUNT_IF(status = 'complete') OVER (PARTITION BY dept)
SUM_IFConditional sumSUM_IF(amount, status = 'paid') OVER (PARTITION BY customer)
AVG_IFConditional averageAVG_IF(score, passed = true) OVER (PARTITION BY class)
MIN_IFConditional minimumMIN_IF(temp, location = 'outside') OVER (PARTITION BY day)
MAX_IFConditional maximumMAX_IF(speed, vehicle = 'car') OVER (PARTITION BY test)

Window Function Syntax

<function> ( [ <arguments> ] ) OVER ( { named_window | inline_window } )

Where:

named_window ::= window_name

inline_window ::=
[ PARTITION BY <expression_list> ]
[ ORDER BY <expression_list> ]
[ window_frame ]

Key Components

ComponentDescriptionExample
<function>The window function to applySUM(), RANK(), etc.
OVERIndicates window function usageRequired for all window functions
PARTITION BYGroups rows into partitionsPARTITION BY department
ORDER BYOrders rows within each partitionORDER BY salary DESC
window_frameDefines subset of rows to considerROWS BETWEEN 1 PRECEDING AND CURRENT ROW
named_windowReferences a window defined in WINDOW clauseSELECT sum(x) OVER w FROM t WINDOW w AS (PARTITION BY y)

Window Frame Syntax

A window frame defines which rows are included in the function calculation for each row. There are two types of window frames:

1. Frame Types

Frame TypeDescriptionExample
ROWSPhysical row-based frameROWS BETWEEN 3 PRECEDING AND CURRENT ROW
RANGELogical value-based frameRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

2. Frame Extent

Frame Extent PatternDescriptionExample
Cumulative Frames
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWAll rows from start to currentRunning total
BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWINGCurrent row to endRunning total from current position
Sliding Frames
BETWEEN N PRECEDING AND CURRENT ROWN rows before current + current3-day moving average
BETWEEN CURRENT ROW AND N FOLLOWINGCurrent + N rows afterForward-looking calculation
BETWEEN N PRECEDING AND N FOLLOWINGN rows before + current + N rows afterCentered moving average
BETWEEN UNBOUNDED PRECEDING AND N FOLLOWINGAll rows from start to N after currentExtended cumulative calculation
BETWEEN N PRECEDING AND UNBOUNDED FOLLOWINGN rows before current to endExtended backward calculation

Window Function Examples

The following examples demonstrate common window function use cases using an employee dataset.

Sample Data Setup

-- Create employees table
CREATE TABLE employees (
employee_id INT,
first_name VARCHAR,
last_name VARCHAR,
department VARCHAR,
salary INT
);

-- Insert sample data
INSERT INTO employees VALUES
(1, 'John', 'Doe', 'IT', 75000),
(2, 'Jane', 'Smith', 'HR', 85000),
(3, 'Mike', 'Johnson', 'IT', 90000),
(4, 'Sara', 'Williams', 'Sales', 60000),
(5, 'Tom', 'Brown', 'HR', 82000),
(6, 'Ava', 'Davis', 'Sales', 62000),
(7, 'Olivia', 'Taylor', 'IT', 72000),
(8, 'Emily', 'Anderson', 'HR', 77000),
(9, 'Sophia', 'Lee', 'Sales', 58000),
(10, 'Ella', 'Thomas', 'IT', 67000);

Example 1: Ranking Functions

Ranking employees by salary in descending order:

SELECT 
employee_id,
first_name,
last_name,
department,
salary,
RANK() OVER (ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees
ORDER BY salary DESC;

Result:

employee_idfirst_namelast_namedepartmentsalaryrankdense_rankrow_num
3MikeJohnsonIT90000111
2JaneSmithHR85000222
5TomBrownHR82000333
8EmilyAndersonHR77000444
1JohnDoeIT75000555

Example 2: Partitioning

Calculating statistics per department:

SELECT DISTINCT
department,
COUNT(*) OVER (PARTITION BY department) AS employee_count,
SUM(salary) OVER (PARTITION BY department) AS total_salary,
AVG(salary) OVER (PARTITION BY department) AS avg_salary,
MIN(salary) OVER (PARTITION BY department) AS min_salary,
MAX(salary) OVER (PARTITION BY department) AS max_salary
FROM employees
ORDER BY department;

Result:

departmentemployee_counttotal_salaryavg_salarymin_salarymax_salary
HR324400081333.337700085000
IT430400076000.006700090000
Sales318000060000.005800062000

Example 3: Running Totals and Moving Averages

Calculating running totals and moving averages within departments:

SELECT 
employee_id,
first_name,
department,
salary,
-- Running total (cumulative sum)
SUM(salary) OVER (
PARTITION BY department
ORDER BY employee_id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total,
-- Moving average of current and previous row
AVG(salary) OVER (
PARTITION BY department
ORDER BY employee_id
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
) AS moving_avg
FROM employees
ORDER BY department, employee_id;

Result:

employee_idfirst_namedepartmentsalaryrunning_totalmoving_avg
2JaneHR850008500085000.00
5TomHR8200016700083500.00
8EmilyHR7700024400079500.00
1JohnIT750007500075000.00
3MikeIT9000016500082500.00