Aggregate Functions
This page provides a comprehensive overview of aggregate functions in Databend, organized by functionality for easy reference.
Basic Aggregation
Function | Description | Example |
---|---|---|
COUNT | Counts the number of rows or non-NULL values | COUNT(*) → 10 |
COUNT_DISTINCT | Counts distinct values | COUNT(DISTINCT city) → 5 |
APPROX_COUNT_DISTINCT | Approximates count of distinct values | APPROX_COUNT_DISTINCT(user_id) → 9955 |
SUM | Calculates the sum of values | SUM(sales) → 1250.75 |
AVG | Calculates the average of values | AVG(temperature) → 72.5 |
MIN | Returns the minimum value | MIN(price) → 9.99 |
MAX | Returns the maximum value | MAX(price) → 99.99 |
ANY | Returns any value from the group | ANY(status) → 'active' |
Conditional Aggregation
Function | Description | Example |
---|---|---|
COUNT_IF | Counts rows that match a condition | COUNT_IF(price > 100) → 5 |
SUM_IF | Sums values that match a condition | SUM_IF(amount, status = 'completed') → 750.25 |
AVG_IF | Averages values that match a condition | AVG_IF(score, passed = true) → 85.6 |
MIN_IF | Returns minimum where condition is true | MIN_IF(temp, location = 'outside') → 45.2 |
MAX_IF | Returns maximum where condition is true | MAX_IF(speed, vehicle = 'car') → 120.5 |
Statistical Functions
Function | Description | Example |
---|---|---|
VAR_POP / VARIANCE_POP | Population variance | VAR_POP(height) → 10.25 |
VAR_SAMP / VARIANCE_SAMP | Sample variance | VAR_SAMP(height) → 12.3 |
STDDEV_POP | Population standard deviation | STDDEV_POP(height) → 3.2 |
STDDEV_SAMP | Sample standard deviation | STDDEV_SAMP(height) → 3.5 |
COVAR_POP | Population covariance | COVAR_POP(x, y) → 2.5 |
COVAR_SAMP | Sample covariance | COVAR_SAMP(x, y) → 2.7 |
KURTOSIS | Measures peakedness of distribution | KURTOSIS(values) → 2.1 |
SKEWNESS | Measures asymmetry of distribution | SKEWNESS(values) → 0.2 |
Percentile and Distribution
Function | Description | Example |
---|---|---|
MEDIAN | Calculates the median value | MEDIAN(response_time) → 125 |
MODE | Returns the most frequent value | MODE(category) → 'electronics' |
QUANTILE_CONT | Continuous interpolation quantile | QUANTILE_CONT(0.95)(response_time) → 350.5 |
QUANTILE_DISC | Discrete quantile | QUANTILE_DISC(0.5)(age) → 35 |
QUANTILE_TDIGEST | Approximate quantile using t-digest | QUANTILE_TDIGEST(0.9)(values) → 95.2 |
QUANTILE_TDIGEST_WEIGHTED | Weighted t-digest quantile | QUANTILE_TDIGEST_WEIGHTED(0.5)(values, weights) → 50.5 |
MEDIAN_TDIGEST | Approximate median using t-digest | MEDIAN_TDIGEST(response_time) → 124.5 |
HISTOGRAM | Creates histogram buckets | HISTOGRAM(10)(values) → [{...}] |
Array and Collection Aggregation
Function | Description | Example |
---|---|---|
ARRAY_AGG | Collects values into an array | ARRAY_AGG(product) → ['A', 'B', 'C'] |
GROUP_ARRAY_MOVING_AVG | Moving average over array | GROUP_ARRAY_MOVING_AVG(3)(values) → [null, null, 3.0, 6.0, 9.0] |
GROUP_ARRAY_MOVING_SUM | Moving sum over array | GROUP_ARRAY_MOVING_SUM(2)(values) → [null, 3, 7, 11, 15] |
String Aggregation
Function | Description | Example |
---|---|---|
GROUP_CONCAT | Concatenates values with separator | GROUP_CONCAT(city, ', ') → 'New York, London, Tokyo' |
STRING_AGG | Concatenates strings with separator | STRING_AGG(tag, ',') → 'red,green,blue' |
LISTAGG | Concatenates values with separator | LISTAGG(name, ', ') → 'Alice, Bob, Charlie' |
JSON Aggregation
Function | Description | Example |
---|---|---|
JSON_ARRAY_AGG | Aggregates values as JSON array | JSON_ARRAY_AGG(name) → '["Alice", "Bob", "Charlie"]' |
JSON_OBJECT_AGG | Creates JSON object from key-value pairs | JSON_OBJECT_AGG(name, score) → '{"Alice": 95, "Bob": 87}' |
Argument Selection
Function | Description | Example |
---|---|---|
ARG_MAX | Returns value of expr1 at maximum expr2 | ARG_MAX(name, score) → 'Alice' |
ARG_MIN | Returns value of expr1 at minimum expr2 | ARG_MIN(name, score) → 'Charlie' |
Funnel Analysis
Function | Description | Example |
---|---|---|
RETENTION | Calculates retention rates | RETENTION(action = 'signup', action = 'purchase') → [100, 40] |
WINDOWFUNNEL | Searches for event sequences within time window | WINDOWFUNNEL(1800)(timestamp, event='view', event='click', event='purchase') → 2 |