Skip to main content

CREATE AGGREGATE FUNCTION

Introduced or updated: v1.2.799

Creates a user-defined aggregate function (UDAF) that runs inside Databend's JavaScript or Python runtime.

Supported Languages

  • javascript
  • python

Syntax

CREATE [ OR REPLACE ] FUNCTION [ IF NOT EXISTS ] <function_name>
( [ <parameter_list> ] )
STATE { <state_field_list> }
RETURNS <return_type>
LANGUAGE <language_name>
[ IMPORTS = (<stage_files>) ]
[ PACKAGES = (<python_packages>) ]
AS $$
<language_specific_code>
$$
[ DESC='<description>' ]
ParameterDescription
<function_name>Name of the aggregate function.
<parameter_list>Optional comma-separated list of input parameters and types (for example value DOUBLE).
STATE { <state_field_list> }Struct definition that Databend stores between partial/final aggregation steps (for example STATE { sum DOUBLE, count DOUBLE }).
<return_type>Data type returned by the aggregate (DOUBLE, INT, etc.).
LANGUAGERuntime used to execute the script. Supported values: javascript, python.
IMPORTS / PACKAGESOptional lists for shipping extra files (imports) or PyPI packages (Python only).
<language_specific_code>Script body that must expose create_state, accumulate, merge, and finish entry points.
DESCOptional description.

The script must implement these functions:

  • create_state() – allocate and return an initial state object.
  • accumulate(state, *args) – update the state for each input row.
  • merge(state1, state2) – merge two partial states.
  • finish(state) – produce the final result (return None for SQL NULL).

Access control requirements

PrivilegeObject TypeDescription
SUPERGlobal, TableOperates a UDF

To create a user-defined function, the user performing the operation or the current_role must have the SUPER privilege.

Examples

Python average UDAF

The following Python aggregate computes the average of a column:

CREATE OR REPLACE FUNCTION py_avg (value DOUBLE)
STATE { sum DOUBLE, count DOUBLE }
RETURNS DOUBLE
LANGUAGE python
AS $$
class State:
def __init__(self):
self.sum = 0.0
self.count = 0.0

def create_state():
return State()

def accumulate(state, value):
if value is not None:
state.sum += value
state.count += 1
return state

def merge(state1, state2):
state1.sum += state2.sum
state1.count += state2.count
return state1

def finish(state):
if state.count == 0:
return None
return state.sum / state.count
$$;

SELECT py_avg(number) AS avg_val FROM numbers(5);
+---------+
| avg_val |
+---------+
| 2 |
+---------+

JavaScript average UDAF

The next example shows the same calculation implemented in JavaScript:

CREATE OR REPLACE FUNCTION js_avg (value DOUBLE)
STATE { sum DOUBLE, count DOUBLE }
RETURNS DOUBLE
LANGUAGE javascript
AS $$
export function create_state() {
return { sum: 0, count: 0 };
}

export function accumulate(state, value) {
if (value !== null) {
state.sum += value;
state.count += 1;
}
return state;
}

export function merge(state1, state2) {
state1.sum += state2.sum;
state1.count += state2.count;
return state1;
}

export function finish(state) {
if (state.count === 0) {
return null;
}
return state.sum / state.count;
}
$$;

SELECT js_avg(number) AS avg_val FROM numbers(5);
+---------+
| avg_val |
+---------+
| 2 |
+---------+
Try Databend Cloud for FREE

Multimodal, object-storage-native warehouse for BI, vectors, search, and geo.

Snowflake-compatible SQL with automatic scaling.

Sign up and get $200 in credits.

Try it today