Skip to main content

User-Defined Function

ENTERPRISE EDITION FEATURE
Python UDF is an Enterprise Edition feature. Contact Databend Support for a license.

User-Defined Functions (UDFs) offer enhanced flexibility by supporting both anonymous lambda expressions and predefined handlers (Python, JavaScript & WebAssembly) for defining UDFs. These features allow users to create custom operations tailored to their specific data processing needs. Databend UDFs are categorized into the following types:

Lambda UDFs

A lambda UDF allows users to define custom operations using anonymous functions (lambda expressions) directly within their queries. These lambda expressions are often concise and can be used to perform specific data transformations or computations that may not be achievable using built-in functions alone.

Usage Examples

This example creates UDFs to extract specific values from JSON data within a table using SQL queries.

CREATE OR REPLACE TABLE sale_items (
item_id INT,
details VARIANT
);

INSERT INTO sale_items VALUES
(1, PARSE_JSON('{"name": "T-Shirt", "price": 20.00, "discount_pct": 10}')), -- 10% discount
(2, PARSE_JSON('{"name": "Jeans", "price": 50.00, "discount_pct": 25}')), -- 25% discount
(3, PARSE_JSON('{"name": "Jacket", "price": 100.00, "discount_pct": 0}')), -- No discount
(4, PARSE_JSON('{"name": "Socks", "price": 5.00, "discount_pct": 50}')); -- 50% discount

-- Define a Lambda UDF to calculate the final price after discount
-- WITH EXPLICIT CASTING
CREATE OR REPLACE FUNCTION calculate_final_price AS (item_info) ->
(item_info['price']::FLOAT) * (1 - (item_info['discount_pct']::FLOAT) / 100.0);

SHOW USER FUNCTIONS;
--+-----------------------+----------------+-------------+---------------------------------+----------+----------------------------+
--| name | is_aggregate | description | arguments | language | created_on |
--+-----------------------+----------------+-------------+---------------------------------+----------+----------------------------+
--| calculate_final_price | 0 | | {"parameters":["item_info"]} | SQL | YYYY-MM-DD HH:MM:SS.ffffff |
--+-----------------------+----------------+-------------+---------------------------------+----------+----------------------------+

-- Use the Lambda UDF to get item names and their final prices
SELECT
item_id,
details['name']::STRING AS item_name,
details['price']::FLOAT AS original_price,
calculate_final_price(details) AS final_price
FROM sale_items
ORDER BY item_id;

-- Expected output for the SELECT query (final_price should now have values):
--+---------+-----------+----------------+-------------+
--| item_id | item_name | original_price | final_price |
--+---------+-----------+----------------+-------------+
--| 1 | T-Shirt | 20.00 | 18.00 |
--| 2 | Jeans | 50.00 | 37.50 |
--| 3 | Jacket | 100.00 | 100.00 |
--| 4 | Socks | 5.00 | 2.50 |
--+---------+-----------+----------------+-------------+

Embedded UDFs

Embedded UDFs allow you to embed code written in the following programming languages within SQL:

With Embedded UDFs, you can create both scalar functions and aggregate functions. Scalar functions operate on a single row of input and return a single value, while aggregate functions process multiple rows of input and return a single aggregated result, such as a sum or average.

note
  • Creating aggregate UDFs with WebAssembly is not yet supported.
  • If your program content is large, you can compress it and then pass it to a stage. See the Usage Examples for WebAssembly.

Python (requires Databend Enterprise)

A Python UDF allows you to invoke Python code from a SQL query via Databend's built-in handler, enabling seamless integration of Python logic within your SQL queries.

note

The Python UDF must use only Python's standard library; third-party imports are not allowed.

Data Type Mappings

DatabendPython
BOOLEANbool
TINYINTint
SMALLINTint
INTint
BIGINTint
FLOATfloat
DOUBLEfloat
DECIMALdecimal.Decimal
DATEdatetime.date
TIMESTAMPdatetime.datetime
VARCHARstr
BINARYbytes
ARRAYlist
TUPLEtuple
MAPdict
VARIANTstr
BITMAPstr
GEOMETRYstr

Usage Examples

-- Create a table with user interaction logs
CREATE TABLE user_interaction_logs (
log_id INT,
log_data VARIANT -- JSON interaction log
);

-- Insert sample interaction log data
INSERT INTO user_interaction_logs VALUES
(1, PARSE_JSON('{"user_id": "u123", "timestamp": "2023-01-15T10:00:00Z", "action": "view_product", "details": {"product_id": "p789", "category": "electronics", "price": 99.99}}')),
(2, PARSE_JSON('{"user_id": "u456", "timestamp": "2023-01-15T10:05:10Z", "action": "add_to_cart", "details": {"product_id": "p789", "quantity": 1, "category": "electronics"}}')),
(3, PARSE_JSON('{"user_id": "u123", "timestamp": "2023-01-15T10:02:30Z", "action": "search", "details": {"query": "wireless headphones", "results_count": 15}}')),
(4, PARSE_JSON('{"user_id": "u789", "timestamp": "2023-01-15T10:08:00Z", "action": "purchase", "details": {"order_id": "o555", "total_amount": 125.50, "item_count": 2}}')),
(5, PARSE_JSON('{"user_id": "u123", "timestamp": "2023-01-15T10:10:00Z", "action": "view_page", "details": {"page_name": "homepage"}}')),
(6, PARSE_JSON('{"user_id": "u456", "timestamp": "2023-01-15T10:12:00Z", "action": "purchase", "details": {"order_id": "o556", "total_amount": 25.00, "item_count": 1}}'));

-- Create a Python UDF to extract features from interaction logs
CREATE OR REPLACE FUNCTION extract_interaction_features_py(VARIANT)
RETURNS VARCHAR
LANGUAGE python HANDLER = 'extract_features'
AS $$
import json

def extract_features(log):
log_dict = log if isinstance(log, dict) else {}
action = log_dict.get('action', '').lower()
details = log_dict.get('details', {})
if not isinstance(details, dict):
details = {}

is_search_action = False
has_product_interaction = False
product_category_if_any = None
search_query_length = 0
purchase_value_bucket = None

if action == 'search':
is_search_action = True
search_query_length = len(details.get('query', ''))

if action in ['view_product', 'add_to_cart', 'remove_from_cart']:
has_product_interaction = True
product_category_if_any = details.get('category')

if action == 'purchase':
has_product_interaction = True

if action == 'purchase':
total_amount = details.get('total_amount', 0.0)
if not isinstance(total_amount, (int, float)):
total_amount = 0.0

if total_amount < 50:
purchase_value_bucket = 'Low'
elif total_amount < 200:
purchase_value_bucket = 'Medium'
else:
purchase_value_bucket = 'High'

result_dict = {
"is_search_action": is_search_action,
"has_product_interaction": has_product_interaction,
"product_category_if_any": product_category_if_any,
"search_query_length": search_query_length,
"purchase_value_bucket": purchase_value_bucket
}
return json.dumps(result_dict)
$$;

-- Use the Python UDF to extract features
SELECT
log_id,
log_data['user_id']::STRING AS user_id,
log_data['action']::STRING AS action,
extract_interaction_features_py(log_data) AS extracted_features
FROM
user_interaction_logs
ORDER BY
log_id;

JavaScript

A JavaScript UDF allows you to invoke JavaScript code from a SQL query via Databend's built-in handler, enabling seamless integration of JavaScript logic within your SQL queries.

Data Type Mappings

Databend TypeJS Type
NULLnull
BOOLEANBoolean
TINYINTNumber
TINYINT UNSIGNEDNumber
SMALLINTNumber
SMALLINT UNSIGNEDNumber
INTNumber
INT UNSIGNEDNumber
BIGINTNumber
BIGINT UNSIGNEDNumber
FLOATNumber
DOUBLENumber
VARCHARString
DATE / TIMESTAMPDate
DECIMALBigDecimal
BINARYUint8Array

Usage Examples

-- Create a table with user interaction logs
CREATE TABLE user_interaction_logs (
log_id INT,
log_data VARIANT -- JSON interaction log
);

-- Insert sample interaction log data
INSERT INTO user_interaction_logs VALUES
(1, PARSE_JSON('{"user_id": "u123", "timestamp": "2023-01-15T10:00:00Z", "action": "view_product", "details": {"product_id": "p789", "category": "electronics", "price": 99.99}}')),
(2, PARSE_JSON('{"user_id": "u456", "timestamp": "2023-01-15T10:05:10Z", "action": "add_to_cart", "details": {"product_id": "p789", "quantity": 1, "category": "electronics"}}')),
(3, PARSE_JSON('{"user_id": "u123", "timestamp": "2023-01-15T10:02:30Z", "action": "search", "details": {"query": "wireless headphones", "results_count": 15}}')),
(4, PARSE_JSON('{"user_id": "u789", "timestamp": "2023-01-15T10:08:00Z", "action": "purchase", "details": {"order_id": "o555", "total_amount": 125.50, "item_count": 2}}')),
(5, PARSE_JSON('{"user_id": "u123", "timestamp": "2023-01-15T10:10:00Z", "action": "view_page", "details": {"page_name": "homepage"}}')),
(6, PARSE_JSON('{"user_id": "u456", "timestamp": "2023-01-15T10:12:00Z", "action": "purchase", "details": {"order_id": "o556", "total_amount": 25.00, "item_count": 1}}'));


-- Create a JavaScript UDF to extract features from interaction logs
CREATE FUNCTION extract_interaction_features_js(VARIANT)
RETURNS VARIANT
LANGUAGE javascript HANDLER = 'extractFeatures'
AS $$
export function extractFeatures(log) {
const action = (log.action || '').toLowerCase();
const details = log.details || {};

let isSearchAction = false;
let hasProductInteraction = false;
let productCategoryIfAny = null;
let searchQueryLength = 0;
let purchaseValueBucket = null;

if (action === 'search') {
isSearchAction = true;
searchQueryLength = (details.query || '').length;
}

if (['view_product', 'add_to_cart', 'remove_from_cart'].includes(action)) {
hasProductInteraction = true;
productCategoryIfAny = details.category || null;
}

if (action === 'purchase' && !productCategoryIfAny) {
hasProductInteraction = true;
}

if (action === 'purchase') {
const totalAmount = details.total_amount || 0.0;
if (totalAmount < 50) {
purchaseValueBucket = 'Low';
} else if (totalAmount < 200) {
purchaseValueBucket = 'Medium';
} else {
purchaseValueBucket = 'High';
}
}

return {
is_search_action: isSearchAction,
has_product_interaction: hasProductInteraction,
product_category_if_any: productCategoryIfAny,
search_query_length: searchQueryLength,
purchase_value_bucket: purchaseValueBucket
};
}
$$;

WebAssembly

A WebAssembly UDF allows users to define custom logic or operations using languages that compile to WebAssembly. These UDFs can then be invoked directly within SQL queries to perform specific computations or data transformations.

Usage Examples

In this example, the "wasm_gcd" function is created to compute the greatest common divisor (GCD) of two integers. The function is defined using WebAssembly and its implementation resides in the 'test10_udf_wasm_gcd.wasm.zst' binary file.

Prior to its execution, the function implementation undergoes a series of steps. First, it is compiled into a binary file, followed by compression into 'test10_udf_wasm_gcd.wasm.zst'. Finally, the compressed file is uploaded to a stage in advance.

note

The function can be implemented in Rust, as demonstrated in the example available at https://github.com/arrow-udf/arrow-udf/blob/main/arrow-udf-example/src/lib.rs

CREATE FUNCTION wasm_gcd (INT, INT) RETURNS INT LANGUAGE wasm HANDLER = 'wasm_gcd(int4,int4)->int4' AS $$@data/udf/test10_udf_wasm_gcd.wasm.zst$$;

SELECT
number,
wasm_gcd((number * 3), (number * 6))
FROM
numbers(5)
WHERE
(number > 0)
ORDER BY 1;

Managing UDFs

Databend provides a variety of commands to manage UDFs. For details, see User-Defined Function.

Explore Databend Cloud for FREE
Low-cost
Fast Analytics
Easy Data Ingestion
Elastic Scaling
Try it today