Skip to main content

External Functions in Databend Cloud

External functions in Databend allow you to define custom operations for processing data using external servers written in programming languages like Python. These functions enable you to extend Databend's capabilities by integrating custom logic, leveraging external libraries, and handling complex processing tasks. Key features of external functions include:

  • Scalability: Ideal for complex and resource-intensive data operations.
  • External Libraries: Leverage additional functionality through external libraries and dependencies.
  • Advanced Logic: Implement sophisticated data processing logic for complex scenarios.

Supported Programming Languages

The following table lists the supported languages and the required libraries for creating external functions in Databend:

LanguageRequired Library
Pythondatabend-udf

Managing External Functions

You can manage external functions using SQL commands such as CREATE FUNCTION, DROP FUNCTION, and SHOW FUNCTIONS. For more details, see External Function.

Configuring External Functions in Databend Cloud

To use external functions in Databend Cloud, you need to allowlist the addresses of your external function servers. The external function server must be accessible via a domain name over HTTPS. Please contact Databend Cloud support to add your allowed UDF server addresses:

  1. Navigate to Support > Create New Ticket in the Databend Cloud console.
  2. Provide the external server addresses (with HTTPS domain names) you wish to allowlist.
  3. Submit the ticket and await confirmation from the support team.

Usage Example: Creating an External Function in Python

This section demonstrates how to create an external function using Python.

1. Install the Required Library

Install the databend-udf library using pip:

pip install databend-udf

2. Define Your Function

Create a Python file (e.g., external_function.py) and define your external function. The following example defines an external server in Python that exposes a custom function gcd for calculating the greatest common divisor of two integers:

from databend_udf import udf, UDFServer

@udf(
input_types=["INT", "INT"],
result_type="INT",
skip_null=True,
)
def gcd(x: int, y: int) -> int:
while y != 0:
x, y = y, x % y
return x

if __name__ == '__main__':
# Create an external server listening at '0.0.0.0:8815'
server = UDFServer("0.0.0.0:8815")
# Add the defined function
server.add_function(gcd)
# Start the external server
server.serve()

Explanation of @udf Decorator Parameters:

ParameterDescription
input_typesA list of strings specifying the input data types (e.g., ["INT", "VARCHAR"]).
result_typeA string specifying the return value type (e.g., "INT").
name(Optional) Custom name for the function. If not provided, the original function name is used.
io_threadsNumber of I/O threads used per data chunk for I/O-bound functions.
skip_nullIf set to True, NULL values are not passed to the function, and the corresponding return value is set to NULL. Default is False.

Data Type Mappings Between Databend and Python:

Databend TypePython Type
BOOLEANbool
TINYINT (UNSIGNED)int
SMALLINT (UNSIGNED)int
INT (UNSIGNED)int
BIGINT (UNSIGNED)int
FLOATfloat
DOUBLEfloat
DECIMALdecimal.Decimal
DATEdatetime.date
TIMESTAMPdatetime.datetime
VARCHARstr
VARIANTany
MAP(K,V)dict
ARRAY(T)list[T]
TUPLE(T,...)tuple(T,...)

3. Run the External Server

Run the Python file to start the external server:

python3 external_function.py

Note: Ensure that the server is accessible from Databend Cloud and that the address is allowlisted. If not already done, contact Databend Cloud support to add the server address to the allowlist.

4. Register the Function in Databend Cloud

Register the function gcd in Databend using the CREATE FUNCTION statement:

CREATE FUNCTION gcd (INT, INT)
RETURNS INT
LANGUAGE PYTHON
HANDLER = 'gcd'
ADDRESS = '<your-allowed-server-address>';
  • Replace <your-allowed-server-address> with the actual address of your external server that has been allowlisted in Databend Cloud (must be an HTTPS domain).
  • The HANDLER specifies the name of the function as defined in your Python code.
  • The ADDRESS should match the address where your external server is running and must be allowlisted by Databend Cloud.

Example:

CREATE FUNCTION gcd (INT, INT)
RETURNS INT
LANGUAGE PYTHON
HANDLER = 'gcd'
ADDRESS = 'https://your-server-address';

Important: Before executing this statement, ensure that 'https://your-server-address' is allowlisted in Databend Cloud by contacting support.

You can now use the external function gcd in your SQL queries:

SELECT gcd(48, 18); -- Returns 6

Conclusion

External functions in Databend Cloud provide a powerful way to extend the functionality of your data processing pipelines by integrating custom code written in languages like Python. By following the steps outlined above, you can create and use external functions to handle complex processing tasks, leverage external libraries, and implement advanced logic.