External Functions for Custom AI/ML

For advanced AI/ML scenarios, Databend supports external functions that connect your data with custom AI/ML infrastructure written in languages like Python.

| Feature | Description | Benefits |
|---|---|---|
| Model Flexibility | Use open-source models or your internal AI/ML infrastructure | • Freedom to choose any model • Leverage existing ML investments • Stay up-to-date with latest AI advancements |
| GPU Acceleration | Deploy external function servers on GPU-equipped machines | • Faster inference for deep learning models • Handle larger batch sizes • Support compute-intensive workloads |
| Custom ML Models | Deploy and use your own machine learning models | • Proprietary algorithms • Domain-specific models • Fine-tuned for your data |
| Advanced AI Pipelines | Build complex AI workflows with specialized libraries | • Multi-step processing • Custom transformations • Integration with ML frameworks |
| Scalability | Handle resource-intensive AI operations outside Databend | • Independent scaling • Optimized resource allocation • High-throughput processing |

Implementation Overview

  1. Create an external server with your AI/ML code (Python with databend-udf)
  2. Register the server with Databend using CREATE FUNCTION
  3. Call your AI/ML functions directly in SQL queries
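
Before wiring up step 1, it can help to exercise the handler logic on its own. The sketch below assumes, as the example on this page does, that a handler receives a batch of rows and returns one result per row. The names `fake_encode`, `embed_in_chunks`, and `chunk_size` are hypothetical and are not part of databend-udf; `fake_encode` is only a stand-in for a real model so the sketch runs without ML dependencies.

```python
from typing import Callable, Sequence


def fake_encode(texts: Sequence[str]) -> list[list[float]]:
    """Hypothetical stand-in for a model's encode(): one tiny vector per text."""
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


def embed_in_chunks(
    inputs: Sequence[str],
    encode: Callable[[Sequence[str]], list[list[float]]],
    chunk_size: int = 2,
) -> list[list[float]]:
    """Return one vector per input row, encoding at most chunk_size rows at a time."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    results: list[list[float]] = []
    for start in range(0, len(inputs), chunk_size):
        chunk = inputs[start:start + chunk_size]
        try:
            vectors = encode(chunk)
        except Exception:
            # Same fallback as the example on this page: empty vectors on error,
            # applied per chunk so one failure does not discard the whole batch.
            vectors = [[] for _ in chunk]
        results.extend(vectors)
    return results


rows = ["alpha", "beta", "gamma"]
vectors = embed_in_chunks(rows, fake_encode)
print(len(vectors) == len(rows))
```

Whatever the chunk size, the output keeps the input order and length, which is the property a row-wise SQL function relies on.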

Example: Custom AI Model Integration

# Simple embedding UDF server demo
from databend_udf import udf, UDFServer
from sentence_transformers import SentenceTransformer

# Load pre-trained model
model = SentenceTransformer('all-mpnet-base-v2')  # 768-dimensional vectors

@udf(
    input_types=["STRING"],
    result_type="ARRAY(FLOAT)",
)
def ai_embed_768(inputs: list[str], headers) -> list[list[float]]:
    """Generate 768-dimensional embeddings for input texts"""
    try:
        # Process inputs in a single batch
        embeddings = model.encode(inputs)
        # Convert to list format
        return [embedding.tolist() for embedding in embeddings]
    except Exception as e:
        print(f"Error generating embeddings: {e}")
        # Return empty lists in case of error
        return [[] for _ in inputs]

if __name__ == '__main__':
    print("Starting embedding UDF server on port 8815...")
    server = UDFServer("0.0.0.0:8815")
    server.add_function(ai_embed_768)
    server.serve()

-- Register the external function in Databend
CREATE OR REPLACE FUNCTION ai_embed_768 (STRING)
RETURNS ARRAY(FLOAT)
LANGUAGE PYTHON
HANDLER = 'ai_embed_768'
ADDRESS = 'https://your-ml-server.example.com';

-- Use the custom embedding in queries
-- cosine_distance returns a distance, so smaller values mean closer matches
SELECT
    id,
    title,
    cosine_distance(
        ai_embed_768(content),
        ai_embed_768('machine learning techniques')
    ) AS distance
FROM articles
ORDER BY distance ASC
LIMIT 5;
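
To see why the query sorts in ascending order, here is a small pure-Python sketch of cosine distance, assuming the usual definition (1 minus cosine similarity). The function and the sample vectors are illustrative only and are not Databend's implementation.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0.0 for the same direction, 2.0 for opposite."""
    if not a or len(a) != len(b):
        raise ValueError("vectors must be non-empty and of equal length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


query = [1.0, 0.0]
docs = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}
# Ascending distance puts the closest match first, like ORDER BY ... ASC.
ranked = sorted(docs, key=lambda name: cosine_distance(docs[name], query))
print(ranked)
```

An empty vector, such as the error fallback in the server example, has no defined cosine distance, so this sketch rejects it. How Databend's `cosine_distance` treats that case is not covered here.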

For detailed instructions on setting up external functions, see External Functions.

Getting Started

Try these AI capabilities on Databend Cloud with a free trial.
