Custom AI/ML with External Functions

Extend Databend with your own AI/ML capabilities by connecting it to infrastructure you run. External functions let you deploy custom models, use GPU acceleration, and integrate with any ML framework, while your data stays within your own infrastructure.

Key Capabilities

| Feature          | Benefits                                              |
|------------------|-------------------------------------------------------|
| Custom Models    | Use any open-source or proprietary AI/ML models       |
| GPU Acceleration | Deploy on GPU-equipped machines for faster inference  |
| Data Privacy     | Keep your data within your infrastructure             |
| Scalability      | Independent scaling and resource optimization         |
| Flexibility      | Support for any programming language and ML framework |

How It Works

  1. Create AI Server: Build your AI/ML server using Python and databend-udf
  2. Register Function: Connect your server to Databend with CREATE FUNCTION
  3. Use in SQL: Call your custom AI functions directly in SQL queries

Example: Text Embedding Function

# Simple embedding UDF server demo
from databend_udf import udf, UDFServer
from sentence_transformers import SentenceTransformer

# Load pre-trained model
model = SentenceTransformer('all-mpnet-base-v2')  # 768-dimensional vectors

@udf(
    input_types=["STRING"],
    result_type="ARRAY(FLOAT)",
)
def ai_embed_768(inputs: list[str], headers) -> list[list[float]]:
    """Generate 768-dimensional embeddings for input texts"""
    try:
        # Process inputs in a single batch
        embeddings = model.encode(inputs)
        # Convert to list format
        return [embedding.tolist() for embedding in embeddings]
    except Exception as e:
        print(f"Error generating embeddings: {e}")
        # Return empty lists in case of error
        return [[] for _ in inputs]

if __name__ == '__main__':
    print("Starting embedding UDF server on port 8815...")
    server = UDFServer("0.0.0.0:8815")
    server.add_function(ai_embed_768)
    server.serve()
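
The handler above takes a batch of strings and returns one vector per input, falling back to empty vectors if encoding fails. A minimal sketch of that contract, runnable without Databend or a model download, is shown below. The names `embed_batch`, `fake_encode`, and `failing_encode` are hypothetical and exist only for illustration; `fake_encode` is a toy stand-in for `model.encode`, not a real embedding model.

```python
from typing import Callable, Sequence


def embed_batch(
    inputs: Sequence[str],
    encode: Callable[[Sequence[str]], Sequence[Sequence[float]]],
) -> list[list[float]]:
    """Return one vector per input; fall back to empty vectors on failure."""
    try:
        vectors = encode(inputs)
        # Convert each vector to a plain list of Python floats
        # (this also works for NumPy rows, which iterate as scalars).
        return [list(map(float, vector)) for vector in vectors]
    except Exception as exc:
        print(f"Error generating embeddings: {exc}")
        # Keep the output aligned with the input: one (empty) vector per row.
        return [[] for _ in inputs]


def fake_encode(texts: Sequence[str]) -> list[list[float]]:
    """Hypothetical stand-in for model.encode: 3-dimensional toy vectors."""
    return [[float(len(t)), float(t.count(" ")), 1.0] for t in texts]


def failing_encode(texts: Sequence[str]) -> list[list[float]]:
    """Hypothetical encoder that always fails, to exercise the fallback."""
    raise RuntimeError("model unavailable")


ok = embed_batch(["a b", "hello"], fake_encode)
fallback = embed_batch(["x", "y"], failing_encode)
```

In both cases the result has the same length as the input batch. An empty vector is not a valid embedding, so queries that consume the function's output may need to filter such rows out before computing distances.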

-- Register the external function in Databend
CREATE OR REPLACE FUNCTION ai_embed_768 (STRING)
RETURNS ARRAY(FLOAT)
LANGUAGE PYTHON
HANDLER = 'ai_embed_768'
ADDRESS = 'https://your-ml-server.example.com';

-- Use the custom embedding in queries
SELECT
    id,
    title,
    cosine_distance(
        ai_embed_768(content),
        ai_embed_768('machine learning techniques')
    ) AS distance  -- smaller distance means more similar
FROM articles
ORDER BY distance ASC
LIMIT 5;
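
To see why the query sorts in ascending order, it can help to compute the metric by hand. The sketch below assumes the conventional definition of cosine distance (1 minus cosine similarity); it is an illustration in plain Python, not Databend's implementation, and the toy document vectors are made up.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Conventional cosine distance: 1 - (a . b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Toy 2-dimensional "embeddings" (made-up values for illustration).
docs = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
}
query = [1.0, 0.1]

# Ascending distance puts the closest document first,
# mirroring ORDER BY ... ASC in the SQL query.
ranked = sorted(docs, key=lambda name: cosine_distance(docs[name], query))
```

Vectors pointing in the same direction have distance 0, orthogonal vectors have distance 1, and opposite vectors have distance 2, so the best matches are the rows with the smallest values.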
