Aggregating Index: Precomputed Results for Instant Analytics

ENTERPRISE EDITION FEATURE

AGGREGATING INDEX is an Enterprise Edition feature. Contact Databend Support for a license.

Aggregating indexes dramatically accelerate analytical queries by precomputing and storing aggregation results, eliminating the need to scan entire tables for common analytics operations.

What Problem Does It Solve?

Analytical queries on large datasets face significant performance challenges:

Problem	Impact	Aggregating Index Solution
Full Table Scans	SUM, COUNT, MIN, MAX queries scan millions of rows	Read precomputed results instantly
Repeated Calculations	Same aggregations computed over and over	Store results once, reuse many times
Slow Dashboard Queries	Analytics dashboards take minutes to load	Sub-second response for common metrics
High Compute Costs	Heavy aggregation workloads consume resources	Minimal compute for cached results
Poor User Experience	Users wait for reports and analytics	Instant results for business intelligence

Example: A sales analytics query SELECT SUM(revenue), COUNT(*) FROM sales WHERE region = 'US' on 100M rows. Without aggregating index, it scans all US sales records. With aggregating index, it returns precomputed results instantly.

How It Works

Index Creation → Define aggregation queries to precompute
Result Storage → Databend stores aggregated results in optimized blocks
Query Matching → Incoming queries automatically use precomputed results
Automatic Updates → Results refresh when underlying data changes

Quick Setup

-- Create table with sample data
CREATE TABLE sales(region VARCHAR, product VARCHAR, revenue DECIMAL, quantity INT);

-- Create aggregating index for common analytics
CREATE AGGREGATING INDEX sales_summary AS 
SELECT region, SUM(revenue), COUNT(*), AVG(quantity) 
FROM sales 
GROUP BY region;

-- Refresh the index (manual mode)
REFRESH AGGREGATING INDEX sales_summary;

-- Verify the index is used
EXPLAIN SELECT region, SUM(revenue) FROM sales GROUP BY region;

Supported Operations

✅ Supported	❌ Not Supported
SUM, COUNT, MIN, MAX, AVG	Window Functions
GROUP BY clauses	GROUPING SETS
WHERE filters	ORDER BY, LIMIT
Simple aggregations	Complex subqueries

Refresh Strategies

Strategy	When to Use	Configuration
Automatic (SYNC)	Real-time analytics, small datasets	`CREATE AGGREGATING INDEX ... SYNC`
Manual	Large datasets, batch processing	`CREATE AGGREGATING INDEX ...` (default)
Background (Cloud)	Production workloads	Automatic in Databend Cloud

Automatic vs Manual Refresh

-- Automatic refresh (updates with every data change)
CREATE AGGREGATING INDEX auto_summary AS 
SELECT region, SUM(revenue) FROM sales GROUP BY region SYNC;

-- Manual refresh (update on demand)
CREATE AGGREGATING INDEX manual_summary AS 
SELECT region, SUM(revenue) FROM sales GROUP BY region;

REFRESH AGGREGATING INDEX manual_summary;

Performance Example

This example shows the dramatic performance improvement:

-- Prepare data
CREATE TABLE agg(a int, b int, c int);
INSERT INTO agg VALUES (1,1,4), (1,2,1), (1,2,4), (2,2,5);

-- Create an aggregating index
CREATE AGGREGATING INDEX my_agg_index AS SELECT MIN(a), MAX(c) FROM agg;

-- Refresh the aggregating index
REFRESH AGGREGATING INDEX my_agg_index;

-- Verify if the aggregating index works
EXPLAIN SELECT MIN(a), MAX(c) FROM agg;

-- Key indicators in the execution plan:
-- ├── aggregating index: [SELECT MIN(a), MAX(c) FROM default.agg]
-- ├── rewritten query: [selection: [index_col_0 (#0), index_col_1 (#1)]]
-- This shows the query uses precomputed results instead of scanning raw data

Best Practices

Practice	Benefit
Index Common Queries	Focus on frequently executed analytics
Use Manual Refresh	Better control over update timing
Monitor Index Usage	Use EXPLAIN to verify index utilization
Clean Up Unused Indexes	Remove indexes that aren't being used
Match Query Patterns	Index filters should match actual queries

Management Commands

Command	Purpose
`CREATE AGGREGATING INDEX`	Create new aggregating index
`REFRESH AGGREGATING INDEX`	Update index with latest data
`DROP AGGREGATING INDEX`	Remove index (use VACUUM TABLE to clean storage)
`SHOW AGGREGATING INDEXES`	List all indexes

Important Notes

tip

When to Use Aggregating Indexes:

Frequent analytical queries (dashboards, reports)
Large datasets with repeated aggregations
Stable query patterns
Performance-critical applications

When NOT to Use:

Frequently changing data
One-time analytical queries
Simple queries on small tables

Configuration

-- Enable/disable aggregating index feature
SET enable_aggregating_index_scan = 1;  -- Enable (default)
SET enable_aggregating_index_scan = 0;  -- Disable

Aggregating indexes are most effective for repetitive analytical workloads on large datasets. Start with your most common dashboard and reporting queries.

Aggregating Index: Precomputed Results for Instant Analytics

What Problem Does It Solve?

How It Works

Quick Setup

Supported Operations

Refresh Strategies

Automatic vs Manual Refresh

Performance Example

Best Practices

Management Commands

Important Notes

Configuration

Join our growing community

GitHub

Slack

X(Twitter)

YouTube

Explore Databend Cloud for FREE

What Problem Does It Solve?​

How It Works​

Quick Setup​

Supported Operations​

Refresh Strategies​

Automatic vs Manual Refresh​

Performance Example​

Best Practices​

Management Commands​

Important Notes​

Configuration​

Join our growing community

GitHub

Slack

X(Twitter)

YouTube

Explore Databend Cloud for FREE

What Problem Does It Solve?

How It Works

Quick Setup

Supported Operations

Refresh Strategies

Automatic vs Manual Refresh

Performance Example

Best Practices

Management Commands

Important Notes

Configuration