Skip to main content

Aggregating Index: Precomputed Results for Instant Analytics

ENTERPRISE EDITION FEATURE
AGGREGATING INDEX is an Enterprise Edition feature. Contact Databend Support for a license.

Aggregating indexes dramatically accelerate analytical queries by precomputing and storing aggregation results, eliminating the need to scan entire tables for common analytics operations.

What Problem Does It Solve?

Analytical queries on large datasets face significant performance challenges:

ProblemImpactAggregating Index Solution
Full Table ScansSUM, COUNT, MIN, MAX queries scan millions of rowsRead precomputed results instantly
Repeated CalculationsSame aggregations computed over and overStore results once, reuse many times
Slow Dashboard QueriesAnalytics dashboards take minutes to loadSub-second response for common metrics
High Compute CostsHeavy aggregation workloads consume resourcesMinimal compute for cached results
Poor User ExperienceUsers wait for reports and analyticsInstant results for business intelligence

Example: A sales analytics query SELECT SUM(revenue), COUNT(*) FROM sales WHERE region = 'US' on 100M rows. Without aggregating index, it scans all US sales records. With aggregating index, it returns precomputed results instantly.

How It Works

  1. Index Creation → Define aggregation queries to precompute
  2. Result Storage → Databend stores aggregated results in optimized blocks
  3. Query Matching → Incoming queries automatically use precomputed results
  4. Automatic Updates → Results refresh when underlying data changes

Quick Setup

-- Create table with sample data
CREATE TABLE sales(region VARCHAR, product VARCHAR, revenue DECIMAL, quantity INT);

-- Create aggregating index for common analytics
CREATE AGGREGATING INDEX sales_summary AS
SELECT region, SUM(revenue), COUNT(*), AVG(quantity)
FROM sales
GROUP BY region;

-- Refresh the index (manual mode)
REFRESH AGGREGATING INDEX sales_summary;

-- Verify the index is used
EXPLAIN SELECT region, SUM(revenue) FROM sales GROUP BY region;

Supported Operations

✅ Supported❌ Not Supported
SUM, COUNT, MIN, MAX, AVGWindow Functions
GROUP BY clausesGROUPING SETS
WHERE filtersORDER BY, LIMIT
Simple aggregationsComplex subqueries

Refresh Strategies

StrategyWhen to UseConfiguration
Automatic (SYNC)Real-time analytics, small datasetsCREATE AGGREGATING INDEX ... SYNC
ManualLarge datasets, batch processingCREATE AGGREGATING INDEX ... (default)
Background (Cloud)Production workloadsAutomatic in Databend Cloud

Automatic vs Manual Refresh

-- Automatic refresh (updates with every data change)
CREATE AGGREGATING INDEX auto_summary AS
SELECT region, SUM(revenue) FROM sales GROUP BY region SYNC;

-- Manual refresh (update on demand)
CREATE AGGREGATING INDEX manual_summary AS
SELECT region, SUM(revenue) FROM sales GROUP BY region;

REFRESH AGGREGATING INDEX manual_summary;

Performance Example

This example shows the dramatic performance improvement:

-- Prepare data
CREATE TABLE agg(a int, b int, c int);
INSERT INTO agg VALUES (1,1,4), (1,2,1), (1,2,4), (2,2,5);

-- Create an aggregating index
CREATE AGGREGATING INDEX my_agg_index AS SELECT MIN(a), MAX(c) FROM agg;

-- Refresh the aggregating index
REFRESH AGGREGATING INDEX my_agg_index;

-- Verify if the aggregating index works
EXPLAIN SELECT MIN(a), MAX(c) FROM agg;

-- Key indicators in the execution plan:
-- ├── aggregating index: [SELECT MIN(a), MAX(c) FROM default.agg]
-- ├── rewritten query: [selection: [index_col_0 (#0), index_col_1 (#1)]]
-- This shows the query uses precomputed results instead of scanning raw data

Best Practices

PracticeBenefit
Index Common QueriesFocus on frequently executed analytics
Use Manual RefreshBetter control over update timing
Monitor Index UsageUse EXPLAIN to verify index utilization
Clean Up Unused IndexesRemove indexes that aren't being used
Match Query PatternsIndex filters should match actual queries

Management Commands

CommandPurpose
CREATE AGGREGATING INDEXCreate new aggregating index
REFRESH AGGREGATING INDEXUpdate index with latest data
DROP AGGREGATING INDEXRemove index (use VACUUM TABLE to clean storage)
SHOW AGGREGATING INDEXESList all indexes

Important Notes

tip

When to Use Aggregating Indexes:

  • Frequent analytical queries (dashboards, reports)
  • Large datasets with repeated aggregations
  • Stable query patterns
  • Performance-critical applications

When NOT to Use:

  • Frequently changing data
  • One-time analytical queries
  • Simple queries on small tables

Configuration

-- Enable/disable aggregating index feature
SET enable_aggregating_index_scan = 1; -- Enable (default)
SET enable_aggregating_index_scan = 0; -- Disable

Aggregating indexes are most effective for repetitive analytical workloads on large datasets. Start with your most common dashboard and reporting queries.