Skip to main content

Virtual Column: Automatic Acceleration for JSON Data

ENTERPRISE EDITION FEATURE
VIRTUAL COLUMN is an Enterprise Edition feature. Contact Databend Support for a license.

Virtual columns automatically accelerate queries on semi-structured data stored in VARIANT columns. This feature provides zero-configuration performance optimization for JSON data access.

What Problem Does It Solve?

When querying JSON data, traditional databases must parse the entire JSON structure every time you access a nested field. This creates performance bottlenecks:

ProblemImpactVirtual Column Solution
Query LatencyComplex JSON queries take secondsSub-second response times
Excessive Data ReadingMust read entire JSON documents even for single fieldsRead only the specific fields needed
Slow JSON ParsingEvery query re-parses entire JSON documentsPre-materialized fields for instant access
High CPU UsageJSON traversal consumes processing powerDirect column reads like regular data
Memory OverheadLoading full JSON structures into memoryOnly load needed fields

Example Scenario: An e-commerce analytics table with product data in JSON format. Without virtual columns, querying product_data['category'] across millions of rows requires parsing every JSON document. With virtual columns, it becomes a direct column lookup.

How It Works Automatically

  1. Data Ingestion → Databend analyzes JSON structure in VARIANT columns
  2. Smart Detection → System identifies frequently accessed nested fields
  3. Background Optimization → Virtual columns are created automatically
  4. Query Acceleration → Queries automatically use optimized paths

Virtual Column Workflow

Configuration

-- Enable the feature (experimental)
SET enable_experimental_virtual_column = 1;

-- Optional: Control auto-refresh behavior
SET enable_refresh_virtual_column_after_write = 1; -- Default: enabled

Complete Example

This example demonstrates automatic virtual column creation and performance benefits:

SET enable_experimental_virtual_column=1;

-- Create a table named 'test' with columns 'id' and 'val' of type Variant.
CREATE TABLE test(id int, val variant);

-- Insert sample records into the 'test' table with Variant data.
INSERT INTO
test
VALUES
(
1,
'{"id":1,"name":"databend","tags":["powerful","fast"],"pricings":[{"type":"Standard","price":"Pay as you go"},{"type":"Enterprise","price":"Custom"}]}'
),
(
2,
'{"id":2,"name":"databricks","tags":["scalable","flexible"],"pricings":[{"type":"Free","price":"Trial"},{"type":"Premium","price":"Subscription"}]}'
),
(
3,
'{"id":3,"name":"snowflake","tags":["cloud-native","secure"],"pricings":[{"type":"Basic","price":"Pay per second"},{"type":"Enterprise","price":"Annual"}]}'
),
(
4,
'{"id":4,"name":"redshift","tags":["reliable","scalable"],"pricings":[{"type":"On-Demand","price":"Pay per usage"},{"type":"Reserved","price":"1 year contract"}]}'
),
(
5,
'{"id":5,"name":"bigquery","tags":["innovative","cost-efficient"],"pricings":[{"type":"Flat Rate","price":"Monthly"},{"type":"Flex","price":"Per query"}]}'
);

INSERT INTO test SELECT * FROM test;
INSERT INTO test SELECT * FROM test;
INSERT INTO test SELECT * FROM test;
INSERT INTO test SELECT * FROM test;
INSERT INTO test SELECT * FROM test;

-- Explain the query execution plan for selecting specific fields from the table.
EXPLAIN
SELECT
val ['name'],
val ['tags'] [0],
val ['pricings'] [0] ['type']
FROM
test;

-[ EXPLAIN ]-----------------------------------
Exchange
├── output columns: [test.val['name'] (#3), test.val['pricings'][0]['type'] (#5), test.val['tags'][0] (#8)]
├── exchange type: Merge
└── TableScan
├── table: default.default.test
├── output columns: [val['name'] (#3), val['pricings'][0]['type'] (#5), val['tags'][0] (#8)]
├── read rows: 160
├── read size: 1.69 KiB
├── partitions total: 6
├── partitions scanned: 6
├── pruning stats: [segments: <range pruning: 6 to 6>, blocks: <range pruning: 6 to 6>]
├── push downs: [filters: [], limit: NONE]
├── virtual columns: [val['name'], val['pricings'][0]['type'], val['tags'][0]]
└── estimated rows: 160.00

-- Explain the query execution plan for selecting only the 'name' field from the table.
EXPLAIN
SELECT
val ['name']
FROM
test;

-[ EXPLAIN ]-----------------------------------
Exchange
├── output columns: [test.val['name'] (#2)]
├── exchange type: Merge
└── TableScan
├── table: default.book_db.test
├── output columns: [val['name'] (#2)]
├── read rows: 160
├── read size: < 1 KiB
├── partitions total: 16
├── partitions scanned: 16
├── pruning stats: [segments: <range pruning: 6 to 6>, blocks: <range pruning: 16 to 16>]
├── push downs: [filters: [], limit: NONE]
├── virtual columns: [val['name']]
└── estimated rows: 160.00

-- Display all the auto generated virtual columns.
SHOW VIRTUAL COLUMNS WHERE table='test';

╭────────────────────────────────────────────────────────────────────────────────────────────────────────╮
databasetable │ source_column │ virtual_column_id │ virtual_column_name │ virtual_column_type │
│ String │ String │ String │ UInt32 │ String │ String │
├──────────┼────────┼───────────────┼───────────────────┼──────────────────────────┼─────────────────────┤
default │ test │ val │ 3000000000['id'] │ UInt64 │
default │ test │ val │ 3000000001['name'] │ String │
default │ test │ val │ 3000000002['pricings'][0]['price'] │ String │
default │ test │ val │ 3000000003['pricings'][0]['type'] │ String │
default │ test │ val │ 3000000004['pricings'][1]['price'] │ String │
default │ test │ val │ 3000000005['pricings'][1]['type'] │ String │
default │ test │ val │ 3000000006['tags'][0] │ String │
default │ test │ val │ 3000000007['tags'][1] │ String │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Monitoring Commands

CommandPurpose
SHOW VIRTUAL COLUMNSView automatically created virtual columns
REFRESH VIRTUAL COLUMNManually refresh virtual columns
FUSE_VIRTUAL_COLUMNView virtual column metadata

Performance Results

Virtual columns typically provide:

  • 5-10x faster JSON field access
  • Automatic optimization without query changes
  • Reduced resource consumption during query processing
  • Transparent acceleration for existing applications

Virtual columns work automatically in the background - simply enable the feature and let Databend optimize your JSON queries.

Explore Databend Cloud for FREE
Low-cost
Fast Analytics
Easy Data Ingestion
Elastic Scaling
Try it today