Skip to main content

Virtual Column

ENTERPRISE EDITION FEATURE
VIRTUAL COLUMN is an Enterprise Edition feature. Contact Databend Support for a license.

A virtual column is a construct formed by extracting nested fields within Variant data and storing that data in separate storage files. Consider using virtual columns when you regularly query specific nested fields within Variant data to realize the following benefits:

  • Accelerated Query Processing: Virtual columns streamline the querying process by eliminating the need to traverse the entire nested structure to locate the desired data. Direct data retrieval from virtual columns parallels the process of accessing regular columns, resulting in a significant acceleration of query execution.

  • Reduced Memory Usage: Variant data often includes numerous internal fields, and reading all of them can lead to substantial memory consumption. By transitioning to reading virtual columns, there is a notable reduction in memory usage, mitigating the risk of potential memory overflows.

Alt text

Managing Virtual Columns

Databend provides a variety of commands to manage virtual columns. For details, see VIRTUAL COLUMN.

Usage Examples

This example demonstrates the practical use of virtual columns and their impact on query execution:

-- Create a table named 'test' with columns 'id' and 'val' of type Variant.
CREATE TABLE test(id int, val variant);

-- Create virtual columns for specific elements in the 'val' column.
CREATE VIRTUAL COLUMN (
val ['name'], -- Extract the 'name' field.
val ['tags'] [0], -- Extract the first element in the 'tags' array.
val ['pricings'] [0] ['type'] -- Extract the 'type' field from the first pricing in the 'pricings' array.
) FOR test;

-- Insert a sample record into the 'test' table with Variant data.
INSERT INTO
test
VALUES
(
1,
'{"id":1,"name":"databend","tags":["powerful","fast"],"pricings":[{"type":"Standard","price":"Pay as you go"},{"type":"Enterprise","price":"Custom"}]}'
);

-- Explain the query execution plan for selecting specific fields from the table.
EXPLAIN
SELECT
val ['name'],
val ['tags'] [0],
val ['pricings'] [0] ['type']
FROM
test;

-[ EXPLAIN ]-----------------------------------
TableScan
├── table: default.default.test
├── output columns: [val['name'] (#2), val['tags'][0] (#3), val['pricings'][0]['type'] (#4)]
├── read rows: 1
├── read bytes: 203
├── partitions total: 1
├── partitions scanned: 1
├── pruning stats: [segments: <range pruning: 1 to 1>, blocks: <range pruning: 1 to 1, bloom pruning: 0 to 0>]
├── push downs: [filters: [], limit: NONE, virtual_columns: [val['name'], val['pricings'][0]['type'], val['tags'][0]]]
└── estimated rows: 1.00

-- Explain the query execution plan for selecting only the 'name' field from the table.
EXPLAIN
SELECT
val ['name']
FROM
test;

-[ EXPLAIN ]-----------------------------------
TableScan
├── table: default.default.test
├── output columns: [val['name'] (#2)]
├── read rows: 1
├── read bytes: 203
├── partitions total: 1
├── partitions scanned: 1
├── pruning stats: [segments: <range pruning: 1 to 1>, blocks: <range pruning: 1 to 1, bloom pruning: 0 to 0>]
├── push downs: [filters: [], limit: NONE, virtual_columns: [val['name']]]
└── estimated rows: 1.00

-- Display all the virtual columns defined in the system.
SHOW VIRTUAL COLUMNS;

┌─────────────────────────────────────────────────────────────────────────────┐
databasetable │ virtual_columns │
├──────────┼────────┼─────────────────────────────────────────────────────────┤
default │ test │ val['name'], val['pricings'][0]['type'], val['tags'][0]
└─────────────────────────────────────────────────────────────────────────────┘

-- Drop the virtual columns associated with the 'test' table.
DROP VIRTUAL COLUMN FOR test;
Explore Databend Cloud for FREE
Low-cost
Fast Analytics
Easy Data Ingestion
Elastic Scaling
Try it today