Skip to main content

HISTOGRAM

Introduced or updated: v1.2.377

Generates a data distribution histogram using an "equal height" bucketing strategy.

Syntax

HISTOGRAM(<expr>)

-- The following two forms are equivalent:
HISTOGRAM(<max_num_buckets>)(<expr>)
HISTOGRAM(<expr> [, <max_num_buckets>])
ParameterDescription
exprThe data type of expr should be sortable.
max_num_bucketsOptional positive integer specifying the maximum number of buckets. Default is 128.

Return Type

Returns either an empty string or a JSON object with the following structure:

  • buckets: List of buckets with detailed information:
    • lower: Lower bound of the bucket.
    • upper: Upper bound of the bucket.
    • count: Number of elements in the bucket.
    • pre_sum: Cumulative count of elements up to the current bucket.
    • ndv: Number of distinct values in the bucket.

Examples

This example shows how the HISTOGRAM function analyzes the distribution of c_int values in the histagg table, returning bucket boundaries, distinct value counts, element counts, and cumulative counts:

CREATE TABLE histagg (
c_id INT,
c_tinyint TINYINT,
c_smallint SMALLINT,
c_int INT
);

INSERT INTO histagg VALUES
(1, 10, 20, 30),
(1, 11, 21, 33),
(1, 11, 12, 13),
(2, 21, 22, 23),
(2, 31, 32, 33),
(2, 10, 20, 30);

SELECT HISTOGRAM(c_int) FROM histagg;

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ histogram(c_int)
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
[{"lower":"13","upper":"13","ndv":1,"count":1,"pre_sum":0},{"lower":"23","upper":"23","ndv":1,"count":1,"pre_sum":1},{"lower":"30","upper":"30","ndv":1,"count":2,"pre_sum":2},{"lower":"33","upper":"33","ndv":1,"count":2,"pre_sum":4}]
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

The result is returned as a JSON array:

[
{
"lower": "13",
"upper": "13",
"ndv": 1,
"count": 1,
"pre_sum": 0
},
{
"lower": "23",
"upper": "23",
"ndv": 1,
"count": 1,
"pre_sum": 1
},
{
"lower": "30",
"upper": "30",
"ndv": 1,
"count": 2,
"pre_sum": 2
},
{
"lower": "33",
"upper": "33",
"ndv": 1,
"count": 2,
"pre_sum": 4
}
]

This example shows how HISTOGRAM(2) groups c_int values into two buckets:

SELECT HISTOGRAM(2)(c_int) FROM histagg;
-- Or
SELECT HISTOGRAM(c_int, 2) FROM histagg;

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ histogram(2)(c_int)
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
[{"lower":"13","upper":"30","ndv":3,"count":4,"pre_sum":0},{"lower":"33","upper":"33","ndv":1,"count":2,"pre_sum":4}]
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

The result is returned as a JSON array:

[
{
"lower": "13",
"upper": "30",
"ndv": 3,
"count": 4,
"pre_sum": 0
},
{
"lower": "33",
"upper": "33",
"ndv": 1,
"count": 2,
"pre_sum": 4
}
]
Explore Databend Cloud for FREE
Low-cost
Fast Analytics
Easy Data Ingestion
Elastic Scaling
Try it today