Skip to main content

HISTOGRAM

Introduced or updated: v1.2.377

Computes the distribution of the data. It uses an "equal height" bucketing strategy to generate the histogram. The result of the function returns an empty or Json string.

Syntax

HISTOGRAM(<expr>)
HISTOGRAM(<expr> [, max_num_buckets])

max_num_buckets means the maximum number of buckets that can be used, by default it is 128.

For example:

select histogram(c_id) from histagg;
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ histogram(c_id)
│ Nullable(String)
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
[{"lower":"1","upper":"1","ndv":1,"count":6,"pre_sum":0},{"lower":"2","upper":"2","ndv":1,"count":6,"pre_sum":6}]
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

:::

Arguments

ArgumentsDescription
<expr>The data type of <expr> should be sortable.
max_num_bucketsOptional constant positive integer, the maximum number of buckets that can be used.

Return Type

the Nullable String type

Example

Create a Table and Insert Sample Data

CREATE TABLE histagg (
c_id INT,
c_tinyint TINYINT,
c_smallint SMALLINT,
c_int INT
);

INSERT INTO histagg VALUES
(1, 10, 20, 30),
(1, 11, 21, 33),
(1, 11, 12, 13),
(2, 21, 22, 23),
(2, 31, 32, 33),
(2, 10, 20, 30);

Query Demo 1

SELECT HISTOGRAM(c_int) FROM histagg;

Result

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ histogram(c_int)
│ Nullable(String)
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
[{"lower":"13","upper":"13","ndv":1,"count":1,"pre_sum":0},{"lower":"23","upper":"23","ndv":1,"count":1,"pre_sum":1},{"lower":"30","upper":"30","ndv":1,"count":2,"pre_sum":2},{"lower":"33","upper":"33","ndv":1,"count":2,"pre_sum":4}]
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Query result description:

[
{
"lower": "13",
"upper": "13",
"ndv": 1,
"count": 1,
"pre_sum": 0
},
{
"lower": "23",
"upper": "23",
"ndv": 1,
"count": 1,
"pre_sum": 1
},
{
"lower": "30",
"upper": "30",
"ndv": 1,
"count": 2,
"pre_sum": 2
},
{
"lower": "33",
"upper": "33",
"ndv": 1,
"count": 2,
"pre_sum": 4
}
]

Fields description:

  • buckets:All buckets
    • lower:Upper bound of the bucket
    • upper:Lower bound of the bucket
    • count:The number of elements contained in the bucket
    • pre_sum:The total number of elements in the front bucket
    • ndv:The number of distinct values in the bucket
Explore Databend Cloud for FREE
Low-cost
Fast Analytics
Easy Data Ingestion
Elastic Scaling
Try it today