Databend vs. Snowflake: Data Ingestion Benchmark
Overview
We conducted four specific benchmarks to evaluate Databend Cloud versus Snowflake:
- TPC-H SF100 Dataset Loading: Focuses on loading performance and cost for a large-scale dataset (100GB, ~600 million rows).
- ClickBench Hits Dataset Loading: Tests efficiency in loading a wide-table dataset (76GB, ~100 million rows, 105 columns), emphasizing challenges associated with high column counts.
- 1-Second Freshness: Measures the platforms' ability to ingest data within a strict 1-second freshness requirement.
- 5-Second Freshness: Compares the platforms' data ingestion capabilities under a 5-second freshness constraint.
Platforms
- Snowflake: A well-known cloud data platform emphasizing scalable compute and data sharing.
- Databend Cloud: A cloud-native data warehouse built on the open-source Databend project, focusing on scalability and cost-efficiency.
Benchmark Conditions
Conducted on a Small-size warehouse (16 vCPUs, AWS us-east-2) using data from the same S3 bucket.
Performance and Cost Comparison
- TPC-H SF100 Data: Databend Cloud cuts loading cost by 67% versus Snowflake ($0.25 vs. $0.77).
- ClickBench Hits Data: Databend Cloud cuts loading cost by 91% ($0.30 vs. $3.42).
- 1-Second Freshness: Databend Cloud ingests 400× the data Snowflake does (40,000 vs. 100 rows).
- 5-Second Freshness: Databend Cloud ingests over 27× the data (2,500,000 vs. 90,000 rows).
Data Ingestion Benchmarks
TPC-H SF100 Dataset
Metric | Snowflake | Databend Cloud | Description |
---|---|---|---|
Total Time | 695s | 446s | Time to load the dataset. |
Total Cost | $0.77 | $0.25 | Cost of data loading. |
- Data Volume: 100GB
- Rows: Approx. 600 million
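Cost in both cases is simply runtime multiplied by the hourly warehouse price listed under Benchmark Environment below: 695 s at $4/hour ≈ $0.77 for Snowflake, and 446 s at $2/hour ≈ $0.25 for Databend Cloud.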
ClickBench Hits Dataset
Metric | Snowflake | Databend Cloud | Description |
---|---|---|---|
Total Time | 51m 17s | 9m 58s | Time to load the dataset. |
Total Cost | $3.42 | $0.30 | Cost of data loading. |
- Data Volume: 76GB
- Rows: Approx. 100 million
- Table Width: 105 columns
Freshness Benchmarks
1-Second Freshness Benchmark
Evaluates the volume of data ingested within a 1-second freshness requirement.
Metric | Snowflake | Databend Cloud | Description |
---|---|---|---|
Total Time | 1s | 1s | Loading time frame. |
Total Rows | 100 Rows | 40,000 Rows | Volume of data successfully ingested within 1s. |
5-Second Freshness Benchmark
Assesses the volume of data that can be ingested within a 5-second freshness requirement.
Metric | Snowflake | Databend Cloud | Description |
---|---|---|---|
Total Time | 5s | 5s | Loading time frame. |
Total Rows | 90,000 Rows | 2,500,000 Rows | Volume of data successfully ingested within 5s. |
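Put differently, these results correspond to ingest rates of roughly 40,000 rows/s for Databend Cloud versus 100 rows/s for Snowflake at 1-second freshness, and 500,000 rows/s versus 18,000 rows/s at 5-second freshness.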
Reproduce the Benchmark
You can reproduce the benchmark by following the steps below.
Benchmark Environment
Both Snowflake and Databend Cloud were tested under similar conditions:
Parameter | Snowflake | Databend Cloud |
---|---|---|
Warehouse Size | Small | Small |
vCPU | 16 | 16 |
Price | $4/hour | $2/hour |
AWS Region | us-east-2 | us-east-2 |
Storage | AWS S3 | AWS S3 |
Datasets used:
- The TPC-H SF100 dataset, sourced from Amazon Redshift.
- The ClickBench hits dataset, sourced from ClickBench.
Prerequisites
- A Snowflake account.
- A Databend Cloud account.
Data Ingestion Benchmark
The data ingestion benchmark can be reproduced using the following steps:
TPC-H Data Loading
- Snowflake Data Load:
  - Log into your Snowflake account.
  - Create tables corresponding to the TPC-H schema (SQL Script).
  - Use the `COPY INTO` command to load the data from AWS S3 (SQL Script).
- Databend Cloud Data Load:
  - Sign in to your Databend Cloud account.
  - Create the necessary tables as per the TPC-H schema (SQL Script).
  - Utilize a similar method to Snowflake for loading data from AWS S3 (SQL Script; an illustrative statement follows this list).
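The linked SQL scripts carry the authoritative table definitions and load statements. Purely as an illustration, a Databend Cloud `COPY INTO` for the `lineitem` table might look like the sketch below; the bucket path, credentials, and format options are placeholders rather than the benchmark's actual values:

```sql
-- Hypothetical sketch only: bucket path and credentials are placeholders.
COPY INTO lineitem
FROM 's3://your-tpch-bucket/sf100/lineitem/'
CONNECTION = (
    ACCESS_KEY_ID = '<your-access-key-id>',
    SECRET_ACCESS_KEY = '<your-secret-access-key>'
)
-- TPC-H text exports are conventionally pipe-delimited.
FILE_FORMAT = (TYPE = CSV, FIELD_DELIMITER = '|', COMPRESSION = AUTO);
```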
ClickBench Hits Data Loading
- Snowflake Data Load:
  - Log into your Snowflake account.
  - Create tables corresponding to the `hits` schema (SQL Script).
  - Use the `COPY INTO` command to load the data from AWS S3 (SQL Script).
- Databend Cloud Data Load:
  - Sign in to your Databend Cloud account.
  - Create the necessary tables as per the `hits` schema (SQL Script).
  - Utilize a similar method to Snowflake for loading data from AWS S3 (SQL Script).
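After either load completes, a quick row count is a simple sanity check that the dataset landed in full:

```sql
-- Expect roughly 100 million rows for the hits dataset.
SELECT COUNT(*) FROM hits;
```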
Freshness Benchmark
Data generation and ingestion for the freshness benchmark can be reproduced using the following steps:
- Create an external stage in Databend Cloud
```sql
CREATE STAGE hits_unload_stage
URL = 's3://unload/files/'
CONNECTION = (
    ACCESS_KEY_ID = '<your-access-key-id>',
    SECRET_ACCESS_KEY = '<your-secret-access-key>'
);
```
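Before unloading, it can be worth confirming that the stage resolves to the intended bucket; a plain listing does the job, and an empty result is expected for a fresh bucket:

```sql
-- List the files currently at the stage location.
LIST @hits_unload_stage;
```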
- Unload data to the external stage.
```sql
CREATE OR REPLACE FILE FORMAT tsv_unload_format_gzip
    TYPE = TSV,
    COMPRESSION = gzip;

COPY INTO @hits_unload_stage
FROM (
    SELECT *
    FROM hits
    LIMIT <the-rows-you-want>
)
FILE_FORMAT = (FORMAT_NAME = 'tsv_unload_format_gzip')
DETAILED_OUTPUT = true;
```
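With `DETAILED_OUTPUT = true`, the unload reports one row per file written (file name, size, and row count), which makes it easy to confirm exactly how many rows were staged for each freshness run.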
- Load data from the external stage into the `hits` table.
```sql
COPY INTO hits
FROM @hits_unload_stage
PATTERN = '.*[.]tsv[.]gz'
FILE_FORMAT = (TYPE = TSV, COMPRESSION = AUTO);
```
- Measure results from the dashboard.