Databend vs. Snowflake: Data Ingestion Benchmark

Overview

We conducted four specific benchmarks to evaluate Databend Cloud versus Snowflake:

  1. TPC-H SF100 Dataset Loading: Focuses on loading performance and cost for a large-scale dataset (100GB, ~600 million rows).
  2. ClickBench Hits Dataset Loading: Tests efficiency in loading a wide-table dataset (76GB, ~100 million rows, 105 columns), emphasizing challenges associated with high column counts.
  3. 1-Second Freshness: Measures the platforms' ability to ingest data within a strict 1-second freshness requirement.
  4. 5-Second Freshness: Compares the platforms' data ingestion capabilities under a 5-second freshness constraint.

Platforms

  • Snowflake: A well-known cloud data platform emphasizing scalable compute and data sharing.
  • Databend Cloud: A cloud-native data warehouse built on the open-source Databend project, focusing on scalability and cost-efficiency.

Benchmark Conditions

All benchmarks were run on a Small warehouse (16 vCPUs, AWS us-east-2) on each platform, loading data from the same S3 bucket.

Performance and Cost Comparison

Performance and Cost

  • TPC-H SF100 Data: Databend Cloud offers a 67% cost reduction over Snowflake.
  • ClickBench Hits Data: Databend Cloud achieves a 91% cost reduction.
  • 1-Second Freshness: Databend Cloud ingests 400 times as many rows as Snowflake (40,000 vs. 100).
  • 5-Second Freshness: Databend Cloud ingests over 27 times as many rows as Snowflake (2,500,000 vs. 90,000).
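
The summary figures above follow directly from the result tables later in this report. The short Python sketch below recomputes them; the function and variable names are illustrative, and the input numbers are copied from the tables. The TPC-H reduction works out to roughly 67.5%, which the summary reports as 67%.

```python
# Figures copied from the result tables in this report.
COSTS_USD = {
    "TPC-H SF100": {"snowflake": 0.77, "databend": 0.25},
    "ClickBench Hits": {"snowflake": 3.42, "databend": 0.30},
}
ROWS_INGESTED = {
    "1-second freshness": {"snowflake": 100, "databend": 40_000},
    "5-second freshness": {"snowflake": 90_000, "databend": 2_500_000},
}


def cost_reduction_pct(snowflake: float, databend: float) -> float:
    """Percentage by which the Databend Cloud cost is lower than Snowflake's."""
    return (snowflake - databend) / snowflake * 100


def row_ratio(snowflake: int, databend: int) -> float:
    """How many times as many rows Databend Cloud ingested in the same window."""
    return databend / snowflake


for name, cost in COSTS_USD.items():
    pct = cost_reduction_pct(cost["snowflake"], cost["databend"])
    print(f"{name}: {pct:.1f}% cost reduction")

for name, rows in ROWS_INGESTED.items():
    ratio = row_ratio(rows["snowflake"], rows["databend"])
    print(f"{name}: {ratio:.1f}x rows ingested")
```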

Data Ingestion Benchmarks

TPC-H SF100 Dataset

| Metric | Snowflake | Databend Cloud | Description |
|---|---|---|---|
| Total Time | 695s | 446s | Time to load the dataset. |
| Total Cost | $0.77 | $0.25 | Cost of data loading. |

  • Data Volume: 100GB
  • Rows: Approx. 600 million

ClickBench Hits Dataset

| Metric | Snowflake | Databend Cloud | Description |
|---|---|---|---|
| Total Time | 51m 17s | 9m 58s | Time to load the dataset. |
| Total Cost | $3.42 | $0.30 | Cost of data loading. |

  • Data Volume: 76GB
  • Rows: Approx. 100 million
  • Table Width: 105 columns

Freshness Benchmarks

1-Second Freshness Benchmark

Evaluates the volume of data ingested within a 1-second freshness requirement.

| Metric | Snowflake | Databend Cloud | Description |
|---|---|---|---|
| Total Time | 1s | 1s | Loading time frame. |
| Total Rows | 100 Rows | 40,000 Rows | Volume of data successfully ingested within 1s. |

5-Second Freshness Benchmark

Assesses the volume of data that can be ingested within a 5-second freshness requirement.

| Metric | Snowflake | Databend Cloud | Description |
|---|---|---|---|
| Total Time | 5s | 5s | Loading time frame. |
| Total Rows | 90,000 Rows | 2,500,000 Rows | Volume of data successfully ingested within 5s. |

Reproduce the Benchmark

You can reproduce the benchmark by following the steps below.

Benchmark Environment

Both Snowflake and Databend Cloud were tested under similar conditions:

| Parameter | Snowflake | Databend Cloud |
|---|---|---|
| Warehouse Size | Small | Small |
| vCPU | 16 | 16 |
| Price | $4/hour | $2/hour |
| AWS Region | us-east-2 | us-east-2 |
| Storage | AWS S3 | AWS S3 |
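
As a rough cross-check, the reported loading costs can be estimated from the load times and the hourly warehouse prices listed above. The sketch below assumes simple per-second billing with no minimum charge or rounding, which is an assumption and may not match either platform's actual billing rules; the names are illustrative. Under that assumption the estimates land within a few cents of the reported costs.

```python
# Hourly warehouse prices from the environment table.
PRICE_PER_HOUR_USD = {"snowflake": 4.0, "databend": 2.0}

# Load times from the ingestion result tables, converted to seconds.
LOAD_SECONDS = {
    "TPC-H SF100": {"snowflake": 695, "databend": 446},
    "ClickBench Hits": {"snowflake": 51 * 60 + 17, "databend": 9 * 60 + 58},
}


def load_cost_usd(seconds: float, price_per_hour: float) -> float:
    """Estimated cost assuming per-second billing (an assumption, not a billing rule)."""
    return seconds / 3600 * price_per_hour


estimates = {
    (dataset, platform): load_cost_usd(seconds, PRICE_PER_HOUR_USD[platform])
    for dataset, timings in LOAD_SECONDS.items()
    for platform, seconds in timings.items()
}

for (dataset, platform), cost in estimates.items():
    print(f"{dataset:16} {platform:10} ${cost:.2f}")
```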

Prerequisites

Data Ingestion Benchmark

The data ingestion benchmark can be reproduced using the following steps:

TPC-H Data Loading
  1. Snowflake Data Load:

  2. Databend Cloud Data Load:

ClickBench Hits Data Loading
  1. Snowflake Data Load:

  2. Databend Cloud Data Load:

Freshness Benchmark

Data generation and ingestion for the freshness benchmark can be reproduced using the following steps:

  1. Create an external stage in Databend Cloud.
CREATE STAGE hits_unload_stage
URL = 's3://unload/files/'
CONNECTION = (
    ACCESS_KEY_ID = '<your-access-key-id>',
    SECRET_ACCESS_KEY = '<your-secret-access-key>'
);
  2. Unload data to the external stage.
CREATE OR REPLACE FILE FORMAT tsv_unload_format_gzip
TYPE = TSV,
COMPRESSION = gzip;

COPY INTO @hits_unload_stage
FROM (
    SELECT *
    FROM hits LIMIT <the-rows-you-want>
)
FILE_FORMAT = (FORMAT_NAME = 'tsv_unload_format_gzip')
DETAILED_OUTPUT = true;
  3. Load data from the external stage to the hits table.
COPY INTO hits
FROM @hits_unload_stage
PATTERN = '.*[.]tsv[.]gz'
FILE_FORMAT = (TYPE = TSV, COMPRESSION = AUTO);
  4. Measure results from the dashboard.
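
The steps above read the results from the dashboard. As a client-side complement, the sketch below shows one way the "largest batch that fits the freshness window" could be searched for. It is not the procedure used in this benchmark: `load_batch` is a hypothetical callable that would unload and load the given number of rows (for example by running the COPY INTO statements above through a SQL client), and `fake_load` is a stand-in so that the example runs without a warehouse.

```python
import time
from typing import Callable, Iterable, Optional


def largest_batch_within_budget(
    load_batch: Callable[[int], None],
    candidate_rows: Iterable[int],
    budget_seconds: float,
) -> Optional[int]:
    """Return the largest candidate row count whose load finished within the budget."""
    best = None
    for rows in sorted(candidate_rows):
        start = time.perf_counter()
        load_batch(rows)
        elapsed = time.perf_counter() - start
        if elapsed > budget_seconds:
            break  # assumes larger batches take at least as long
        best = rows
    return best


def fake_load(rows: int) -> None:
    """Stand-in loader that pretends each row takes one microsecond to ingest."""
    time.sleep(rows * 1e-6)


# Example run against the stand-in loader with a 0.2-second budget.
result = largest_batch_within_budget(fake_load, [1_000, 10_000, 500_000], 0.2)
print(result)
```

With a real loader, `budget_seconds` would be 1 or 5 to match the two freshness benchmarks, and wall-clock time measured from the client would include network latency that the dashboard figures may not.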