Welcome to Databend Docs! Dive in with these tabs ↗️:
Databend: Discover core features, data import/export, third-party tool integration, and programming interfaces for ALL Databend editions. Additionally, find information on deploying Databend on-premises.
Databend Cloud: Learn about account registration, operational guidance, and organization management specific to Databend Cloud.
SQL Reference: Explore a comprehensive reference covering Databend general essentials, along with a variety of available SQL functions and commands.
Releases: Stay informed with release notes for Databend Cloud and updates on nightly builds.
This welcome page guides you through the features, architecture, and other important details about Databend.
- Blazing-fast data analytics on object storage.
- Leverages data-level and instruction-level parallelism technologies.
- No indexes to build, no manual tuning, and no need to figure out partitions or shard data.
- Supports atomic operations.
- Provides advanced features such as Time Travel and Multi Catalog (Apache Hive / Apache Iceberg).
- Supports ingestion of semi-structured data in various formats like CSV, JSON, and Parquet.
- Supports semi-structured data types such as ARRAY, MAP, and JSON.
- Supports Git-like MVCC storage for easy querying, cloning, and restoration of historical data.
- Supports various object storage platforms.
- Allows instant elasticity, enabling users to scale up or down based on their application needs.
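To make the "Git-like MVCC storage" and Time Travel items above more concrete, here is a minimal Python sketch of the general idea: writes never overwrite data but append a new immutable snapshot, so older versions stay readable, clonable, and restorable. This is a toy model, not Databend's implementation; `VersionedTable`, `Snapshot`, and every method name are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """An immutable view of a table: an ID plus the rows visible at that point."""
    snapshot_id: int
    rows: tuple


@dataclass
class VersionedTable:
    """Toy table (hypothetical): every write appends a new snapshot."""
    snapshots: list = field(default_factory=lambda: [Snapshot(0, ())])

    def _commit(self, rows):
        # Old snapshots are never modified; a new one is appended instead.
        snap = Snapshot(len(self.snapshots), tuple(rows))
        self.snapshots.append(snap)
        return snap.snapshot_id

    def insert(self, *new_rows):
        return self._commit(self.snapshots[-1].rows + new_rows)

    def delete(self, predicate):
        return self._commit(r for r in self.snapshots[-1].rows if not predicate(r))

    def read(self, at=None):
        """Read the latest snapshot, or an older one (time travel)."""
        snap = self.snapshots[-1] if at is None else self.snapshots[at]
        return list(snap.rows)

    def clone(self, at=None):
        """Create a new table whose history starts from an existing snapshot."""
        snap = self.snapshots[-1] if at is None else self.snapshots[at]
        return VersionedTable([Snapshot(0, snap.rows)])

    def restore(self, at):
        """Make an old snapshot current again by committing its rows as a new version."""
        return self._commit(self.snapshots[at].rows)


table = VersionedTable()
v1 = table.insert(("alice", 1), ("bob", 2))
v2 = table.delete(lambda row: row[0] == "bob")

current = table.read()          # rows after the delete
historical = table.read(at=v1)  # rows as they were before the delete
```

Because snapshots are immutable, `read(at=v1)` still sees the deleted row, and `restore(v1)` brings it back by committing a new version rather than rewriting history.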
Databend's high-level architecture is composed of a meta-service layer, a query layer, and a storage layer.
Databend efficiently supports multiple tenants through its meta-service layer, which plays a crucial role in the system:
- Metadata Management: Handles metadata for databases, tables, clusters, transactions, and more.
- Security: Manages user authentication and authorization for a secure environment.
Discover more about the meta-service layer on GitHub.
The query layer in Databend handles query computations and is composed of multiple clusters, each containing several nodes. Each node, a core unit in the query layer, consists of:
- Planner: Develops execution plans for SQL statements, incorporating operators like Projection, Filter, and Limit.
- Optimizer: A rule-based optimizer applies predefined rules, such as "predicate pushdown" and "pruning of unused columns", for optimal query execution.
- Processors: Build a query execution pipeline based on planner instructions, following a Pull&Push approach. Processors are interconnected, forming a pipeline that can be distributed across nodes for enhanced performance.
Discover more about the query layer in its directory on GitHub.
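The two optimizer rules named above, predicate pushdown and pruning of unused columns, can be illustrated with a small Python sketch. It models a plan as a tree of Projection, Filter, and Limit operators over a scan and rewrites it rule by rule. This is an illustrative toy under stated assumptions, not Databend's planner or optimizer: the `Scan` node, its `pushed_filter` field, and the function names are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Scan:
    table: str
    columns: Tuple[str, ...]
    # Hypothetical field: a (column, op, value) condition evaluated while scanning.
    pushed_filter: Optional[Tuple[str, str, object]] = None


@dataclass(frozen=True)
class Filter:
    condition: Tuple[str, str, object]
    child: object


@dataclass(frozen=True)
class Projection:
    columns: Tuple[str, ...]
    child: object


@dataclass(frozen=True)
class Limit:
    count: int
    child: object


def push_down_predicate(node):
    """Rule 1: move a Filter that sits directly on a Scan into the Scan itself."""
    if (isinstance(node, Filter) and isinstance(node.child, Scan)
            and node.child.pushed_filter is None):
        return replace(node.child, pushed_filter=node.condition)
    return node


def prune_unused_columns(node):
    """Rule 2: make a Scan under a Projection read only the columns it needs."""
    if isinstance(node, Projection) and isinstance(node.child, Scan):
        scan = node.child
        needed = set(node.columns)
        if scan.pushed_filter is not None:
            needed.add(scan.pushed_filter[0])  # the filter column is still read
        kept = tuple(c for c in scan.columns if c in needed)
        return replace(node, child=replace(scan, columns=kept))
    return node


RULES = (push_down_predicate, prune_unused_columns)


def optimize(node):
    """Optimize children first, then apply every rule to the current node."""
    if not isinstance(node, Scan):
        node = replace(node, child=optimize(node.child))
    for rule in RULES:
        node = rule(node)
    return node


# Roughly: SELECT name FROM users WHERE age > 30 LIMIT 10
plan = Limit(10, Projection(("name",), Filter(
    ("age", ">", 30), Scan("users", ("id", "name", "age", "email")))))
optimized = optimize(plan)
```

After optimization, the scan in this example carries the `age > 30` condition itself and reads only `name` and `age`, so less data flows up to the Projection and Limit operators.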
Databend employs Parquet, an open-source columnar format, and introduces its own table format to boost query performance. Key features include:
- Secondary Indexes: Speed up data location and access across various analysis dimensions.
- Complex Data Type Indexes: Accelerate data processing and analysis for intricate types such as semi-structured data.
- Segments: Databend organizes data into segments, enhancing data management and retrieval efficiency.
- Clustering: Employs user-defined clustering keys within segments to streamline data scanning.
Discover more about the storage layer on GitHub.
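As a rough illustration of why segments and clustering keys can reduce scanning, the Python sketch below sorts rows by a clustering key, cuts them into segments, and records each segment's minimum and maximum key so that a range query can skip segments that cannot match. The min/max statistics are an assumption made for this example, since the text above does not describe how Databend prunes data, and `Segment`, `build_segments`, and `scan_range` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A chunk of rows plus min/max statistics for the clustering key."""
    rows: list
    min_key: int
    max_key: int


def build_segments(rows, key, segment_size):
    """Sort rows by the clustering key, then cut them into fixed-size segments."""
    ordered = sorted(rows, key=key)
    segments = []
    for start in range(0, len(ordered), segment_size):
        chunk = ordered[start:start + segment_size]
        segments.append(Segment(chunk, key(chunk[0]), key(chunk[-1])))
    return segments


def scan_range(segments, key, low, high):
    """Return matching rows and the number of segments actually read."""
    matches, segments_read = [], 0
    for seg in segments:
        if seg.max_key < low or seg.min_key > high:
            continue  # the statistics show that no row in this segment can match
        segments_read += 1
        matches.extend(r for r in seg.rows if low <= key(r) <= high)
    return matches, segments_read


def by_day(row):
    return row["day"]


rows = [{"id": i, "day": i % 30} for i in range(300)]
segments = build_segments(rows, by_day, segment_size=50)
matches, segments_read = scan_range(segments, by_day, low=3, high=4)
print(f"read {segments_read} of {len(segments)} segments, {len(matches)} rows matched")
```

Without the sort on the clustering key, rows for a given `day` would be spread across all segments and none of them could be skipped.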
The Databend community is open to data professionals, students, and anyone who has a passion for cloud data warehouses. Feel free to click on the links below to be a part of the excitement: