Skip to main content

Apache Iceberg

Introduced or updated: v1.2.668

Databend supports the integration of an Apache Iceberg catalog, enhancing its compatibility and versatility for data management and analytics. This extends Databend's capabilities by seamlessly incorporating the powerful metadata and storage management capabilities of Apache Iceberg into the platform.

Datatype Mapping

This table maps data types between Apache Iceberg and Databend. Please note that Databend does not currently support Iceberg data types that are not listed in the table.

Apache IcebergDatabend
BOOLEANBOOLEAN
INTINT32
LONGINT64
DATEDATE
TIMESTAMP/TIMESTAMPZTIMESTAMP
FLOATFLOAT
DOUBLEDOUBLE
STRING/BINARYSTRING
DECIMALDECIMAL
ARRAY<TYPE>ARRAY, supports nesting
MAP<KEYTYPE, VALUETYPE>MAP
STRUCT<COL1: TYPE1, COL2: TYPE2, ...>TUPLE
LISTARRAY

Managing Catalogs

Databend provides you the following commands to manage catalogs:

CREATE CATALOG

Defines and establishes a new catalog in the Databend query engine.

Syntax

CREATE CATALOG <catalog_name>
TYPE=ICEBERG
CONNECTION=(
TYPE='<connection_type>'
ADDRESS='<address>'
WAREHOUSE='<warehouse_location>'
"<connection_parameter>"='<connection_parameter_value>'
"<connection_parameter>"='<connection_parameter_value>'
...
);
ParameterRequired?Description
<catalog_name>YesThe name of the catalog you want to create.
TYPEYesSpecifies the catalog type. For Iceberg, set to ICEBERG.
CONNECTIONYesThe connection parameters for the Iceberg catalog.
TYPE (inside CONNECTION)YesThe connection type. For Iceberg, it is typically set to rest for REST-based connection.
ADDRESSYesThe address or URL of the Iceberg service (e.g., http://127.0.0.1:8181).
WAREHOUSEYesThe location of the Iceberg warehouse, usually an S3 bucket or compatible object storage system.
<connection_parameter>YesConnection parameters to establish connections with external storage. The required parameters vary based on the specific storage service and authentication methods. See the table below for a full list of the available parameters.
Connection ParameterDescription
s3.endpointS3 endpoint.
s3.access-key-idS3 access key ID.
s3.secret-access-keyS3 secret access key.
s3.session-tokenS3 session token, required when using temporary credentials.
s3.regionS3 region.
client.regionRegion to use for the S3 client, takes precedence over s3.region.
s3.path-style-accessS3 Path Style Access.
s3.sse.typeS3 Server-Side Encryption (SSE) type.
s3.sse.keyS3 SSE key. If encryption type is kms, this is a KMS Key ID. If encryption type is custom, this is a base-64 AES256 symmetric key.
s3.sse.md5S3 SSE MD5 checksum.
client.assume-role.arnARN of the IAM role to assume instead of using the default credential chain.
client.assume-role.external-idOptional external ID used to assume an IAM role.
client.assume-role.session-nameOptional session name used to assume an IAM role.
s3.allow-anonymousOption to allow anonymous access (e.g., for public buckets/folders).
s3.disable-ec2-metadataOption to disable loading credentials from EC2 metadata (typically used with s3.allow-anonymous).
s3.disable-config-loadOption to disable loading configuration from config files and environment variables.
note

To read data from HDFS, you need to set the following environment variables before starting Databend. These environment variables ensure that Databend can access the necessary Java and Hadoop dependencies to interact with HDFS effectively. Make sure to replace "/path/to/java" and "/path/to/hadoop" with the actual paths to your Java and Hadoop installations, and adjust the CLASSPATH to include all the required Hadoop JAR files.

export JAVA_HOME=/path/to/java
export LD_LIBRARY_PATH=${JAVA_HOME}/lib/server:${LD_LIBRARY_PATH}
export HADOOP_HOME=/path/to/hadoop
export CLASSPATH=/all/hadoop/jar/files

SHOW CREATE CATALOG

Returns the detailed configuration of a specified catalog, including its type and storage parameters.

Syntax

SHOW CREATE CATALOG <catalog_name>;

SHOW CATALOGS

Shows all the created catalogs.

Syntax

SHOW CATALOGS [LIKE '<pattern>']

USE CATALOG

Switches the current session to the specified catalog.

Syntax

USE CATALOG <catalog_name>

Usage Examples

This example shows how to create an Iceberg catalog using a REST-based connection, specifying the service address, warehouse location (S3), and optional parameters like AWS region and custom endpoint:

CREATE CATALOG ctl
TYPE=ICEBERG
CONNECTION=(
TYPE='rest'
ADDRESS='http://127.0.0.1:8181'
WAREHOUSE='s3://iceberg-tpch'
"s3.region"='us-east-1'
"s3.endpoint"='http://127.0.0.1:9000'
);
Explore Databend Cloud for FREE
Low-cost
Fast Analytics
Easy Data Ingestion
Elastic Scaling
Try it today