
Apache Hive Tables

Introduced or updated: v1.2.668

Databend can query data that is cataloged by Apache Hive without copying it. Register the Hive Metastore as a Databend catalog, point to the object storage that holds the table data, and then query the tables as if they were native Databend objects.

Quick Start

  1. Register the Hive Metastore

    CREATE CATALOG hive_prod
    TYPE = HIVE
    CONNECTION = (
        METASTORE_ADDRESS = '127.0.0.1:9083'
        URL = 's3://lakehouse/'
        ACCESS_KEY_ID = '<your_key_id>'
        SECRET_ACCESS_KEY = '<your_secret_key>'
    );
  2. Explore the catalog

    USE CATALOG hive_prod;
    SHOW DATABASES;
    SHOW TABLES FROM tpch;
  3. Query Hive tables (an alternative that does not switch catalogs is sketched after these steps)

    SELECT l_orderkey, SUM(l_extendedprice) AS revenue
    FROM tpch.lineitem
    GROUP BY l_orderkey
    ORDER BY revenue DESC
    LIMIT 10;
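
The Quick Start switches the session into the Hive catalog with USE CATALOG. As an alternative, you can stay in your current catalog and address Hive tables with fully qualified catalog.database.table names. The sketch below assumes the hive_prod catalog registered in step 1 and uses SHOW CATALOGS to confirm it is registered:

SHOW CATALOGS;

SELECT l_orderkey, SUM(l_extendedprice) AS revenue
FROM hive_prod.tpch.lineitem
GROUP BY l_orderkey
ORDER BY revenue DESC
LIMIT 10;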

Keep Metadata Fresh

Hive schemas or partitions can change outside of Databend. Refresh Databend’s cached metadata when that happens:

ALTER TABLE tpch.lineitem REFRESH CACHE;

Data Type Mapping

Databend automatically converts Hive primitive types to their closest native equivalents when queries run:

Hive Type                         Databend Type
BOOLEAN                           BOOLEAN
TINYINT, SMALLINT, INT, BIGINT    Integer types
FLOAT, DOUBLE                     Floating-point types
DECIMAL(p,s)                      DECIMAL
STRING, VARCHAR, CHAR             STRING
DATE, TIMESTAMP                   DATETIME
ARRAY<type>                       ARRAY
MAP<key,value>                    MAP

Nested structures such as STRUCT are surfaced through the VARIANT type.
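
As a brief illustration, the sketch below assumes a hypothetical Hive table hive_prod.default.events with a STRUCT column payload that contains a user_id field; the field is read with Databend's VARIANT access syntax and cast to a native type:

-- Hypothetical table and column names; adjust them to your own schema.
SELECT payload['user_id']::STRING AS user_id
FROM hive_prod.default.events
LIMIT 5;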

Notes and Limitations

  • Hive catalogs are read-only in Databend; writes must happen through Hive-compatible engines (see the sketch after these notes).
  • Access to the underlying object storage is required; configure credentials with the CONNECTION parameters shown in the Quick Start.
  • Use ALTER TABLE ... REFRESH CACHE whenever the table layout changes (for example, when new partitions are added) to keep query results up to date.
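
A minimal sketch of the workflow implied by these notes, assuming new partitions for tpch.lineitem are loaded by an external Hive-compatible engine such as Spark:

-- 1. Load data or add partitions outside Databend (Spark, Hive CLI, ...).
-- 2. Refresh Databend's cached metadata so the new layout becomes visible:
ALTER TABLE tpch.lineitem REFRESH CACHE;
-- 3. Query the table from Databend as usual:
SELECT COUNT(*) AS row_count FROM tpch.lineitem;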