# Apache Hive Tables

Introduced or updated: v1.2.668
Databend can query data that is cataloged by Apache Hive without copying it. Register the Hive Metastore as a Databend catalog, point to the object storage that holds the table data, and then query the tables as if they were native Databend objects.
## Quick Start

1. Register the Hive Metastore:

```sql
CREATE CATALOG hive_prod
TYPE = HIVE
CONNECTION = (
    METASTORE_ADDRESS = '127.0.0.1:9083'
    URL = 's3://lakehouse/'
    ACCESS_KEY_ID = '<your_key_id>'
    SECRET_ACCESS_KEY = '<your_secret_key>'
);
```

2. Explore the catalog:

```sql
USE CATALOG hive_prod;
SHOW DATABASES;
SHOW TABLES FROM tpch;
```

3. Query Hive tables:

```sql
SELECT l_orderkey, SUM(l_extendedprice) AS revenue
FROM tpch.lineitem
GROUP BY l_orderkey
ORDER BY revenue DESC
LIMIT 10;
```
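As an alternative to switching catalogs with `USE CATALOG`, a table in the Hive catalog can be addressed by its fully qualified `<catalog>.<database>.<table>` name. A minimal sketch, reusing the `hive_prod` catalog and `tpch` database from the quick start:

```sql
-- Fully qualified reference: no USE CATALOG needed
SELECT COUNT(*) FROM hive_prod.tpch.lineitem;
```

This form is convenient for one-off queries that mix Hive tables with tables from other catalogs.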
## Keep Metadata Fresh

Hive schemas or partitions can change outside of Databend. Refresh Databend's cached metadata when that happens:

```sql
ALTER TABLE tpch.lineitem REFRESH CACHE;
```
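A typical workflow, assuming an external Hive-side job has just added a new partition to `tpch.lineitem`:

```sql
USE CATALOG hive_prod;

-- Invalidate the cached metadata so Databend picks up the new partition
ALTER TABLE tpch.lineitem REFRESH CACHE;

-- Subsequent queries now see the refreshed table layout
SELECT COUNT(*) FROM tpch.lineitem;
```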
## Data Type Mapping

Databend automatically converts Hive primitive types to their closest native equivalents when queries run:

| Hive Type | Databend Type |
|---|---|
| BOOLEAN | BOOLEAN |
| TINYINT, SMALLINT, INT, BIGINT | Integer types |
| FLOAT, DOUBLE | Floating-point types |
| DECIMAL(p,s) | DECIMAL |
| STRING, VARCHAR, CHAR | STRING |
| DATE, TIMESTAMP | DATETIME |
| ARRAY&lt;type&gt; | ARRAY |
| MAP&lt;key,value&gt; | MAP |

Nested structures such as STRUCT are surfaced through the VARIANT type.
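To illustrate, here is a hedged sketch of reading a STRUCT column through VARIANT path syntax. The table `web.events` and its `payload` column are hypothetical, not part of the quick-start dataset; the bracket-style field access shown is Databend's VARIANT accessor:

```sql
-- Hypothetical Hive table web.events with a column
-- payload STRUCT<user_id: INT, action: STRING>,
-- which Databend exposes as VARIANT:
SELECT payload['user_id'] AS user_id,
       payload['action']  AS action
FROM hive_prod.web.events
LIMIT 5;
```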
## Notes and Limitations

- Hive catalogs are read-only in Databend; writes must happen through Hive-compatible engines.
- Access to the underlying object storage is required; configure credentials through the connection parameters shown above.
- Run `ALTER TABLE ... REFRESH CACHE` whenever the table layout changes (for example, when new partitions are added) to keep query results up to date.
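Because Hive catalogs are read-only, a common pattern is to materialize a Hive table into a native, writable Databend table with CREATE TABLE ... AS SELECT. A minimal sketch (the target name `lineitem_local` is illustrative):

```sql
-- Copy a Hive table into a native Databend table in the default catalog
CREATE TABLE default.lineitem_local AS
SELECT * FROM hive_prod.tpch.lineitem;
```

The copy is then independent of the Hive Metastore and supports the full range of Databend DML.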