Skip to main content

Working with File and Column Metadata

This guide explains how to query metadata from staged files. Metadata includes both file-level metadata (such as file name and row number) and column-level metadata (such as column names, types, and nullability).

Accessing File-Level Metadata

Databend supports accessing the following file-level metadata fields when reading staged files in the formats CSV, TSV, Parquet, and NDJSON:

File MetadataTypeDescription
metadata$filenameVARCHARThe name of the file from which the row was read
metadata$file_row_numberINTThe row number within the file (starting from 0)

These metadata fields are available in:

  • SELECT queries over stages (e.g., SELECT FROM @stage)
  • COPY INTO <table> statements

Examples

  1. Querying Metadata Fields

You can directly select metadata fields when reading from a stage:

SELECT
metadata$filename,
metadata$file_row_number,
*
FROM @my_internal_stage/iris.parquet
LIMIT 5;
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ metadata$filename │ metadata$file_row_number │ id │ sepal_length │ sepal_width │ petal_length │ petal_width │ species │ metadata$filename │ metadata$file_row_number │
├───────────────────┼──────────────────────────┼─────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼──────────────────┼───────────────────┼──────────────────────────┤
│ iris.parquet │ 015.13.51.40.2 │ setosa │ iris.parquet │ 0
│ iris.parquet │ 124.931.40.2 │ setosa │ iris.parquet │ 1
│ iris.parquet │ 234.73.21.30.2 │ setosa │ iris.parquet │ 2
│ iris.parquet │ 344.63.11.50.2 │ setosa │ iris.parquet │ 3
│ iris.parquet │ 4553.61.40.2 │ setosa │ iris.parquet │ 4
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
  1. Using Metadata in COPY INTO

You can pass metadata fields into target table columns using COPY INTO:

COPY INTO iris_with_meta 
FROM (SELECT metadata$filename, metadata$file_row_number, $1, $2, $3, $4, $5 FROM @my_internal_stage/iris.parquet)
FILE_FORMAT=(TYPE=parquet);

Inferring Column Metadata from Files

Databend allows you to retrieve the following column-level metadata from your staged files in the Parquet format using the INFER_SCHEMA function:

Column MetadataTypeDescription
column_nameStringIndicates the name of the column.
typeStringIndicates the data type of the column.
nullableBooleanIndicates whether the column allows null values.
order_idUInt64Represents the column's position in the table.

Examples

The following example retrieves column metadata from a Parquet file staged in @my_internal_stage:

SELECT * FROM INFER_SCHEMA(location => '@my_internal_stage/iris.parquet');
┌──────────────────────────────────────────────┐
│ column_name │ type │ nullable │ order_id │
├──────────────┼─────────┼──────────┼──────────┤
│ id │ BIGINTtrue0
│ sepal_length │ DOUBLEtrue1
│ sepal_width │ DOUBLEtrue2
│ petal_length │ DOUBLEtrue3
│ petal_width │ DOUBLEtrue4
│ species │ VARCHARtrue5
└──────────────────────────────────────────────┘

Tutorials

Explore Databend Cloud for FREE
Low-cost
Fast Analytics
Easy Data Ingestion
Elastic Scaling
Try it today