Skip to main content

Querying Metadata

In this tutorial, we'll walk you through uploading a sample Parquet file to an internal stage, inferring the column definitions, and creating a table that includes file-level metadata fields. This is useful when you want to track the origin of each row or include metadata like file names and row numbers in your dataset.

Before You Start

Before you start, ensure you have the following prerequisites in place:

Step 1: Create an internal stage

CREATE STAGE my_internal_stage;

Step 2: Upload the sample file using BendSQL

Assuming your sample dataset is located at /Users/eric/Documents/iris.parquet, run the following command in BendSQL to upload it to the stage:

PUT fs:///Users/eric/Documents/iris.parquet @my_internal_stage;
┌───────────────────────────────────────────────────────┐
filestatus │ size │
├────────────────────────────────────┼─────────┼────────┤
/Users/eric/Documents/iris.parquet │ SUCCESS │ 6164
└───────────────────────────────────────────────────────┘

Step 3: Query column definitions from the staged file

SELECT * FROM INFER_SCHEMA(location => '@my_internal_stage/iris.parquet');
┌──────────────────────────────────────────────┐
│ column_name │ type │ nullable │ order_id │
├──────────────┼─────────┼──────────┼──────────┤
│ id │ BIGINTtrue0
│ sepal_length │ DOUBLEtrue1
│ sepal_width │ DOUBLEtrue2
│ petal_length │ DOUBLEtrue3
│ petal_width │ DOUBLEtrue4
│ species │ VARCHARtrue5
└──────────────────────────────────────────────┘

Step 4: Preview file content with metadata fields

You can use metadata fields such as metadata$filename and metadata$file_row_number to inspect file-level information:

SELECT
metadata$filename,
metadata$file_row_number,
*
FROM @my_internal_stage/iris.parquet
LIMIT 5;
┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ metadata$filename │ metadata$file_row_number │ id │ sepal_length │ sepal_width │ petal_length │ petal_width │ species │ metadata$filename │ metadata$file_row_number │
├───────────────────┼──────────────────────────┼─────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼──────────────────┼───────────────────┼──────────────────────────┤
│ iris.parquet │ 015.13.51.40.2 │ setosa │ iris.parquet │ 0
│ iris.parquet │ 124.931.40.2 │ setosa │ iris.parquet │ 1
│ iris.parquet │ 234.73.21.30.2 │ setosa │ iris.parquet │ 2
│ iris.parquet │ 344.63.11.50.2 │ setosa │ iris.parquet │ 3
│ iris.parquet │ 4553.61.40.2 │ setosa │ iris.parquet │ 4
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Step 5: Create a table including metadata fields

Let’s create a table that includes the inferred columns plus metadata fields like filename and row number:

CREATE TABLE iris_with_meta AS
SELECT
metadata$filename AS iris_file,
metadata$file_row_number AS row_index,
sepal_length,
sepal_width,
petal_length,
petal_width,
species
FROM @my_internal_stage/iris.parquet;

Step 6: Query the data with metadata

SELECT * FROM iris_with_meta LIMIT 5;
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ iris_file │ row_index │ sepal_length │ sepal_width │ petal_length │ petal_width │ species │
├──────────────────┼──────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼──────────────────┤
│ iris.parquet │ 05.13.51.40.2 │ setosa │
│ iris.parquet │ 14.931.40.2 │ setosa │
│ iris.parquet │ 24.73.21.30.2 │ setosa │
│ iris.parquet │ 34.63.11.50.2 │ setosa │
│ iris.parquet │ 453.61.40.2 │ setosa │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘