Skip to main content

Querying Parquet Files in Stage

Syntax:

Tutorial

Step 1. Create an External Stage

Create an external stage with your own S3 bucket and credentials where your Parquet files are stored.

CREATE STAGE parquet_query_stage 
URL = 's3://load/parquet/'
CONNECTION = (
ACCESS_KEY_ID = '<your-access-key-id>'
SECRET_ACCESS_KEY = '<your-secret-access-key>'
);

Step 2. Create Custom Parquet File Format

CREATE FILE FORMAT parquet_query_format TYPE = PARQUET;

Step 3. Query Parquet Files

query with colum names:

SELECT *
FROM @parquet_query_stage
(
FILE_FORMAT => 'parquet_query_format',
PATTERN => '.*[.]parquet'
);

query with path expressions:

SELECT $1
FROM @parquet_query_stage
(
FILE_FORMAT => 'parquet_query_format',
PATTERN => '.*[.]parquet'
);

Query with Metadata

Query Parquet files directly from a stage, including metadata columns like METADATA$FILENAME and METADATA$FILE_ROW_NUMBER:

SELECT
METADATA$FILENAME,
METADATA$FILE_ROW_NUMBER,
*
FROM @parquet_query_stage
(
FILE_FORMAT => 'parquet_query_format',
PATTERN => '.*[.]parquet'
);