Skip to main content

Querying Parquet Files in Stage

Query Parquet Files in Stage

Syntax:

SELECT [<alias>.]<column> [, <column> ...] 
FROM {@<stage_name>[/<path>] [<table_alias>] | '<uri>' [<table_alias>]}
[(
[<connection_parameters>],
[ PATTERN => '<regex_pattern>'],
[ FILE_FORMAT => 'PARQUET | <custom_format_name>'],
[ FILES => ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ],
[ CASE_SENSITIVE => true | false ]
)]
Tips

Query Return Content Explanation:

  • Return Format: Column values in their native data types (not variants)
  • Access Method: Directly use column names column_name
  • Example: SELECT id, name, age FROM @stage_name
  • Key Features:
    • No need for path expressions (like $1:name)
    • No type casting required
    • Parquet files contain embedded schema information

Tutorial

Step 1. Create an External Stage

Create an external stage with your own S3 bucket and credentials where your Parquet files are stored.

CREATE STAGE parquet_query_stage 
URL = 's3://load/parquet/'
CONNECTION = (
ACCESS_KEY_ID = '<your-access-key-id>'
SECRET_ACCESS_KEY = '<your-secret-access-key>'
);

Step 2. Create Custom Parquet File Format

CREATE FILE FORMAT parquet_query_format 
TYPE = PARQUET
;

Step 3. Query Parquet Files

SELECT *
FROM @parquet_query_stage
(
FILE_FORMAT => 'parquet_query_format',
PATTERN => '.*[.]parquet'
);

Query with Metadata

Query Parquet files directly from a stage, including metadata columns like METADATA$FILENAME and METADATA$FILE_ROW_NUMBER:

SELECT
METADATA$FILENAME,
METADATA$FILE_ROW_NUMBER,
*
FROM @parquet_query_stage
(
FILE_FORMAT => 'parquet_query_format',
PATTERN => '.*[.]parquet'
);