Querying NDJSON Files in Stage
Query NDJSON Files in Stage
Syntax:
SELECT [<alias>.]$1:<column> [, $1:<column> ...]
FROM {@<stage_name>[/<path>] [<table_alias>] | '<uri>' [<table_alias>]}
[(
[<connection_parameters>],
[ PATTERN => '<regex_pattern>'],
[ FILE_FORMAT => 'NDJSON| <custom_format_name>'],
[ FILES => ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
)]
Tips
Query Return Content Explanation:
- Return Format: Each row as a single variant object (referenced as
$1
) - Access Method: Use path expressions
$1:column_name
- Example:
SELECT $1:title, $1:author FROM @stage_name
- Key Features:
- Must use path notation to access specific fields
- Type casting required for type-specific operations (e.g.,
CAST($1:id AS INT)
) - Each NDJSON line is parsed as a complete JSON object
- Whole row is represented as a single variant object
Tutorial
Step 1. Create an External Stage
Create an external stage with your own S3 bucket and credentials where your NDJSON files are stored.
CREATE STAGE ndjson_query_stage
URL = 's3://load/ndjson/'
CONNECTION = (
ACCESS_KEY_ID = '<your-access-key-id>'
SECRET_ACCESS_KEY = '<your-secret-access-key>'
);
Step 2. Create Custom NDJSON File Format
CREATE FILE FORMAT ndjson_query_format
TYPE = NDJSON,
COMPRESSION = AUTO;
- More NDJSON file format options refer to NDJSON File Format Options
Step 3. Query NDJSON Files
SELECT $1:title, $1:author
FROM @ndjson_query_stage
(
FILE_FORMAT => 'ndjson_query_format',
PATTERN => '.*[.]ndjson'
);
If the NDJSON files are compressed with gzip, we can use the following query:
SELECT $1:title, $1:author
FROM @ndjson_query_stage
(
FILE_FORMAT => 'ndjson_query_format',
PATTERN => '.*[.]ndjson[.]gz'
);
Query with Metadata
Query NDJSON files directly from a stage, including metadata columns like METADATA$FILENAME
and METADATA$FILE_ROW_NUMBER
:
SELECT
METADATA$FILENAME,
METADATA$FILE_ROW_NUMBER,
$1:title, $1:author
FROM @ndjson_query_stage
(
FILE_FORMAT => 'ndjson_query_format',
PATTERN => '.*[.]ndjson'
);