Querying TSV Files in Stage
Query TSV Files in Stage
Syntax:
SELECT [<alias>.]$<col_position> [, $<col_position> ...]
FROM {@<stage_name>[/<path>] [<table_alias>] | '<uri>' [<table_alias>]}
[(
[<connection_parameters>],
[ PATTERN => '<regex_pattern>'],
[ FILE_FORMAT => 'TSV| <custom_format_name>'],
[ FILES => ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
)]
Tips
Query Return Content Explanation:
- Return Format: Individual column values as strings by default
- Access Method: Use positional references
$<col_position>
(e.g.,$1
,$2
,$3
) - Example:
SELECT $1, $2, $3 FROM @stage_name
- Key Features:
- Columns accessed by position, not by name
- Each
$<col_position>
refers to a single column, not the whole row - Type casting required for non-string operations (e.g.,
CAST($1 AS INT)
) - No embedded schema information in TSV files
Tutorial
Step 1. Create an External Stage
Create an external stage with your own S3 bucket and credentials where your TSV files are stored.
CREATE STAGE tsv_query_stage
URL = 's3://load/tsv/'
CONNECTION = (
ACCESS_KEY_ID = '<your-access-key-id>'
SECRET_ACCESS_KEY = '<your-secret-access-key>'
);
Step 2. Create Custom TSV File Format
CREATE FILE FORMAT tsv_query_format
TYPE = TSV,
RECORD_DELIMITER = '\n',
FIELD_DELIMITER = ',',
COMPRESSION = AUTO;
- More TSV file format options refer to TSV File Format Options
Step 3. Query TSV Files
SELECT $1, $2, $3
FROM @tsv_query_stage
(
FILE_FORMAT => 'tsv_query_format',
PATTERN => '.*[.]tsv'
);
If the TSV files is compressed with gzip, we can use the following query:
SELECT $1, $2, $3
FROM @tsv_query_stage
(
FILE_FORMAT => 'tsv_query_format',
PATTERN => '.*[.]tsv[.]gz'
);
Query with Metadata
Query TSV files directly from a stage, including metadata columns like METADATA$FILENAME
and METADATA$FILE_ROW_NUMBER
:
SELECT
METADATA$FILENAME,
METADATA$FILE_ROW_NUMBER,
$1, $2, $3
FROM @tsv_query_stage
(
FILE_FORMAT => 'tsv_query_format',
PATTERN => '.*[.]tsv'
);