Querying CSV Files in Stage
Query CSV Files in Stage
Syntax:
SELECT [<alias>.]$<col_position> [, $<col_position> ...]
FROM {@<stage_name>[/<path>] [<table_alias>] | '<uri>' [<table_alias>]}
[(
[<connection_parameters>],
[ PATTERN => '<regex_pattern>'],
[ FILE_FORMAT => 'CSV| <custom_format_name>'],
[ FILES => ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
)]
Tips
CSV doesn't have schema information, so we can only query the columns $<col_position> [, $<col_position> ...]
by position.
Tutorial
Step 1. Create an External Stage
Create an external stage with your own S3 bucket and credentials where your CSV files are stored.
CREATE STAGE csv_query_stage
URL = 's3://load/csv/'
CONNECTION = (
ACCESS_KEY_ID = '<your-access-key-id>'
SECRET_ACCESS_KEY = '<your-secret-access-key>'
);
Step 2. Create Custom CSV File Format
CREATE FILE FORMAT csv_query_format
TYPE = CSV,
RECORD_DELIMITER = '\n',
FIELD_DELIMITER = ',',
COMPRESSION = AUTO,
SKIP_HEADER = 1; -- Skip first line when querying if the CSV file has header
- More CSV file format options refer to CSV File Format Options
Step 3. Query CSV Files
SELECT $1, $2, $3
FROM @csv_query_stage
(
FILE_FORMAT => 'csv_query_format',
PATTERN => '.*[.]csv'
);
If the CSV files is compressed with gzip, we can use the following query:
SELECT $1, $2, $3
FROM @csv_query_stage
(
FILE_FORMAT => 'csv_query_format',
PATTERN => '.*[.]csv[.]gz'
);