Transforming Data on Load

Databend's COPY INTO command supports transforming data during the loading process, using this syntax:

COPY INTO [<database_name>.]<table_name> [ ( <col_name> [ , <col_name> ... ] ) ]
FROM ( SELECT [<file_col> ... ]
       FROM { userStage | internalStage | externalStage } )
[ FILES = ( '<file_name>' [ , '<file_name>' ] [ , ... ] ) ]
[ PATTERN = '<regex_pattern>' ]
[ FILE_FORMAT = (
    FORMAT_NAME = '<your-custom-format>'
  | TYPE = { CSV | TSV | NDJSON | PARQUET | XML } [ formatTypeOptions ]
) ]
[ copyOptions ]
  • COPY INTO also accommodates other syntax options. For further details, see COPY INTO.

This functionality simplifies your ETL pipeline by incorporating basic transformations at load time, eliminating the need for temporary tables. Here are practical ways to enhance data loading with this feature (a combined sketch follows the list):

  • Loading a subset of data columns: Allows you to selectively import specific columns from a dataset, focusing on the data that is relevant to your analysis or application.

  • Reordering columns during load: Enables you to change the order of columns while loading data, ensuring the desired column arrangement for better data organization or alignment with specific requirements.

  • Converting datatypes during load: Provides the ability to convert datatypes of certain columns during the data loading process, allowing you to ensure consistency and compatibility with the desired data formats or analysis techniques.

  • Performing arithmetic operations during load: Allows you to perform mathematical calculations and operations on specific columns as the data is being loaded, facilitating advanced data transformations or generating new derived data.

  • Loading data to a table with additional columns: Enables you to load data into a table that already contains additional columns, accommodating the existing structure while mapping and inserting the data into the corresponding columns efficiently.
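
These transformations can also be combined in a single statement. The sketch below is illustrative rather than part of the tutorials: it assumes a hypothetical target table named employees_combined plus the staged sample file created in the next section, and it selects a subset of columns, converts a datatype, and applies an arithmetic operation in one load:

-- Hypothetical combined example (employees_combined is illustrative, not created in these tutorials)
COPY INTO employees_combined
FROM (
    SELECT
        t.id,
        t.name,
        t.age + 1,           -- arithmetic operation during load
        to_date(t.onboarded) -- datatype conversion during load
    FROM @my_parquet_stage t
)
FILE_FORMAT = (TYPE = PARQUET)
PATTERN = '.*parquet';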

Tutorials

This section provides several brief tutorials that offer practical guidance on how to transform data while loading it. Each tutorial walks you through loading data from a staged sample file. Please note that the tutorials are independent of each other, and you don't need to complete them in order. Feel free to follow along based on your needs.

Before You Begin

Before you begin, create a stage and generate a sample file. The following statements generate a Parquet file with ten rows:

CREATE STAGE my_parquet_stage;

COPY INTO @my_parquet_stage
FROM (
    SELECT
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS id,           -- Generates a sequential id
        'Name_' || CAST(number AS VARCHAR) AS name,                 -- Generates a unique name for each row
        20 + MOD(number, 23) AS age,                                -- Generates an age of 20 + (number mod 23): 20-29 for these 10 rows
        DATE_ADD('day', MOD(number, 60), '2022-01-01') AS onboarded -- Generates onboarded dates starting from 2022-01-01
    FROM numbers(10) -- Generates 10 rows
)
FILE_FORMAT = (TYPE = PARQUET);
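
Optionally, you can confirm that the Parquet file landed in the stage before querying it:

LIST @my_parquet_stage;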

Query the staged sample file:

SELECT * FROM @my_parquet_stage;

Result:

┌────┬────────┬─────┬────────────┐
│ id │ name   │ age │ onboarded  │
├────┼────────┼─────┼────────────┤
│  1 │ Name_0 │  20 │ 2022-01-01 │
│  2 │ Name_5 │  25 │ 2022-01-06 │
│  3 │ Name_1 │  21 │ 2022-01-02 │
│  4 │ Name_6 │  26 │ 2022-01-07 │
│  5 │ Name_7 │  27 │ 2022-01-08 │
│  6 │ Name_2 │  22 │ 2022-01-03 │
│  7 │ Name_8 │  28 │ 2022-01-09 │
│  8 │ Name_3 │  23 │ 2022-01-04 │
│  9 │ Name_4 │  24 │ 2022-01-05 │
│ 10 │ Name_9 │  29 │ 2022-01-10 │
└────┴────────┴─────┴────────────┘

Tutorial 1 - Loading a Subset of Data Columns

In this tutorial, you will create a table that has fewer columns than the sample file, and then populate it with the corresponding data extracted from the sample file.

1. Create a table without the 'age' column:

CREATE TABLE employees_no_age (
    id INT,
    name VARCHAR,
    onboarded TIMESTAMP
);

2. Load data from the staged sample file, except for the 'age' column:

-- Load from staged file
COPY INTO employees_no_age
FROM (
    SELECT
        t.id,
        t.name,
        t.onboarded
    FROM @my_parquet_stage t
)
FILE_FORMAT = (TYPE = PARQUET)
PATTERN = '.*parquet';

3. Check the loaded data:

SELECT * FROM employees_no_age;

Result:

┌────┬────────┬─────────────────────┐
│ id │ name   │ onboarded           │
├────┼────────┼─────────────────────┤
│  1 │ Name_0 │ 2022-01-01 00:00:00 │
│  2 │ Name_5 │ 2022-01-06 00:00:00 │
│  3 │ Name_1 │ 2022-01-02 00:00:00 │
│  4 │ Name_6 │ 2022-01-07 00:00:00 │
│  5 │ Name_7 │ 2022-01-08 00:00:00 │
│  6 │ Name_2 │ 2022-01-03 00:00:00 │
│  7 │ Name_8 │ 2022-01-09 00:00:00 │
│  8 │ Name_3 │ 2022-01-04 00:00:00 │
│  9 │ Name_4 │ 2022-01-05 00:00:00 │
│ 10 │ Name_9 │ 2022-01-10 00:00:00 │
└────┴────────┴─────────────────────┘

Tutorial 2 - Reordering Columns During Load

In this tutorial, you will create a table that has the same columns as the sample file but in a different order, and then populate it with the corresponding data extracted from the sample file.

1. Create a table where the 'name' and 'age' columns are swapped:

CREATE TABLE employees_new_order (
    id INT,
    age INT,
    name VARCHAR,
    onboarded TIMESTAMP
);

2. Load data from the staged sample file in the new column order:

-- Load from staged file
COPY INTO employees_new_order
FROM (
    SELECT
        t.id,
        t.age,
        t.name,
        t.onboarded
    FROM @my_parquet_stage t
)
FILE_FORMAT = (TYPE = PARQUET)
PATTERN = '.*parquet';

3. Check the loaded data:

SELECT * FROM employees_new_order;

Result:

┌────┬─────┬────────┬─────────────────────┐
│ id │ age │ name   │ onboarded           │
├────┼─────┼────────┼─────────────────────┤
│  1 │  20 │ Name_0 │ 2022-01-01 00:00:00 │
│  2 │  25 │ Name_5 │ 2022-01-06 00:00:00 │
│  3 │  21 │ Name_1 │ 2022-01-02 00:00:00 │
│  4 │  26 │ Name_6 │ 2022-01-07 00:00:00 │
│  5 │  27 │ Name_7 │ 2022-01-08 00:00:00 │
│  6 │  22 │ Name_2 │ 2022-01-03 00:00:00 │
│  7 │  28 │ Name_8 │ 2022-01-09 00:00:00 │
│  8 │  23 │ Name_3 │ 2022-01-04 00:00:00 │
│  9 │  24 │ Name_4 │ 2022-01-05 00:00:00 │
│ 10 │  29 │ Name_9 │ 2022-01-10 00:00:00 │
└────┴─────┴────────┴─────────────────────┘

Tutorial 3 - Converting Datatypes During Load

In this tutorial, you will create a table that has the same columns as the sample file, except for one which will have a different data type, and then populate it with the data extracted and converted from the sample file.

1. Create a table with the 'onboarded' column of Date type:

CREATE TABLE employees_date (
    id INT,
    name VARCHAR,
    age INT,
    onboarded DATE
);

2. Load data from the staged sample file, converting the 'onboarded' column to a Date type:

-- Load from staged file
COPY INTO employees_date
FROM (
    SELECT
        t.id,
        t.name,
        t.age,
        to_date(t.onboarded)
    FROM @my_parquet_stage t
)
FILE_FORMAT = (TYPE = PARQUET)
PATTERN = '.*parquet';

3. Check the loaded data:

SELECT * FROM employees_date;

Result:

┌─────────────────┬──────────────────┬─────────────────┬────────────────┐
│ id              │ name             │ age             │ onboarded      │
│ Nullable(Int32) │ Nullable(String) │ Nullable(Int32) │ Nullable(Date) │
├─────────────────┼──────────────────┼─────────────────┼────────────────┤
│               1 │ Name_0           │              20 │ 2022-01-01     │
│               2 │ Name_5           │              25 │ 2022-01-06     │
│               3 │ Name_1           │              21 │ 2022-01-02     │
│               4 │ Name_6           │              26 │ 2022-01-07     │
│               5 │ Name_7           │              27 │ 2022-01-08     │
│               6 │ Name_2           │              22 │ 2022-01-03     │
│               7 │ Name_8           │              28 │ 2022-01-09     │
│               8 │ Name_3           │              23 │ 2022-01-04     │
│               9 │ Name_4           │              24 │ 2022-01-05     │
│              10 │ Name_9           │              29 │ 2022-01-10     │
└─────────────────┴──────────────────┴─────────────────┴────────────────┘
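
to_date() is one way to perform the conversion; Databend also supports the standard CAST syntax and the PostgreSQL-style :: operator, so the following variant should behave the same (a sketch, not part of the tutorial steps):

-- Alternative cast syntax; should be equivalent to to_date(t.onboarded)
COPY INTO employees_date
FROM (
    SELECT
        t.id,
        t.name,
        t.age,
        t.onboarded::DATE -- or: CAST(t.onboarded AS DATE)
    FROM @my_parquet_stage t
)
FILE_FORMAT = (TYPE = PARQUET)
PATTERN = '.*parquet';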

Tutorial 4 - Performing Arithmetic Operations During Load

In this tutorial, you will create a table with the same columns as the sample file. You will then extract data from the sample file, perform an arithmetic operation on it, and populate the table with the results.

1. Create a table containing exactly the same columns as the sample file:

CREATE TABLE employees_new_age (
    id INT,
    name VARCHAR,
    age INT,
    onboarded TIMESTAMP
);

2. Load data from the staged sample file, incrementing the values in the 'age' column by 1 before inserting them into the target table:

-- Load from staged file
COPY INTO employees_new_age
FROM (
    SELECT
        t.id,
        t.name,
        t.age + 1,
        t.onboarded
    FROM @my_parquet_stage t
)
FILE_FORMAT = (TYPE = PARQUET)
PATTERN = '.*parquet';

3. Check the loaded data:

SELECT * FROM employees_new_age;

Result:

┌─────────────────┬──────────────────┬─────────────────┬─────────────────────┐
│ id              │ name             │ age             │ onboarded           │
│ Nullable(Int32) │ Nullable(String) │ Nullable(Int32) │ Nullable(Timestamp) │
├─────────────────┼──────────────────┼─────────────────┼─────────────────────┤
│               1 │ Name_0           │              21 │ 2022-01-01 00:00:00 │
│               2 │ Name_5           │              26 │ 2022-01-06 00:00:00 │
│               3 │ Name_1           │              22 │ 2022-01-02 00:00:00 │
│               4 │ Name_6           │              27 │ 2022-01-07 00:00:00 │
│               5 │ Name_7           │              28 │ 2022-01-08 00:00:00 │
│               6 │ Name_2           │              23 │ 2022-01-03 00:00:00 │
│               7 │ Name_8           │              29 │ 2022-01-09 00:00:00 │
│               8 │ Name_3           │              24 │ 2022-01-04 00:00:00 │
│               9 │ Name_4           │              25 │ 2022-01-05 00:00:00 │
│              10 │ Name_9           │              30 │ 2022-01-10 00:00:00 │
└─────────────────┴──────────────────┴─────────────────┴─────────────────────┘
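
Arithmetic is only one kind of expression you can apply here; any scalar function Databend supports can appear in the SELECT list. As an illustration (the table employees_upper is hypothetical and not created in these tutorials), the following sketch combines a string function with the arithmetic above:

-- Hypothetical sketch: scalar function plus arithmetic in one load
COPY INTO employees_upper
FROM (
    SELECT
        t.id,
        UPPER(t.name), -- apply a string function during load
        t.age + 1,
        t.onboarded
    FROM @my_parquet_stage t
)
FILE_FORMAT = (TYPE = PARQUET)
PATTERN = '.*parquet';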

Tutorial 5 - Loading to a Table with Additional Columns

In this tutorial, you will create a new table that includes additional columns compared to the sample file. You'll then extract data from the sample file and populate the new table, mapping the data into the corresponding columns.

1. Create a table containing more columns than the sample file:

CREATE TABLE employees_plus (
    id INT,
    name VARCHAR,
    age INT,
    onboarded TIMESTAMP,
    lastday TIMESTAMP
);

2. Load data from the staged sample file, specifying the target columns:

-- Load from staged file
COPY INTO employees_plus (id, name, age, onboarded)
FROM (
    SELECT
        t.id,
        t.name,
        t.age,
        t.onboarded
    FROM @my_parquet_stage t
)
FILE_FORMAT = (TYPE = PARQUET)
PATTERN = '.*parquet';

3. Check the loaded data:

SELECT * FROM employees_plus;

Result:

┌─────────────────┬──────────────────┬─────────────────┬─────────────────────┬─────────────────────┐
│ id              │ name             │ age             │ onboarded           │ lastday             │
│ Nullable(Int32) │ Nullable(String) │ Nullable(Int32) │ Nullable(Timestamp) │ Nullable(Timestamp) │
├─────────────────┼──────────────────┼─────────────────┼─────────────────────┼─────────────────────┤
│               1 │ Name_0           │              20 │ 2022-01-01 00:00:00 │ NULL                │
│               2 │ Name_5           │              25 │ 2022-01-06 00:00:00 │ NULL                │
│               3 │ Name_1           │              21 │ 2022-01-02 00:00:00 │ NULL                │
│               4 │ Name_6           │              26 │ 2022-01-07 00:00:00 │ NULL                │
│               5 │ Name_7           │              27 │ 2022-01-08 00:00:00 │ NULL                │
│               6 │ Name_2           │              22 │ 2022-01-03 00:00:00 │ NULL                │
│               7 │ Name_8           │              28 │ 2022-01-09 00:00:00 │ NULL                │
│               8 │ Name_3           │              23 │ 2022-01-04 00:00:00 │ NULL                │
│               9 │ Name_4           │              24 │ 2022-01-05 00:00:00 │ NULL                │
│              10 │ Name_9           │              29 │ 2022-01-10 00:00:00 │ NULL                │
└─────────────────┴──────────────────┴─────────────────┴─────────────────────┴─────────────────────┘
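
If you want to clean up after finishing the tutorials, you can drop the sample stage and any tables you created:

-- Clean up tutorial objects
DROP TABLE IF EXISTS employees_no_age;
DROP TABLE IF EXISTS employees_new_order;
DROP TABLE IF EXISTS employees_date;
DROP TABLE IF EXISTS employees_new_age;
DROP TABLE IF EXISTS employees_plus;
DROP STAGE IF EXISTS my_parquet_stage;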