Loading Semi-structured Data

What is Semi-structured Data?

Semi-structured data is a form of data that does not conform to a rigid structure like traditional databases but still contains tags or markers to separate semantic elements and enforce hierarchies of records and fields.

Databend facilitates the efficient and user-friendly loading of semi-structured data. It supports various formats such as Parquet, CSV, TSV, and NDJSON.

Databend can also transform data on the fly while loading it. Copying from a semi-structured data file is the most common way to load data into Databend, and it is both efficient and easy to use.

Supported Formats

Databend can load the following semi-structured data formats with the COPY INTO command:

  • Parquet: A columnar storage format, ideal for optimizing data storage and retrieval. It is best suited for complex data structures and offers efficient data compression and encoding schemes.

  • CSV (Comma-Separated Values): A simple format that is widely used for data exchange. CSV files are easy to read and write but might not be ideal for complex hierarchical data structures.

  • TSV (Tab-Separated Values): Similar to CSV, but uses tabs as delimiters. It's often used for data with simple structures that require a delimiter other than a comma.

  • NDJSON (Newline Delimited JSON): This format represents JSON data with each JSON object separated by a newline. It is particularly useful for streaming large datasets and handling data that changes frequently. NDJSON facilitates the processing of large volumes of data by breaking it down into manageable, line-delimited chunks.
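
To make the line-delimited layout of NDJSON concrete, here is a minimal Python sketch that reads such data one record at a time. It is only an illustration of the format, not a description of how Databend parses it; the helper name iter_ndjson and the sample records are hypothetical and do not come from Databend.

```python
import io
import json


def iter_ndjson(stream):
    """Yield one parsed JSON value per non-empty line of a text stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line failed so a bad record is easy to locate.
            raise ValueError(f"line {line_number}: invalid JSON: {exc}") from exc


# Hypothetical sample: two records with different shapes, one per line.
sample = io.StringIO(
    '{"id": 1, "tags": ["a", "b"]}\n'
    '{"id": 2, "user": {"name": "Ann"}}\n'
)

records = list(iter_ndjson(sample))
for record in records:
    print(record)
```

Because each line is a complete JSON value, a reader can process the input record by record without holding the whole file in memory, and records are free to carry different fields.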

For detailed instructions on how to load semi-structured data, check out the following topics:
