Connection Parameters

Introduced or updated: v1.2.294

Connection parameters are the connection details required to establish a secure link to a supported external storage service, such as Amazon S3. They are enclosed in parentheses and consist of key-value pairs separated by commas. They are commonly used in operations such as creating a stage, copying data into Databend, and querying staged files from external sources. The key-value pairs supply the authentication and configuration information needed for the connection.

Syntax and Examples

Connection parameters are specified in a CONNECTION clause as a comma-separated list of key-value pairs. In CREATE STAGE and COPY INTO, CONNECTION is followed by '='. When querying staged files, CONNECTION is followed by '=>' and the clause is enclosed in an additional set of parentheses.

Examples:
-- This example shows a 'CREATE STAGE' command where 'CONNECTION' is followed by '=', creating a MinIO stage with specific connection parameters.
CREATE STAGE my_minio_stage
's3://databend'
CONNECTION = (
ENDPOINT_URL = 'http://localhost:9000',
ACCESS_KEY_ID = 'ROOTUSER',
SECRET_ACCESS_KEY = 'CHANGEME123'
);

-- This example showcases a 'COPY INTO' command, employing '=' after 'CONNECTION' to copy data, while also specifying file format details.
COPY INTO mytable
FROM 's3://mybucket/data.csv'
CONNECTION = (
ACCESS_KEY_ID = '<your-access-key-ID>',
SECRET_ACCESS_KEY = '<your-secret-access-key>'
)
FILE_FORMAT = (
TYPE = CSV,
FIELD_DELIMITER = ',',
RECORD_DELIMITER = '\n',
SKIP_HEADER = 1
)
SIZE_LIMIT = 10;

-- This example uses a 'SELECT' statement to query staged files.
-- 'CONNECTION' is followed by '=>' to access MinIO data, and the connection clause is enclosed in an additional set of parentheses.
SELECT * FROM 's3://testbucket/admin/data/parquet/tuple.parquet'
(CONNECTION => (
ACCESS_KEY_ID = 'minioadmin',
SECRET_ACCESS_KEY = 'minioadmin',
ENDPOINT_URL = 'http://127.0.0.1:9900/'
)
);

The connection parameters vary for different storage services based on their specific requirements and authentication mechanisms. For more information, please refer to the tables below.

Amazon S3-like Storage Services

The following table lists connection parameters for accessing an Amazon S3-like storage service:

| Parameter | Required? | Description |
|---|---|---|
| endpoint_url | Yes | Endpoint URL for Amazon S3-like storage service. |
| access_key_id | Yes | Access key ID for identifying the requester. |
| secret_access_key | Yes | Secret access key for authentication. |
| enable_virtual_host_style | No | Whether to use virtual host-style URLs. Defaults to false. |
| master_key | No | Optional master key for advanced data encryption. |
| region | No | AWS region where the bucket is located. |
| security_token | No | Security token for temporary credentials. |

Note:
  • If the endpoint_url parameter is not specified in the command, Databend creates the stage on Amazon S3 by default. Therefore, when you create an external stage on an S3-compatible or other object storage service, be sure to include the endpoint_url parameter.

  • The region parameter is optional because Databend can detect the region automatically, so you typically don't need to set it. If automatic detection fails, Databend falls back to 'us-east-1'. A MinIO deployment with no region configured also falls back to 'us-east-1', and this works correctly. However, if you receive an error such as "region is missing" or "The bucket you are trying to access requires a specific endpoint. Please direct all future requests to this particular endpoint", determine your region name and assign it explicitly to the region parameter.
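
The remedy described in the note above can be sketched as follows: a stage on an S3-compatible service that sets both endpoint_url and region explicitly. The stage name and every value in angle brackets are hypothetical placeholders, not values taken from this page.

```sql
-- Hypothetical stage name and bucket; replace the placeholders with your own values.
CREATE STAGE my_s3_compatible_stage
's3://<your-bucket>'
CONNECTION = (
    -- Required for S3-compatible services; without it the stage targets Amazon S3.
    ENDPOINT_URL = '<your-endpoint-url>',
    ACCESS_KEY_ID = '<your-access-key-ID>',
    SECRET_ACCESS_KEY = '<your-secret-access-key>',
    -- Set only when automatic region detection fails.
    REGION = '<your-region>'
);
```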

To access your Amazon S3 buckets, you can also authenticate with an AWS IAM role and an external ID. This gives you more granular control over which S3 buckets a user can access: if the IAM role has been granted permissions on specific buckets only, the user can access only those buckets. An external ID adds a further layer of verification when the role is assumed. For more information, see https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-role.html

The following table lists connection parameters for accessing Amazon S3 storage service using AWS IAM role authentication:

| Parameter | Required? | Description |
|---|---|---|
| endpoint_url | No | Endpoint URL for Amazon S3. |
| role_arn | Yes | ARN of the AWS IAM role for authorization to S3. |
| external_id | No | External ID for enhanced security in role assumption. |
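
As an illustration of role-based authentication, the sketch below creates a stage with role_arn and external_id in place of access keys. The stage name, bucket, account ID, role name, and external ID are hypothetical placeholders; the ARN follows the standard AWS IAM role format.

```sql
-- Hypothetical stage name and bucket; replace the placeholders with your own values.
CREATE STAGE my_iam_role_stage
's3://<your-bucket>'
CONNECTION = (
    ROLE_ARN = 'arn:aws:iam::<account-id>:role/<role-name>',
    -- Optional: include only if the role's trust policy requires an external ID.
    EXTERNAL_ID = '<your-external-id>'
);
```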

Azure Blob Storage

The following table lists connection parameters for accessing Azure Blob Storage:

| Parameter | Required? | Description |
|---|---|---|
| endpoint_url | Yes | Endpoint URL for Azure Blob Storage. |
| account_key | Yes | Azure Blob Storage account key for authentication. |
| account_name | Yes | Azure Blob Storage account name for identification. |
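
A minimal sketch of these parameters in a CREATE STAGE command is shown below. It assumes the 'azblob://' URI scheme for the stage location, which this page does not state; the stage name, container, and credentials are hypothetical placeholders.

```sql
-- Assumption: Azure Blob Storage locations use the 'azblob://' scheme.
CREATE STAGE my_azure_stage
'azblob://<your-container>/<path>'
CONNECTION = (
    ENDPOINT_URL = 'https://<your-account-name>.blob.core.windows.net',
    ACCOUNT_NAME = '<your-account-name>',
    ACCOUNT_KEY = '<your-account-key>'
);
```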

Google Cloud Storage

The following table lists connection parameters for accessing Google Cloud Storage:

| Parameter | Required? | Description |
|---|---|---|
| endpoint_url | Yes | Endpoint URL for Google Cloud Storage. |
| credential | Yes | Google Cloud Storage credential for authentication. |
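
The two parameters above might be combined as in the following sketch. It assumes the 'gcs://' URI scheme for the stage location, which this page does not state; the stage name, bucket, and credential value are hypothetical placeholders.

```sql
-- Assumption: Google Cloud Storage locations use the 'gcs://' scheme.
CREATE STAGE my_gcs_stage
'gcs://<your-bucket>/<path>'
CONNECTION = (
    ENDPOINT_URL = 'https://storage.googleapis.com',
    CREDENTIAL = '<your-credential>'
);
```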

Alibaba Cloud OSS

The following table lists connection parameters for accessing Alibaba Cloud OSS:

| Parameter | Required? | Description |
|---|---|---|
| access_key_id | Yes | Alibaba Cloud OSS access key ID for authentication. |
| access_key_secret | Yes | Alibaba Cloud OSS access key secret for authentication. |
| endpoint_url | Yes | Endpoint URL for Alibaba Cloud OSS. |
| presign_endpoint_url | No | Endpoint URL for presigning Alibaba Cloud OSS URLs. |
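
For illustration, a stage using the required OSS parameters could look like the sketch below. Note that OSS uses access_key_secret rather than the secret_access_key key used for S3. The 'oss://' URI scheme is an assumption not stated on this page, and the stage name and all bracketed values are hypothetical placeholders.

```sql
-- Assumption: Alibaba Cloud OSS locations use the 'oss://' scheme.
CREATE STAGE my_oss_stage
'oss://<your-bucket>/<path>'
CONNECTION = (
    ACCESS_KEY_ID = '<your-access-key-ID>',
    ACCESS_KEY_SECRET = '<your-access-key-secret>',
    ENDPOINT_URL = '<your-oss-endpoint-url>'
);
```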

Tencent Cloud Object Storage

The following table lists connection parameters for accessing Tencent Cloud Object Storage (COS):

| Parameter | Required? | Description |
|---|---|---|
| endpoint_url | Yes | Endpoint URL for Tencent Cloud Object Storage. |
| secret_id | Yes | Tencent Cloud Object Storage secret ID for authentication. |
| secret_key | Yes | Tencent Cloud Object Storage secret key for authentication. |
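
A sketch of a COS stage built from the parameters above follows. The 'cos://' URI scheme is an assumption not stated on this page, and the stage name and all bracketed values are hypothetical placeholders.

```sql
-- Assumption: Tencent COS locations use the 'cos://' scheme.
CREATE STAGE my_cos_stage
'cos://<your-bucket>/<path>'
CONNECTION = (
    ENDPOINT_URL = '<your-cos-endpoint-url>',
    SECRET_ID = '<your-secret-ID>',
    SECRET_KEY = '<your-secret-key>'
);
```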

HDFS

The following table lists connection parameters for accessing Hadoop Distributed File System (HDFS):

| Parameter | Required? | Description |
|---|---|---|
| name_node | Yes | HDFS NameNode address for connecting to the cluster. |
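
The single HDFS parameter might be used as in the sketch below. The 'hdfs://' URI scheme and the form of the location string are assumptions not stated on this page; the stage name and bracketed values are hypothetical placeholders.

```sql
-- Assumption: HDFS locations use the 'hdfs://' scheme.
CREATE STAGE my_hdfs_stage
'hdfs://<your-hdfs-location>'
CONNECTION = (
    NAME_NODE = '<your-name-node-address>'
);
```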

WebHDFS

The following table lists connection parameters for accessing WebHDFS:

| Parameter | Required? | Description |
|---|---|---|
| endpoint_url | Yes | Endpoint URL for WebHDFS. |
| delegation | No | Delegation token for accessing WebHDFS. |
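
As a sketch under stated assumptions, a WebHDFS stage with an optional delegation token could be written as follows. The 'webhdfs://' URI scheme and the form of the location string are assumptions not stated on this page; the stage name and bracketed values are hypothetical placeholders.

```sql
-- Assumption: WebHDFS locations use the 'webhdfs://' scheme.
CREATE STAGE my_webhdfs_stage
'webhdfs://<your-webhdfs-location>'
CONNECTION = (
    ENDPOINT_URL = '<your-webhdfs-endpoint-url>',
    -- Optional: include only if the cluster requires a delegation token.
    DELEGATION = '<your-delegation-token>'
);
```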

Hugging Face

The following table lists connection parameters for accessing Hugging Face:

| Parameter | Required? | Description |
|---|---|---|
| repo_id | Yes | The identifier of the Hugging Face repository, for example "opendal/huggingface-testdata". The repo_id must include an organization name; datasets stored on Hugging Face without one (such as https://huggingface.co/datasets/ropes) are not supported at this time. |
| repo_type | No (default: dataset) | The type of the Hugging Face repository. Can be dataset or model. |
| revision | No (default: main) | The revision for the Hugging Face URI. Can be a branch, tag, or commit of the repository. |
| token | No | The API token from Hugging Face, which may be required for accessing private repositories or certain resources. |
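
The sketch below shows one way these parameters might appear in a CREATE STAGE command, using the example repository named in the table. It assumes that the repository ID is carried in an 'hf://' URI rather than passed as a separate key; neither the scheme nor that convention is stated on this page. The stage name is hypothetical, and the two values shown are simply the documented defaults written out.

```sql
-- Assumption: Hugging Face locations use 'hf://<repo_id>/<path>'.
CREATE STAGE my_huggingface_stage
'hf://opendal/huggingface-testdata/'
CONNECTION = (
    REPO_TYPE = 'dataset',
    REVISION = 'main'
);
```

For a private repository, a TOKEN = '<your-api-token>' pair would presumably be added to the same clause.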