Deploying with HDFS
Databend also works with the Hadoop Distributed File System (HDFS). This topic explains how to deploy Databend with HDFS. For a list of other supported object storage solutions, see Understanding Deployment Modes.
Setting up Your HDFS
The instructions below cover two variants: HDFS and WebHDFS.
HDFS
Before deploying Databend, make sure you have successfully set up your Hadoop environment and completed the following tasks:
- Your system has a Java SDK (JDK) with JVM support installed.
- You have the name node URL for connecting to HDFS (see the tip after this list if you are unsure where to find it).
- You have downloaded the Hadoop release to your system and can access the JAR packages in the release.
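If you are unsure of the name node URL, it is typically the fs.defaultFS value in Hadoop's core-site.xml. Assuming the standard Hadoop release layout, you can look it up like this:
# Print the fs.defaultFS property and the line that follows it,
# which normally contains the hdfs:// URL of the name node.
grep -A 1 'fs.defaultFS' /path/to/hadoop/etc/hadoop/core-site.xml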
When using HDFS as the storage backend, make sure you set the following environment variables:
export JAVA_HOME=/path/to/java
export LD_LIBRARY_PATH=${JAVA_HOME}/lib/server:${LD_LIBRARY_PATH}
export HADOOP_HOME=/path/to/hadoop
export CLASSPATH=/all/hadoop/jar/files
The following is an example:
export JAVA_HOME=/usr/lib/jvm/java-21-jdk
export LD_LIBRARY_PATH=${JAVA_HOME}/lib/server:${LD_LIBRARY_PATH}
export HADOOP_HOME=${HOME}/hadoop-3.3.6
export CLASSPATH=$(find $HADOOP_HOME -iname "*.jar" | xargs echo | tr ' ' ':')
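Before moving on, it can help to verify that the variables resolve correctly. A minimal sanity check, assuming a JDK 9+ layout where libjvm.so lives under ${JAVA_HOME}/lib/server:
# libjvm.so must be loadable by Databend via LD_LIBRARY_PATH.
ls "${JAVA_HOME}/lib/server/libjvm.so"
# CLASSPATH should expand to a colon-separated list of Hadoop JARs.
echo "${CLASSPATH}" | tr ':' '\n' | head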
Downloading Databend
a. Create a folder named databend in the directory /usr/local.
b. Download and extract the latest Databend release for your platform from GitHub Release. To use HDFS as the storage backend, download a release with a file name formatted as databend-hdfs-${version}-${target-platform}.tar.gz:
Linux (x86):
curl -LJO https://repo.databend.com/databend/v1.2.680-p3/databend-hdfs-v1.2.680-p3-x86_64-unknown-linux-gnu.tar.gz
tar xzvf databend-hdfs-v1.2.680-p3-x86_64-unknown-linux-gnu.tar.gz
c. Move the extracted folders bin, configs, and scripts to the folder /usr/local/databend.
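Taken together, steps a and c correspond to shell commands like the following. This is a sketch that assumes you run it from the directory where you extracted the archive; sudo is used because /usr/local is usually owned by root:
# Create the target folder, then move the extracted folders into it.
sudo mkdir -p /usr/local/databend
sudo mv bin configs scripts /usr/local/databend/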
WebHDFS
Before deploying Databend, make sure you have successfully set up your Hadoop environment and completed the following tasks:
- Enable WebHDFS support on Hadoop.
- Get the endpoint URL for connecting to WebHDFS.
- Get the delegation token used for authentication (if needed).
For information about how to enable and manage WebHDFS on Apache Hadoop, refer to the official WebHDFS documentation.
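As a quick smoke test, you can also call the WebHDFS REST API directly. The host and port below are placeholders for your own endpoint; a reachable endpoint returns a JSON FileStatuses listing:
# LISTSTATUS on the root path verifies that WebHDFS is reachable.
curl "http://hadoop.example.com:9870/webhdfs/v1/?op=LISTSTATUS"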
Downloading Databend
a. Create a folder named databend in the directory /usr/local.
b. Download and extract the latest Databend release for your platform from GitHub Release:
Linux (x86):
curl -LJO https://repo.databend.com/databend/v1.2.680-p3/databend-v1.2.680-p3-x86_64-unknown-linux-musl.tar.gz
tar xzvf databend-v1.2.680-p3-x86_64-unknown-linux-musl.tar.gz
Linux (Arm):
curl -LJO https://repo.databend.com/databend/v1.2.680-p3/databend-v1.2.680-p3-aarch64-unknown-linux-musl.tar.gz
tar xzvf databend-v1.2.680-p3-aarch64-unknown-linux-musl.tar.gz
c. Move the extracted folders bin, configs, and scripts to the folder /usr/local/databend.
Deploying a Meta Node
a. Open the file databend-meta.toml in the folder /usr/local/databend/configs, and replace 127.0.0.1 with 0.0.0.0 throughout the file.
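If you prefer not to edit the file by hand, a sed one-liner performs the same replacement. The -i.bak flag keeps a backup copy of the original file:
# Replace every occurrence of 127.0.0.1 with 0.0.0.0, backing up first.
sudo sed -i.bak 's/127\.0\.0\.1/0.0.0.0/g' /usr/local/databend/configs/databend-meta.toml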
b. Open a terminal window and navigate to the folder /usr/local/databend/bin.
c. Run the following command to start the Meta node:
./databend-meta -c ../configs/databend-meta.toml > meta.log 2>&1 &
d. Run the following command to check if the Meta node was started successfully:
curl -I http://127.0.0.1:28101/v1/health
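An HTTP 200 status in the response indicates that the Meta node is up and healthy.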
Deploying a Query Node
a. Locate the file databend-query.toml in the folder /usr/local/databend/configs.
b. In the file databend-query.toml, set the type parameter in the [storage] block and configure the access credentials and endpoint URL for connecting to your HDFS.
To configure your storage settings, comment out the [storage.fs] section by adding '#' at the beginning of each line, then uncomment the section for your HDFS variant by removing the '#' symbols and fill in the necessary values. You can copy the corresponding template below into the file and configure it accordingly.
HDFS:
[storage]
type = "hdfs"
[storage.hdfs]
name_node = "hdfs://hadoop.example.com:8020"
root = "/analyses/databend/storage"
WebHDFS:
[storage]
type = "webhdfs"
[storage.webhdfs]
endpoint_url = "https://hadoop.example.com:9870"
root = "/analyses/databend/storage"
# if your webhdfs needs authentication, uncomment and set with your value
# delegation = "<delegation-token>"
c. Configure an admin user with the [query.users] sections. For more information, see Configuring Admin Users. To proceed with the default root user and the authentication type "no_password", ensure that you remove the '#' character before the following lines in the file databend-query.toml:
Using "no_password" authentication for the root user in this tutorial is just an example and not recommended for production due to potential security risks.
...
[[query.users]]
name = "root"
auth_type = "no_password"
...
d. Open a terminal window and navigate to the folder /usr/local/databend/bin.
e. Run the following command to start the Query node:
./databend-query -c ../configs/databend-query.toml > query.log 2>&1 &
f. Run the following command to check if the Query node was started successfully:
curl -I http://127.0.0.1:8080/v1/health
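As with the Meta node, an HTTP 200 status in the response indicates that the Query node is running.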
Verifying Deployment
In this section, we will run a simple query against Databend using BendSQL to verify the deployment.
a. Follow Installing BendSQL to install BendSQL on your machine.
b. Launch BendSQL and retrieve the current time for verification.
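For example, assuming BendSQL's default connection settings (the local Query node with the root user configured above) and flag names per current BendSQL releases, a one-off query can be run like this:
# Returns the current timestamp if the deployment is working.
bendsql --query='SELECT NOW()'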
Starting and Stopping Databend
Each time you start and stop Databend, simply run the scripts in the folder /usr/local/databend/scripts:
# Start Databend
./scripts/start.sh
# Stop Databend
./scripts/stop.sh
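After start.sh completes, you can reuse the health checks from the deployment steps to confirm that both nodes are up:
# Both requests should return an HTTP 200 status.
curl -I http://127.0.0.1:28101/v1/health
curl -I http://127.0.0.1:8080/v1/health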
Permission denied?
If you encounter the following error messages while attempting to start Databend:
==> query.log <==
: No getcpu support: percpu_arena:percpu
: option background_thread currently supports pthread only
Databend Query start failure, cause: Code: 1104, Text = failed to create appender: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }.
Databend writes logs and data to /var/log/databend and /var/lib/databend by default. Run the following commands to create the directories and grant your user ownership of them, then try starting Databend again:
sudo mkdir /var/log/databend
sudo mkdir /var/lib/databend
sudo chown -R $USER /var/log/databend
sudo chown -R $USER /var/lib/databend
Next Steps
After deploying Databend, you might need to learn about the following topics:
- Load & Unload Data: Manage data import/export in Databend.
- Visualize: Integrate Databend with visualization tools for insights.