MindsDB
Data that lives in your database is a valuable asset. MindsDB lets you use that data to make forecasts, and it speeds up ML development by bringing machine learning into the database. With MindsDB, you can build, train, optimize, and deploy ML models without relying on other platforms.
Both Databend and Databend Cloud can integrate with MindsDB as a data source, which brings machine learning capabilities to Databend. The following tutorials show you how to integrate with MindsDB and make data forecasts, using the Air Pollution in Seoul dataset as an example.
Tutorial-1: Integrating Databend with MindsDB
Before you start, install MindsDB locally or sign up for a MindsDB Cloud account. This tutorial uses MindsDB Cloud. For more information about installing MindsDB locally, refer to https://docs.mindsdb.com/quickstart#1-create-a-mindsdb-cloud-account-or-install-mindsdb-locally
Step 1. Load Dataset into Databend
Run the following SQL statements to create a table in the default database and load the Air Pollution in Seoul dataset using the COPY INTO command:
CREATE TABLE pollution_measurement(
    MeasurementDate TIMESTAMP,
    StationCode STRING,
    Address STRING,
    Latitude DOUBLE,
    Longitude DOUBLE,
    SO2 DOUBLE,
    NO2 DOUBLE,
    O3 DOUBLE,
    CO DOUBLE,
    PM10 DOUBLE,
    PM25 DOUBLE
);
COPY INTO pollution_measurement FROM 'https://datasets.databend.org/AirPolutionSeoul/Measurement_summary.csv' file_format=(type='CSV' skip_header=1);
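Before moving on, it can help to confirm that the load succeeded. The following is a minimal sketch that assumes the table was created in the default database as shown above; the expected row count depends on the dataset file and is not stated here.

```sql
-- Count the rows loaded by COPY INTO.
SELECT COUNT(*) FROM pollution_measurement;

-- Preview a few rows to check that the columns were parsed as expected.
SELECT * FROM pollution_measurement LIMIT 5;
```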
Step 2. Connect MindsDB to Databend
- Copy and paste the following SQL statements into the MindsDB Cloud Editor, and click Run:
CREATE DATABASE databend_datasource
WITH engine='databend',
parameters={
    "protocol": "https",
    "user": "<YOUR-USERNAME>",
    "port": 8000,
    "password": "<YOUR-PASSWORD>",
    "host": "<YOUR-HOST>",
    "database": "default"
};
The SQL statements above connect the default database in Databend to your MindsDB Cloud account. For explanations of the parameters, refer to https://docs.mindsdb.com/data-integrations/all-data-integrations#databend
- In the MindsDB Cloud Editor, run the following SQL statements to verify the integration:
SELECT * FROM databend_datasource.pollution_measurement LIMIT 10;
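If the verification query fails, it may help to check what MindsDB itself can see. This is a small sketch, assuming your MindsDB version supports the SHOW DATABASES and SHOW TABLES FROM statements:

```sql
-- databend_datasource should appear in the list of databases.
SHOW DATABASES;

-- pollution_measurement should appear among the tables of the data source.
SHOW TABLES FROM databend_datasource;
```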
Step 3. Create a Predictor
In the MindsDB Cloud Editor, run the following SQL statements to create a predictor:
CREATE PREDICTOR airq_predictor
FROM databend_datasource (SELECT * FROM pollution_measurement LIMIT 50)
PREDICT so2;
Now the predictor will begin training. You can check the status with the following query:
SELECT *
FROM mindsdb.models
WHERE name='airq_predictor';
The status of the model must be complete before you can start making predictions.
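Rather than reading every column of mindsdb.models, you can narrow the status check to the fields that matter while waiting. This sketch assumes the models table exposes the columns status, accuracy, and error, which may differ between MindsDB versions:

```sql
-- Re-run until status reports 'complete'.
-- If status reports 'error', the error column should describe what went wrong.
SELECT name, status, accuracy, error
FROM mindsdb.models
WHERE name='airq_predictor';
```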
Step 4. Make Predictions
In the MindsDB Cloud Editor, run the following SQL statements to predict the concentration of SO2:
SELECT
    SO2 AS predicted,
    SO2_confidence AS confidence,
    SO2_explain AS info
FROM mindsdb.airq_predictor
WHERE (NO2 = 0.005)
    AND (CO = 1.2)
    AND (PM10 = 5);
The result contains the columns predicted, confidence, and info.
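The prediction query in this step uses a single set of input values. To predict for many rows at once, MindsDB also allows joining a data source table with a predictor. The following sketch assumes that join syntax is available in your MindsDB version and reuses the names from this tutorial; the aliases t and m are illustrative:

```sql
-- For each input row from Databend, return the measured and the predicted SO2.
SELECT t.MeasurementDate,
       t.SO2 AS measured,
       m.SO2 AS predicted
FROM databend_datasource.pollution_measurement AS t
JOIN mindsdb.airq_predictor AS m
LIMIT 10;
```

Comparing measured and predicted values on known rows gives a rough feel for model quality, keeping in mind that the predictor in this tutorial is trained on only 50 rows.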
Tutorial-2: Integrating Databend Cloud with MindsDB
Before you start, install MindsDB locally or sign up for a MindsDB Cloud account. This tutorial uses MindsDB Cloud. For more information about installing MindsDB locally, refer to https://docs.mindsdb.com/quickstart#1-create-a-mindsdb-cloud-account-or-install-mindsdb-locally
Step 1. Load Dataset into Databend Cloud
Open a worksheet in Databend Cloud, and run the following SQL statements to create a table in the default database and load the Air Pollution in Seoul dataset using the COPY INTO command:
CREATE TABLE pollution_measurement(
    MeasurementDate TIMESTAMP,
    StationCode STRING,
    Address STRING,
    Latitude DOUBLE,
    Longitude DOUBLE,
    SO2 DOUBLE,
    NO2 DOUBLE,
    O3 DOUBLE,
    CO DOUBLE,
    PM10 DOUBLE,
    PM25 DOUBLE
);
COPY INTO pollution_measurement FROM 'https://repo.databend.com/AirPolutionSeoul/Measurement_summary.csv' file_format=(type='CSV' skip_header=1);
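A quick aggregate is one way to sanity-check the loaded data before connecting MindsDB. This sketch assumes the table definition above; the alias names row_count and avg_so2 are illustrative:

```sql
-- Row count and average SO2 per station; unexpected counts or
-- implausible averages may point to a loading problem.
SELECT StationCode,
       COUNT(*) AS row_count,
       AVG(SO2) AS avg_so2
FROM pollution_measurement
GROUP BY StationCode
ORDER BY StationCode
LIMIT 10;
```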
Step 2. Connect MindsDB to Databend Cloud
- Copy and paste the following SQL statements into the MindsDB Cloud Editor, and click Run:
CREATE DATABASE databend_datasource
WITH engine='databend',
parameters={
    "protocol": "https",
    "user": "cloudapp",
    "port": 443,
    "password": "<YOUR-PASSWORD>",
    "host": "<YOUR-HOST>",
    "database": "default"
};
The SQL statements above connect the default database in Databend Cloud to your MindsDB Cloud account. You can obtain the parameter values from the connection information of your warehouse. For more information, see Connecting to a Warehouse. For explanations of the parameters, refer to https://docs.mindsdb.com/data-integrations/all-data-integrations#databend
- In the MindsDB Cloud Editor, run the following SQL statements to verify the integration:
SELECT * FROM databend_datasource.pollution_measurement LIMIT 10;
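If the verification query fails because a connection parameter was mistyped, one option is to remove the data source and create it again. This is a minimal sketch, assuming the DROP DATABASE statement is available in your MindsDB version:

```sql
-- Removes only the MindsDB connection; the data in Databend Cloud is not affected.
DROP DATABASE databend_datasource;
```

After that, rerun the CREATE DATABASE statement from this step with the corrected values.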
Step 3. Create a Predictor
In the MindsDB Cloud Editor, run the following SQL statements to create a predictor:
CREATE PREDICTOR airq_predictor
FROM databend_datasource (SELECT * FROM pollution_measurement LIMIT 50)
PREDICT so2;
Now the predictor will begin training. You can check the status with the following query:
SELECT *
FROM mindsdb.models
WHERE name='airq_predictor';
The status of the model must be complete before you can start making predictions.
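The predictor in this step is trained on only 50 rows, which keeps training fast but likely limits the quality of the forecasts. The following sketch trains a second predictor on a larger sample using the same syntax; the name airq_predictor_1k and the limit of 1000 are illustrative choices, not recommended values:

```sql
-- Train a separate predictor on more rows; expect training to take longer.
CREATE PREDICTOR airq_predictor_1k
FROM databend_datasource (SELECT * FROM pollution_measurement LIMIT 1000)
PREDICT so2;
```

As with airq_predictor, wait until the new model's status in mindsdb.models is complete before querying it.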
Step 4. Make Predictions
In the MindsDB Cloud Editor, run the following SQL statements to predict the concentration of SO2:
SELECT
    SO2 AS predicted,
    SO2_confidence AS confidence,
    SO2_explain AS info
FROM mindsdb.airq_predictor
WHERE (NO2 = 0.005)
    AND (CO = 1.2)
    AND (PM10 = 5);
The result contains the columns predicted, confidence, and info.