Introduction
MindsDB embeds an AI layer on top of existing traditional databases, making them smarter and letting them build robust Predictor models from the data they already hold, without the user having to worry about the underlying code.
MindsDB also keeps user interaction easy: a few simple SQL queries are enough to get things done. Currently, MindsDB offers both free and paid versions and is available in two variants: self-hosted deployment and MindsDB Cloud.
In this tutorial, we will predict the quality of water from several feature parameters, using a dataset uploaded to MindsDB Cloud.
Feeding Data to MindsDB Cloud
To feed the dataset to MindsDB Cloud, we first download it from a freely available source such as Kaggle and then upload it to MindsDB using the steps below.
Step 1: Let's sign in to the MindsDB Cloud console or simply register for a new account.
Step 2: Once you're logged in, you will find the MindsDB Cloud Editor. The top portion is the Query Editor, where we can write and execute queries; the bottom portion is the Result Viewer, where we can see the results of the executed queries; and the right panel contains the Learning Hub for anyone who is just getting started with MindsDB.
Step 3: Now hit the Add Data button in the top right corner, click on the Files tab instead of Databases, and then click the Import File button.
Step 4: In this step, we upload the .CSV file that we downloaded from Kaggle. We then provide a table name in the Table name field and click Save and Continue to import the file into MindsDB Cloud as a table with that name.
Step 5: Once the table is imported, we are taken back to the Cloud Editor page, where two SQL queries are already listed: one to list the names of the available tables and one to check the data in the table we just imported.
Let's execute the first query and check the list of available tables. We should find a table named WaterQL, which confirms that the table is present in the database.
SHOW TABLES FROM files;
Now let's execute the second query and check whether our table contains data rows. This query should return 10 rows.
SELECT * FROM files.WaterQL LIMIT 10;
This confirms that we are ready with the data table now. Let's proceed to the next part where we will train a Predictor model using this data.
Training a Predictor Model
MindsDB provides very simple SQL queries to carry out different tasks in its interface. So, we will now proceed with the steps below to get ready with the Predictor model.
Step 1: MindsDB provides a CREATE PREDICTOR syntax that we can use to train the model. Follow the syntax below.
CREATE PREDICTOR mindsdb.predictor_name          -- your Predictor name
FROM database_name                               -- your database name
(SELECT columns FROM table_name LIMIT 10000)     -- your table name
PREDICT target_parameter;                        -- your target parameter
Simply replace the placeholder names with the ones you want to use for your Predictor, database, table and target, and you should be good to go. For example, my actual query looks like this.
CREATE PREDICTOR mindsdb.water_quality
FROM files
(SELECT * FROM WaterQL LIMIT 10000)
PREDICT Potability;
Step 2: Based on the size of the dataset used, it might take a while for the model to complete its training. We can check the training status of the model using the following statement.
SELECT status FROM mindsdb.predictors WHERE name='Name_of_the_Predictor';
The actual query is formed by putting the name of the model in place of the placeholder above.
SELECT status FROM mindsdb.predictors WHERE name='water_quality';
As the status returned is complete, we are now ready to make predictions for water quality.
Note: There are 3 possible statuses for the model, in the following sequence.
- generating: the model is currently being generated.
- training: the model is now being trained on the dataset.
- complete: the model is now ready to make predictions.
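As a rough sketch, the same status check can return a few more columns so that progress and the outcome are visible in one place. The accuracy and error columns below are assumptions about the layout of mindsdb.predictors and may differ between MindsDB versions; if they are not recognized, running SELECT * FROM mindsdb.predictors shows which columns are actually available.

```sql
-- Assumed columns: accuracy and error may not exist in every MindsDB version.
SELECT name, status, accuracy, error
FROM mindsdb.predictors
WHERE name = 'water_quality';
```

If the status never reaches complete, the error column (where present) is a reasonable first place to look.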
Describing the Predictor Model
Before we proceed to the final part of predicting the water quality, let us first understand the underlying model that we just trained.
MindsDB provides the following 3 types of descriptions for the model using the DESCRIBE statement.
- By Features
- By Model
- By Model Ensemble
By Features
DESCRIBE mindsdb.predictor_model_name.features;
This query shows the role of each column for the model along with the type of encoders used on them while training.
By Model
DESCRIBE mindsdb.predictor_model_name.model;
This query lists all the underlying candidate models that are used during training. The one with the best performance, i.e., whose value is closest to 1, is selected. The selected column shows 1 for the chosen model, while the others are set to 0.
By Model Ensemble
DESCRIBE mindsdb.predictor_model_name.ensemble;
This query gives back a JSON output that contains the different parameters that ultimately helped to choose the best candidate model for the Predictor.
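For the Predictor trained in this tutorial, the three statements would look roughly like this. The sketch assumes the model was created as mindsdb.water_quality, as in the training step; run the statements one at a time in the Query Editor.

```sql
-- Role of each column and the encoder used for it
DESCRIBE mindsdb.water_quality.features;

-- Candidate models and which one was selected
DESCRIBE mindsdb.water_quality.model;

-- JSON with the parameters behind the final choice
DESCRIBE mindsdb.water_quality.ensemble;
```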
Now that we understand our Predictor model, let's move on to predicting values in the next section.
Predicting the Target Value
Predicting the water quality (Potability) is as easy as running a simple SELECT statement using the Predictor.
As water quality depends on many feature parameters, it is advisable to provide values for all of them to get an accurate prediction. However, we can still make a prediction by passing only a few of them.
The query for this will be as follows.
SELECT target_value_name, target_value_confidence, target_value_explain
FROM mindsdb.predictor_name
WHERE feature1=value1 AND feature2=value2 AND ...;
Now, replacing the placeholders in the above query, the actual query will be like this.
SELECT Potability, Potability_confidence, Potability_explain
FROM mindsdb.water_quality
WHERE ph=2.6 AND Hardness=210 AND Solids=18645.233 AND Chloramines=6.546;
As the predicted Potability (Water Quality) is 0, this water is not safe for human consumption.
We will now pass all the required feature parameters to obtain a more accurate prediction of the water quality. So, the query now becomes something like this.
SELECT Potability, Potability_confidence, Potability_explain
FROM mindsdb.water_quality
WHERE ph=6.9 AND Hardness=201 AND Solids=11350.675 AND Chloramines=4.3 AND Sulfate=NULL AND Conductivity=467.5 AND Organic_carbon=9.98 AND Trihalomethanes=89.686 AND Turbidity=4.99;
As the predicted Potability (Water Quality) is 1, this water is safe for human consumption.
Kudos! We have now successfully predicted the water quality using a Predictor.
Note: While predicting, we selected three columns in the SELECT statement.
- target_parameter: This returns the value we want to predict.
- target_parameter_confidence: This returns how confident the model is about the prediction.
- target_parameter_explain: This returns all the details about the predicted target_value, i.e., the predicted value of the target, the confidence level, anomalies (if any), the truth value, etc.
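To score many rows at once instead of typing feature values by hand, a Predictor can also be joined against a table. The following is a minimal sketch, assuming the files.WaterQL table and the mindsdb.water_quality Predictor from this tutorial; the aliases t and m and the output names actual_potability and predicted_potability are arbitrary choices made for illustration.

```sql
-- Each row of the table is passed to the Predictor as input features.
SELECT t.ph,
       t.Hardness,
       t.Potability AS actual_potability,
       m.Potability AS predicted_potability,
       m.Potability_confidence
FROM files.WaterQL AS t
JOIN mindsdb.water_quality AS m
LIMIT 10;
```

Keep in mind that these rows were also used for training, so agreement between the actual and predicted columns here is not a fair measure of accuracy on unseen data.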
Conclusion
This concludes the tutorial here. Before we wrap this up, let's do a quick recap of what we did here. We first started with creating a MindsDB Cloud account, fed the dataset and created a table using the cloud UI, trained a Predictor model, described its model features and finally predicted the target water quality value.
MindsDB is really simple, easy to use and available in a free version. So, I would suggest that you pick any dataset from the internet and start predicting values from it using your own MindsDB Predictors.
Lastly, before you leave, I would love to hear your feedback in the Comments section below and would be really motivated if you drop a LIKE on this article.