Weka Machine Learning Mini-Course

By Jason Brownlee on February 2, 2021 in Weka Machine Learning

Become A Machine Learning Practitioner in 14-Days

Machine learning is a fascinating study, but how do you actually use it on your own problems?

You may be confused as to how best to prepare your data for machine learning, which algorithms to use or how to choose one model over another.

In this post you will discover a 14-part crash course into applied machine learning using the Weka platform without a single mathematical equation or line of programming code.

After completing this mini course:

You will know how to work through a dataset end-to-end and deliver a set of predictions or a high-performance model.
You will know your way around the Weka machine learning workbench including how to explore algorithms and design controlled experiments.
You will know how to create multiple views of your problem, evaluate multiple algorithms and use statistics to choose the best performing model for your own predictive modeling problems.

Kick-start your project with my new book Machine Learning Mastery With Weka, including step-by-step tutorials and clear screenshots for all examples.

Let’s get started.

(Tip: You might want to print or bookmark this page so that you can refer back to it later)

Applied Machine Learning With Weka Mini-Course
Photo by Leon Yaakov, some rights reserved.

Who Is This Mini-Course For?

Before we get started, let’s make sure you are in the right place. The list below provides some general guidelines as to who this course was designed for.

Don’t panic if you don’t match these points exactly; you might just need to brush up in one area or another to keep up.

You are a developer that knows a little machine learning.

This means you know about some of the basics of machine learning like cross validation, some algorithms and the bias-variance trade-off. It does not mean that you are a machine learning PhD, just that you know the landmarks or know where to look them up.

This mini-course is not a textbook on machine learning.

It will take you from a developer that knows a little machine learning to a developer who can use the Weka platform to work through a dataset from beginning to end and deliver a set of predictions or a high performance model.

Mini-Course Overview (what to expect)

This mini-course is divided into 14 parts.

Each lesson was designed to take you about 30 minutes. You might finish some much sooner and for others you may choose to go deeper and spend more time.

You can complete each part as quickly or as slowly as you like. A comfortable schedule may be to complete one lesson per day over a two week period. Highly recommended.

The topics you will cover over the next 14 lessons are as follows:

Lesson 01: Download and Install Weka.
Lesson 02: Load Standard Machine Learning Datasets.
Lesson 03: Descriptive Stats and Visualization.
Lesson 04: Rescale Your Data.
Lesson 05: Perform Feature Selection on Your Data.
Lesson 06: Machine Learning Algorithms in Weka.
Lesson 07: Estimate Model Performance.
Lesson 08: Baseline Performance On Your Data.
Lesson 09: Classification Algorithms.
Lesson 10: Regression Algorithms.
Lesson 11: Ensemble Algorithms.
Lesson 12: Compare the Performance of Algorithms.
Lesson 13: Tune Algorithm Parameters.
Lesson 14: Save Your Model.

This is going to be a lot of fun.

You’re going to have to do some work though: a little reading and a little tinkering in Weka. You want to get started in applied machine learning, right?

(Tip: All of the answers to these lessons can be found on this blog; use the search feature.)

Any questions at all, please post in the comments below.

Share your results in the comments.

Hang in there, don’t give up!

Lesson 01: Download and Install Weka

The first thing to do is install the Weka software on your workstation.

Weka is free open source software. It is written in Java and can run on any platform that supports Java, including:

Windows.
Mac OS X.
Linux.

You can download Weka as standalone software or as a version bundled with Java.

If you do not already have Java installed on your system, I recommend downloading and installing a version bundled with Java.

Your task for this lesson is to visit the Weka download page, download and install Weka on your workstation.

Lesson 02: Load Standard Machine Learning Datasets

Now that you have Weka installed, you need to load data.

Weka is designed to load data in a native format called ARFF. It is a modified CSV format that includes additional information about the types of each attribute (column).

Your Weka installation includes a subdirectory with a number of standard machine learning datasets in ARFF format ready for you to load.

Weka also supports loading data from raw CSV files as well as a database and converts the data to ARFF as needed.
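You do not need any code to use ARFF files in Weka, but seeing the format spelled out can help. The optional Python sketch below uses a tiny made-up dataset and a hypothetical `parse_arff` helper to show the header-plus-rows layout. It only handles the simplest case (no quoted values, sparse rows or missing-value handling), so treat it as an illustration rather than a real ARFF reader.

```python
# A tiny, made-up ARFF document: a header describing each attribute,
# followed by CSV-style rows after the @data marker.
ARFF_TEXT = """\
% Lines starting with a percent sign are comments.
@relation toy-diabetes
@attribute preg numeric
@attribute plas numeric
@attribute class {tested_negative,tested_positive}
@data
6,148,tested_positive
1,85,tested_negative
"""


def parse_arff(text):
    """Parse a very simple ARFF string into (relation, attributes, rows)."""
    relation = None
    attributes = []  # list of (name, type) pairs
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            rows.append(line.split(","))
        elif line.lower().startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif line.lower().startswith("@attribute"):
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif line.lower().startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation)
for name, attr_type in attributes:
    print(name, "->", attr_type)
print(len(rows), "rows")
```

The attribute declarations are the "additional information" that a plain CSV file lacks: each column gets a name and a type, and a categorical column lists its allowed values.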

In this lesson you will load a standard dataset in the Weka Explorer.

Start Weka (click on the bird icon), this will start the Weka GUI Chooser.
Click the “Explorer” button, this will open the Weka Explorer interface.
Click the “Open file…” button and navigate to the data/ directory in your Weka installation and load the diabetes.arff dataset.

Note, if you do not have a data/ directory in your Weka installation, or you cannot find it, download the .zip version of Weka from the Weka download webpage, unzip it and access the data/ directory.

You have just loaded your first dataset in Weka.

Try loading some of the other datasets in the data/ directory.

Try downloading a raw CSV file from the UCI Machine Learning repository and loading it in Weka.

Lesson 03: Descriptive Stats and Visualization

Once you can load data in Weka, it is important to take a look at it.

Weka lets you review descriptive statistics calculated from your data. It also provides visualization tools.

In this lesson you will use Weka to learn more about your data.

Open the Weka GUI Chooser.
Open the Weka Explorer.
Load the data/diabetes.arff dataset.
Click on different attributes in the “Attributes” list and review the details in the “Selected attribute” pane.
Click the “Visualize All” button to review all attribute distributions.
Click the “Visualize” tab and review the scatter plot matrix for all attributes.

Get comfortable reviewing the details for different attributes in the “Preprocess” tab and tuning the scatter plot matrix in the “Visualize” tab.

Lesson 04: Rescale Your Data

Raw data is often not suitable for modeling.

Often you can improve the performance of your machine learning models by rescaling attributes.

In this lesson you will learn how to use data filters in Weka to rescale your data. You will normalize all of the attributes for a dataset, rescaling them to the consistent range of 0-to-1.

Open the Weka GUI Chooser and then the Weka Explorer.
Load the data/diabetes.arff dataset.
Click the “Choose” button in the “Filter” pane and select unsupervised.attribute.Normalize.
Click the “Apply” button.

Review the details for each attribute in the “Selected attribute” pane and note the change to the scale.

Explore using other data filters such as the Standardize filter.
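If you are curious what these two filters do to a column of numbers, here is a minimal Python sketch of the underlying arithmetic. The function names and the sample values are made up for illustration, and the standardize step uses the population standard deviation; Weka's own filters may differ in such details.

```python
from statistics import mean, pstdev


def normalize(values):
    """Rescale values to the range 0-to-1 (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift values to zero mean and scale them to unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


plas = [148, 85, 183, 89, 137]  # made-up column of values
print(normalize(plas))
print(standardize(plas))
```

After normalizing, the smallest value becomes 0 and the largest becomes 1. After standardizing, the values are centered on 0 instead.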

Explore configuring filters by clicking on the name of the loaded filter and changing its parameters.

Test out saving modified datasets for later use by clicking the “Save…” button on the “Preprocess” tab.

Lesson 05: Perform Feature Selection on Your Data

Not all of the attributes in your dataset may be relevant to the attribute you want to predict.

You can use feature selection to identify those attributes that are most relevant to your output variable.

In this lesson you will get familiar with using different feature selection methods.

Open the Weka GUI Chooser and then the Weka Explorer.
Load the data/diabetes.arff dataset.
Click the “Select attributes” tab.
Click the “Choose” button in the “Attribute Evaluator” pane and select the “CorrelationAttributeEval”. You will be presented with a dialog asking you to change to the “Ranker” search method, which is needed when using this feature selection method. Click the “Yes” button.
Click the “Start” button to run the feature selection method.

Review the output in the “Attribute selection output” pane and note the correlation scores for each attribute, the larger numbers indicating the more relevant features.
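To get a feel for what a correlation score measures, the optional Python sketch below computes Pearson's correlation between each attribute and a 0/1 encoded class, then ranks the attributes. The data is made up, and ranking by absolute value is a choice made for this sketch; I am assuming this is close in spirit to what the evaluator reports, not reproducing Weka's exact calculation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


# Made-up data: two attributes and a binary class encoded as 0/1.
columns = {
    "plas": [148, 85, 183, 89, 137, 116],
    "pres": [72, 66, 64, 66, 40, 74],
}
target = [1, 0, 1, 0, 1, 0]

# Rank attributes by absolute correlation with the class, largest first.
ranking = sorted(
    ((abs(pearson(values, target)), name) for name, values in columns.items()),
    reverse=True,
)
for score, name in ranking:
    print(f"{score:.4f}  {name}")
```

In this toy data, plas moves with the class much more closely than pres does, so it lands at the top of the ranking.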

Explore other feature selection methods such as the use of information gain (entropy).

Explore selecting features to remove from your dataset using the “Remove” button in the “Preprocess” tab.

Lesson 06: Machine Learning Algorithms in Weka

A key benefit of the Weka workbench is the large number of machine learning algorithms it provides.

You need to know your way around machine learning algorithms.

In this lesson you will take a closer look at machine learning algorithms in Weka.

Open the Weka GUI Chooser and then the Weka Explorer.
Load the data/diabetes.arff dataset.
Click the “Classify” tab.
Click the “Choose” button and note the different groupings for algorithms.
Click the name of the selected algorithm to configure it.
Click the “More” button on the configuration window to learn more about the implementation.
Click the “Capabilities” button on the configuration window to learn more about how it can be used.
Note the “Open” and “Save” buttons on the window where different configurations can be saved and loaded.
Hover on a configuration parameter and note the tooltip help.
Click the “Start” button to run an algorithm.

Browse the algorithms available. Note that some algorithms are unavailable depending on whether your dataset is a classification (predict a category) or regression (predict a real value) type problem.

Explore and learn more about the various algorithms available in Weka.

Gain confidence choosing and configuring algorithms.

Lesson 07: Estimate Model Performance

Now that you know how to choose and configure different algorithms, you need to know how to evaluate the performance of an algorithm.

In this lesson you are going to learn about the different ways to evaluate the performance of an algorithm in Weka.

Open the Weka GUI Chooser and then the Weka Explorer.
Load the data/diabetes.arff dataset.
Click the “Classify” tab.

The “Test options” pane lists the various different techniques that you can use to evaluate the performance of an algorithm.

The gold standard is 10-fold “Cross Validation”. This is selected by default. For a small dataset, the number of folds can be adjusted from 10 to 5 or even 3.
If your dataset is very large and you want to evaluate algorithms quickly, you can use the “Percentage split” option. By default, this option will train on 66% of your dataset and use the remaining 34% to evaluate the performance of your model.
Alternately, if you have a separate file containing a validation dataset, you can evaluate your model on that by selecting the “Supplied test set” option. Your model will be trained on the entire training dataset and evaluated on the separate dataset.
Finally, you can evaluate the performance of your model on the whole training dataset. This is useful if you are more interested in a descriptive than a predictive model.
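The first two test options boil down to different ways of dividing up the rows of your dataset. The Python sketch below shows the idea using row indices only. The function names are hypothetical, the 768 rows are what I would expect for the diabetes data (any size works), and the folds here are simply shuffled, whereas Weka may also balance the classes within each fold.

```python
import random


def percentage_split(n_rows, train_fraction=0.66, seed=1):
    """Shuffle row indices and split them into train and test lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_indices(n_rows, k=10, seed=1):
    """Yield (train, test) index lists so every row is tested exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal slices
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


train, test = percentage_split(768)
print(len(train), len(test))

for train, test in k_fold_indices(768, k=10):
    print(len(train), len(test))
```

With a percentage split, each row is used either for training or for testing. With cross validation, every row takes a turn in the test set, which is why the estimate is usually more reliable on small datasets.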

Click the “Start” button to run a given algorithm with your chosen test option.

Experiment with different Test options.

Further refine the test options in the configuration provided by clicking the “More options…” button.

Lesson 08: Baseline Performance On Your Data

When you start evaluating multiple machine learning algorithms on your dataset, you need a baseline for comparison.

A baseline result gives you a point of reference to know whether the results for a given algorithm are good or bad, and by how much.

In this lesson you will learn about the ZeroR algorithm that you can use as a baseline for classification and regression algorithms.

Open the Weka GUI Chooser and then the Weka Explorer.
Load the data/diabetes.arff dataset.
Click the “Classify” tab. The ZeroR algorithm is chosen by default.
Click the “Start” button.

This will run the ZeroR algorithm using 10-fold cross validation on your dataset.

The ZeroR algorithm, also called the Zero Rule, is an algorithm that you can use to calculate a baseline of performance for all algorithms on your dataset. It is the “worst” result, and any algorithm that shows a better performance has some skill on your problem.

On a classification problem, the ZeroR algorithm will always predict the most abundant category. If all categories are equally abundant, it will predict the first category value.

On the diabetes dataset, this results in a classification accuracy of 65%.

For regression problems, the ZeroR algorithm will always predict the mean output value.
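The Zero Rule is simple enough to write out in a few lines. The Python sketch below is my own illustration of the rule as described above, not Weka's implementation. The function names are made up, and the class counts (500 negative, 268 positive) are what I assume the diabetes data contains.

```python
from collections import Counter


def zero_r_classify(labels, class_order):
    """Return the most frequent label; ties go to the earliest in class_order."""
    counts = Counter(labels)
    # max() keeps the first maximal item, so class_order decides ties.
    return max(class_order, key=lambda label: counts[label])


def zero_r_regress(targets):
    """Return the mean of the training targets."""
    return sum(targets) / len(targets)


# Class counts assumed for the diabetes data: 500 negative, 268 positive.
labels = ["tested_negative"] * 500 + ["tested_positive"] * 268
prediction = zero_r_classify(labels, ["tested_negative", "tested_positive"])
accuracy = sum(label == prediction for label in labels) / len(labels)
print(prediction, f"{accuracy:.1%}")
```

Because the rule ignores every input attribute, its accuracy is just the share of the most common class, which is where a baseline of roughly 65% comes from.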

Experiment with the ZeroR algorithm on a range of different datasets. It is the algorithm you should always run first before all others to develop a baseline.

Lesson 09: Tour of Classification Algorithms

Weka provides a large number of classification algorithms.

In this lesson you will discover 5 top classification algorithms that you can use on your classification problems.

Open the Weka GUI Chooser and then the Weka Explorer.
Load the data/diabetes.arff dataset.
Click the “Classify” tab.
Click the “Choose” button.

5 Top algorithms that you can use for classification include:

Logistic Regression (functions.Logistic).
Naive Bayes (bayes.NaiveBayes).
k-Nearest Neighbors (lazy.IBk).
Classification and Regression Trees (trees.REPTree).
Support Vector Machines (functions.SMO).

Experiment with each of these top algorithms.

Try them out on different classification datasets, such as those with two classes and those with more.

Lesson 10: Tour of Regression Algorithms

Classification algorithms are Weka’s specialty, but many of these algorithms can be used for regression.

Regression is the prediction of a real valued outcome (like a dollar amount), different from classification that predicts a category (like “dog” or “cat”).

In this lesson you will discover 5 top regression algorithms that you can use on your regression problems.

You can download a suite of standard regression machine learning datasets from the Weka dataset download webpage. Download the datasets-numeric.jar archive of regression problems, titled:

“A jar file containing 37 regression problems, obtained from various sources”

Use your favorite unzip program to unzip the .jar file and you will have a new directory called numeric/ containing 37 regression problems that you can work with.

Open the Weka GUI Chooser and then the Weka Explorer.
Load the housing.arff dataset from the numeric/ directory.
Click the “Classify” tab.
Click the “Choose” button.

5 Top algorithms that you can use for regression include:

Linear Regression (functions.LinearRegression).
Support Vector Regression (functions.SMOReg).
k-Nearest Neighbors (lazy.IBk).
Classification and Regression Trees (trees.REPTree).
Artificial Neural Network (functions.MultilayerPerceptron).

Experiment with each of these top algorithms.

Try them out on different regression datasets.

Lesson 11: Tour of Ensemble Algorithms

Weka is very easy to use and this may be its biggest advantage over other platforms.

In addition to this, Weka provides a large suite of ensemble machine learning algorithms and this may be Weka’s second big advantage over other platforms.

It is worth spending your time to get good at using Weka’s ensemble algorithms. In this lesson you will discover 5 top ensemble machine learning algorithms that you can use.

Open the Weka GUI Chooser and then the Weka Explorer.
Load the data/diabetes.arff dataset.
Click the “Classify” tab.
Click the “Choose” button.

5 Top ensemble algorithms that you can use include:

Bagging (meta.Bagging).
Random Forest (trees.RandomForest).
AdaBoost (meta.AdaBoostM1).
Voting (meta.Vote).
Stacking (meta.Stacking).

Experiment with each of these top algorithms.

Most of these ensemble methods let you choose the sub-models. Experiment using different combinations of sub-models. Combinations of techniques that work in very different ways and produce different predictions often result in better performance.
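To see why combining different sub-models can help, here is a small Python sketch of plain majority voting. The three sub-models and their predictions are made up, and Weka's voting scheme offers other ways to combine predictions; this shows only the simplest one.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine one prediction per sub-model into a single label."""
    # most_common(1) returns the label with the highest count; on a tie it
    # keeps the label that was seen first.
    return Counter(predictions).most_common(1)[0][0]


# Made-up predictions from three hypothetical sub-models for four rows.
model_a = ["pos", "neg", "pos", "neg"]
model_b = ["pos", "pos", "neg", "neg"]
model_c = ["neg", "pos", "pos", "neg"]

combined = [majority_vote(votes) for votes in zip(model_a, model_b, model_c)]
print(combined)
```

Each sub-model disagrees with the others on a different row, so the vote can outnumber an individual mistake as long as the sub-models do not all make the same errors.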

Try them out on different classification and regression datasets.

Lesson 12: Compare the Performance of Algorithms

Weka provides a different tool specifically designed for comparing algorithms called the Weka Experiment Environment.

The Weka Experiment Environment allows you to design and execute controlled experiments with machine learning algorithms and then analyze the results.

In this lesson you will design your first experiment in Weka and discover how to use the Weka Experiment Environment to compare the performance of machine learning algorithms.

Open the “Weka GUI Chooser”.
Click the “Experimenter” button to open the “Weka Experiment Environment”.
Click the “New” button.
Click the “Add new…” button in the “Datasets” pane and select “data/diabetes.arff”.
Click the “Add new…” button in the “Algorithms” pane and add “ZeroR” and “IBk”.
Click the “Run” tab and click the “Start” button.
Click the “Analyse” tab and click the “Experiment” button and then the “Perform test” button.

You just designed, executed and analyzed the results of your first controlled experiment in Weka.

You compared the ZeroR algorithm to the IBk algorithm with default configuration on the diabetes dataset.

The results show that IBk has a higher classification accuracy than ZeroR and that this difference is statistically significant (the little “v” character next to the result).

Expand the experiment and add more algorithms and rerun the experiment.

Change the “Test base” on the “Analyse” tab to change which set of results is taken as the reference for comparison to the other results.

Lesson 13: Tune Algorithm Parameters

To get the most out of a machine learning algorithm you must tune the parameters of the method to your problem.

You cannot know how best to do this beforehand, so you must try out lots of different parameters.

The Weka Experiment Environment allows you to design controlled experiments to compare the results of different algorithm parameters and whether the differences are statistically significant.

In this lesson you are going to design an experiment to compare the parameters of the k-Nearest Neighbors algorithm.

Open the “Weka GUI Chooser”.
Click the “Experimenter” button to open the “Weka Experiment Environment”.
Click the “New” button.
Click the “Add new…” button in the “Datasets” pane and select “data/diabetes.arff”.
Click the “Add new…” button in the “Algorithms” pane and add 3 copies of the “IBk” algorithm.
Click each IBk algorithm in the list and click the “Edit selected…” button and change “KNN” to 1, 3, 5 for each of the 3 different algorithms.
Click the “Run” tab and click the “Start” button.
Click the “Analyse” tab and click the “Experiment” button and then the “Perform test” button.

You just designed, executed and analyzed the results of a controlled experiment to compare algorithm parameters.

We can see that the results for larger K values are better than for the default of 1 and that the difference is significant.
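If you want some intuition for why K matters, the Python sketch below implements a bare-bones k-Nearest Neighbors vote on made-up data that includes one noisy point. The `knn_predict` function is my own illustration with plain Euclidean distance, not Weka's IBk.

```python
from collections import Counter
from math import dist


def knn_predict(train_rows, train_labels, query, k=1):
    """Predict the label of query from its k nearest training rows."""
    # Sort training rows by Euclidean distance to the query point.
    neighbors = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Made-up two-attribute training data with one noisy "pos" point at (2, 2).
train_rows = [(1, 1), (1, 2), (2, 1), (2, 2), (6, 6), (6, 7), (7, 6)]
train_labels = ["neg", "neg", "neg", "pos", "pos", "pos", "pos"]

query = (2.2, 2.2)
for k in (1, 3, 5):
    print(k, knn_predict(train_rows, train_labels, query, k=k))
```

With K set to 1, the prediction follows the single noisy neighbor. With K set to 3 or 5, the surrounding points outvote it, which is one reason a larger K can perform better on noisy data.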

Explore changing other configuration properties of KNN and build confidence in developing experiments to tune machine learning algorithms.

Lesson 14: Save Your Model

Once you have found a top performing model on your problem you need to finalize it for later use.

In this final lesson you will discover how to train a final model and save it to a file for later use.

Open the Weka GUI Chooser and then the Weka Explorer.
Load the data/diabetes.arff dataset.
Click the “Classify” tab.
Change the “Test options” to “Use training set” and click the “Start” button.
Right click on the results in the “Result list” and click “Save model” and enter a filename like “diabetes-final”.

You have just trained a final model on the entire training dataset and saved the resulting model to a file.

You can load this model back into Weka and use it to make predictions on new data.

Right-click on the “Result list”, click “Load model” and select your model file (“diabetes-final.model”).
Change the “Test options” to “Supplied test set” and choose data/diabetes.arff (this could be a new file for which you do not have predictions).
Click “More options” in the “Test options” and change “Output predictions” to “Plain Text”.
Right click on the loaded model and choose “Re-evaluate model on current test set”.

The new predictions will now be listed in the “Classifier output” pane.

Experiment with saving different models and making predictions for entirely new datasets.

Machine Learning With Weka Mini-Course Review

Congratulations, you made it. Well done!

Take a moment and look back at how far you have come:

You discovered how to start and use the Weka Explorer and Weka Experiment Environment, perhaps for the first time.
You loaded data, analyzed it and used data filters and feature selection to prepare data for modeling.
You discovered a suite of machine learning algorithms and how to design controlled experiments to evaluate their performance.

Don’t make light of this, you have come a long way in a short amount of time. This is just the beginning of your journey in applied machine learning with Weka. Keep practicing and developing your skills.

Did you enjoy this mini-course? Do you have any questions or sticking points?
Leave a comment and let me know.

105 Responses to Weka Machine Learning Mini-Course

Raman September 24, 2016 at 1:18 am
Lesson 7
I don’t see this option “CorrelationAttributeEval”
- Jason Brownlee September 24, 2016 at 8:03 am
  Perhaps it is not in your version of Weka? Ensure you have an up to date version.
  It looks like it has not been removed from Weka:
  http://wiki.pentaho.com/display/DATAMINING/CorrelationAttributeEval
  - Raman September 26, 2016 at 7:53 pm
    Thank you, found it in a later version.
    - Jason Brownlee September 27, 2016 at 7:42 am
      Glad to hear it, Raman.
Raman September 26, 2016 at 7:55 pm
Lesson 10 and 11
I ran the steps in Lesson 10, but did not get the overall purpose.
Similar feedback on 11.
May be you have dedicated a separate discussion for those, I will look for them in your website.
- Jason Brownlee September 27, 2016 at 7:43 am
  Lessons 9, 10 and 11 are about exposure to different types of algorithms.
  Yes, I do cover each type of algorithm in more detail elsewhere on the site. Try the search feature for more info.
Martin October 7, 2016 at 8:26 pm
Excellent tutorial, I weka beginner. But this tutorial has clearly explain most of the features which I didn’t explore before. This help me easily pick up the tools. Thank you.
- Jason Brownlee October 8, 2016 at 10:35 am
  You’re welcome, I’m glad it was useful Martin.
Ramanth October 7, 2016 at 10:31 pm
can i use weka for making custom extractors..that can extract some keywords from a sentetnce
- Jason Brownlee October 8, 2016 at 10:35 am
  Sorry, I’m not sure Ramanth.
Sid November 21, 2016 at 1:09 am
Hi Jason! You say here that the website is suitable if “You are a developer that knows a little machine learning.”
Where can I start if I know absolutely nothing of the things that you’ve mentioned? I am doing a second major in Information Systems in University, so I know basic programming in the Java language.
- Jason Brownlee November 22, 2016 at 6:52 am
  Hi Sid,
  A good place to get started is here:
  https://machinelearningmastery.com/start-here/
  This post is a great place to start:
  https://machinelearningmastery.com/basic-concepts-in-machine-learning/
Sivakumar Subramaniam December 8, 2016 at 11:15 pm
Hi Jason, it is good documentation and very useful. I could use Weka tool with sample data as guided.
But I am missing to understand / visualize what happens behind (how the data are analyzed by each classifiers), how to read and evaluate results.
Is there any case study which will help to understand deeper
- Jason Brownlee December 9, 2016 at 8:44 am
  Yes, see this post:
  https://machinelearningmastery.com/binary-classification-tutorial-weka/
Mika December 25, 2016 at 12:01 pm
“Lesson 2: Note, if you do not have a data/ directory in your Weka installation, or you cannot find it, download the .zip version of Weka from the Weka download webpage, unzip it and access the data/ directory.”
I cant find the file in Weka GUI nor on the website? Not sure where to get the data. 🙁
- Jason Brownlee December 26, 2016 at 7:45 am
  Hi Mika, there is more information for downloading and installing Weka and the data here:
  https://machinelearningmastery.com/download-install-weka-machine-learning-workbench/
  - Viola December 9, 2020 at 1:20 am
    the download was good but the data cant be found. What to do? The data is needed to proceed..
    - Jason Brownlee December 9, 2020 at 6:28 am
      Here is the direct link for the datasets:
      https://raw.githubusercontent.com/jbrownlee/Datasets/master/weka-datasets.zip
Sreedev R May 22, 2017 at 3:51 am
This is indeed a great tutorial. I am a beginner in Applied ML with so much of enthusiasm towards it. I was oblivious about how to start, After mad googling for some days I fortunately end up in this tutorial and it gave me a good kick start to this field. I have completed loading, analysing, making model and test model with this tutorial. I am planning to complete all the tutorials too. Keep making such simple tutorials and inspiring us. I was little scared about the ML, this tutorial also convinced me that ML is no rocket science, you just have the apatite to explore and learn. Thanks a lot Jason.
- Jason Brownlee May 22, 2017 at 7:55 am
  I’m so glad to hear that Sreedev, stick with it!
Dominik July 6, 2017 at 6:57 am
Hi Jason,
first of all, thank you for sharing your knowledge in such an pleasant and mind-friendly way!
I have a question regarding Lesson 12: Compare the Performance of Algorithms.
See the following example: If I compare the performance of functions.Logistic and SMO on diabetes.arff (standard weka data) with all default settings, I’ll get:
functions.Logistic: 77.2135 % Correctly Classified Instances
bayes.NaiveBayes: 76.3021 % Correctly Classified Instances
If I use the experimenter as described in Lesson 12, I’ll get the following:
functions.Logistic: 77.47 % Correctly Classified Instances
bayes.NaiveBayes: 75.75 % Correctly Classified Instances
Why do I see different values here, shouldnt they are all the same, no matter where I compare the classification performance (experimenter or explorer)
Hope you can bring light to the darkness 🙂
Thanks,
Dominik
- Jason Brownlee July 6, 2017 at 10:27 am
  The algorithms and their evaluation are stochastic. See this post:
  https://machinelearningmastery.com/randomness-in-machine-learning/
  - Dominik July 7, 2017 at 4:38 am
    Awesome Jason, many thanks!
    All the best from Germany,
    Dominik
    - Jason Brownlee July 9, 2017 at 10:32 am
      Thanks.
Upasana Tiwari August 26, 2017 at 2:49 am
Hi, Jason
Firstly Thanks for such a wonderful post .
I am using Weka 3.8.1 in windows7
While performing Feature selection , on loading .arff dataset of diabetes , CorrelationAttributeEval is available.
But when I am loading .csv file of same dataset , CorrelationAttributeEval is not available.
- Jason Brownlee August 26, 2017 at 6:47 am
  Perhaps load your data and save it as ARFF before working with it?
Dada Gbenga October 6, 2017 at 5:53 pm
Thanks for the tutorial. I want to know if the breast-cancer.arff dataset used on WEKA is the same as the Breast-cancer-Wisconsin dataset? Thank you.
- Jason Brownlee October 7, 2017 at 5:49 am
  It may be, perhaps you can compare the data provided with Weka to the data on the UCI Machine Learning Repository.
  - Dada Gbenga October 9, 2017 at 7:05 am
    Thanks.
Chris November 16, 2017 at 7:50 pm
Hi Jason,
Thanks for this wonderful tutorial. I just started today.
I have one question.
In Lesson 5, when I reviewed the output in the “Attribute selection output” pane and note the correlation scores for each attribute, how can the correlation scores be calculated when I didn’t identify the output that I’m going to predict. How can the scores be computed just by running the feature selection method? My understanding is that for this diabetes example, the ultimate goal is to classify whether a person will test positive or negative for diabetes using the selected attributes.
- Jason Brownlee November 17, 2017 at 9:25 am
  I expect it is calculating pair-wise correlation, that is, finding variables that correlate highly with each other.
Roa February 24, 2018 at 11:33 pm
HI Professor,
I have a question .
How can be imported dataset set predict sequence type(language, sentences, web pages, characters) to the weka ?
please help me.
- Jason Brownlee February 25, 2018 at 7:44 am
  Sorry, I have not worked with text in Weka, I cannot give you good advice.
Pauli Isoaho March 15, 2018 at 4:57 am
Excellent and fun guide!
I couldn’t get “new directory called numeric/” working, where it should be stored?
I put them in this folder
C:\Program Files\Weka-3-8\doc\weka\classifiers
as ‘numeric’ folder??
- Jason Brownlee March 15, 2018 at 6:36 am
  This post might help:
  https://machinelearningmastery.com/download-install-weka-machine-learning-workbench/
Deayo March 18, 2018 at 4:39 am
The tutorial outline is awesome. Please can I use ANN for classification?
- Jason Brownlee March 18, 2018 at 6:07 am
  Thanks.
  Yes, it is called “multilayer perceptron” in weka.
Dulaj Chathuranga April 16, 2018 at 6:39 pm
Just finished it. Thank you for help me to get into machine learning world.
- Jason Brownlee April 17, 2018 at 5:55 am
  Well done!
Raam June 22, 2018 at 8:42 am
Hi Jason,
Thank you for creating this awesome website, to spread machine learning to the masses.
You mention in this blog that this course is for developers who know a little machine learning.
I am an undergraduate student in computer science who knows a little python and c++.
Is this course for me yet?
I do have a passion for learning though.
- Jason Brownlee June 22, 2018 at 2:55 pm
  Perfect!
Victor October 3, 2018 at 9:50 pm
Hi Jason
In lesson 5, where we learn the correlation of the attributes, the output is given as a ranking score, for example, 0.4666 2 plas, what does it mean please?
Does it mean “there is a correlation of 0.4666 between attribute 2 plas with xxxx”?
So what is this xxxx? is it the attribute “class”?
Does it mean that by default the last attribute in the dataset is the ‘output’ y?
- Jason Brownlee October 4, 2018 at 6:17 am
  The variables are ranked based on their expected importance to the class variables. Perhaps some of the worse ranked variables can be removed.
  - Victor October 4, 2018 at 8:02 pm
    so Weka automatically understand that the last column in the data set is the output? Output as in the test/train data, the value that the model is trying to predict.
    Reply
    - Jason BrownleeOctober 5, 2018 at 5:33 am#
      Yes, but you can also change it in the Explorer on the left hand side.
      Reply
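To make the ranking discussed in this thread concrete, here is a minimal sketch in Python of what a correlation-based ranking computes: the Pearson correlation between each input attribute and the class, sorted from strongest to weakest. This is not Weka's own code, and Weka's CorrelationAttributeEval may treat nominal classes differently. The toy values and the helper names `pearson` and `rank_attributes` are made up for illustration, and the class is assumed to be encoded as 0/1 in the last column.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def rank_attributes(columns, class_values):
    """Return (score, index, name) tuples, strongest absolute correlation first."""
    scored = [
        (abs(pearson(values, class_values)), index, name)
        for index, (name, values) in enumerate(columns.items(), start=1)
    ]
    return sorted(scored, reverse=True)


# Hypothetical toy data: two input attributes and a 0/1 class.
columns = {
    "preg": [1, 8, 1, 0, 5, 3],
    "plas": [85, 183, 89, 137, 116, 78],
}
class_values = [0, 1, 0, 1, 0, 1]

# Print one line per attribute in a "score index name" layout.
for score, index, name in rank_attributes(columns, class_values):
    print(f"{score:.4f} {index} {name}")
```

Each printed line has the same shape as the output quoted in the question: a score, the attribute's position in the dataset, and the attribute's name. The score always describes the relationship between that attribute and the class.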
JamesDecember 5, 2018 at 1:49 pm#
Hi Jason ,
I love your workshop here and the way you explain things. Are you offering, or will you offer in the future, a certification in machine learning? That would be great, you know, like Udacity, Coursera or Udemy. It’s all about how someone explains it. Hope to hear from you.
Reply
- Jason BrownleeDecember 5, 2018 at 2:24 pm#
  Not at this stage, perhaps in the future.
  I do have Ebooks that teach my recommended approach here:
  https://machinelearningmastery.com/products/
  Reply
KevinDecember 6, 2018 at 2:59 am#
Hi Jason, I really appreciate these pages.
I was just wondering how to find descriptions of what the diabetes.arff variables actually are in Weka and what the outcome is that we are trying to predict i.e. we have the short names (such as preg, plas etc) but I couldn’t work out how to get more detailed descriptions, which would help me think about what the machine learning is actually being applied to here.
Thanks very much,
Kevin
Reply
- Jason BrownleeDecember 6, 2018 at 5:58 am#
  Sure:
  https://github.com/jbrownlee/Datasets/blob/master/pima-indians-diabetes.names
  Reply
MadeshwaranJanuary 23, 2019 at 4:23 am#
Hi Dr. Jason,
Excellent tutorial! One question though. How to get the model created in weka into production? any guidance?
Reply
- Jason BrownleeJanuary 23, 2019 at 8:53 am#
  Good question.
  This might help as a start:
  https://machinelearningmastery.com/save-machine-learning-model-make-predictions-weka/
  It might be a good idea to save the model and write some Java code using the Weka API to use the model to make predictions?
  Reply
hattab.mJanuary 29, 2019 at 8:03 pm#
Hi Dr,
I have worked with Weka for some time and I need to know whether I can add my own rules to the generated ones when using the Apriori algorithm.
If not, can I assign weights to the attributes that must be taken into consideration when running the Apriori algorithm?
Best regards
Reply
- Jason BrownleeJanuary 30, 2019 at 8:09 am#
  It might be possible via the Java API, sorry, I don’t have an example.
  Perhaps post the question on the Weka user group?
  Reply
ShedFebruary 25, 2019 at 10:58 am#
Hi Jason,
I have been on your website for the past few weeks and I have learned so much. I decided to start practising and got a dataset that is mixed, with both categorical and numerical attributes. I am not sure how to go about it. All the datasets we used here are numerical. What algorithms or models do you suggest I start with?
Thanks
shed
Reply
- Jason BrownleeFebruary 25, 2019 at 2:17 pm#
  Sounds great.
  I recommend following this process:
  https://machinelearningmastery.com/start-here/#process
  Reply
jin luoApril 15, 2019 at 12:43 am#
Hi Jason, when I try to save the model, a pop-up appears saying the save failed because access is denied.
Reply
- Jason BrownleeApril 15, 2019 at 7:53 am#
  Perhaps you are trying to save in a location where you cannot write?
  Perhaps try saving in another directory?
  Reply
KivaDecember 18, 2019 at 2:51 am#
Hello Jason
I found the weka explorer to be useful. In feature selection, I am not able to select the start button when selecting CorrelationAttributeEval.
Reply
- Jason BrownleeDecember 18, 2019 at 6:10 am#
  Perhaps your data is not numerical?
  Reply
ManjunathaFebruary 23, 2020 at 1:41 pm#
Installing and running Weka on macOS: While trying to run the Weka app, I came across this message: “macos cannot verify that this app is free from malware”. To overcome this, go to System Preferences -> Security & Privacy -> General tab. Select “Allow apps downloaded from App Store and identified developers”.
Reply
- Jason BrownleeFebruary 24, 2020 at 7:35 am#
  Yes.
  Reply
VicenteApril 5, 2020 at 8:14 pm#
Lesson 02: I had trouble finding .arff files within the UCI Machine Learning repository. I opened around 20 datasets and couldn’t find any file with this extension. There were .csv files, e.g. winequality-white.csv, but I couldn’t open them. Reason: “wrong number of values”.
I also realized that each dataset contains many files, not just one. Some have a .names extension, others .data. These 2 extensions were quite common.
Reply
- Jason BrownleeApril 6, 2020 at 6:04 am#
  Note the section that says:
  Note, if you do not have a data/ directory in your Weka installation, or you cannot find it, download the .zip version of Weka from the Weka download webpage, unzip it and access the data/ directory.
  Reply
JacobApril 23, 2020 at 6:06 pm#
Jason, when I am trying to load a csv file downloaded from UCI repository, I am getting an error-“Index1 out of bounds for length 1 problem encountered on line:2”
Reply
- Jason BrownleeApril 24, 2020 at 5:38 am#
  Perhaps check that the content of the file is truly in CSV format?
  Reply
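One way to check whether a file is truly comma-separated is to count the fields on every row. The sketch below, in Python, assumes the file content has already been read into a string. One possible cause of load errors (an assumption, not confirmed in the thread) is that the file uses a separator other than a comma, such as a semicolon. The helper names `field_counts` and `looks_like_delimited` and the sample text are hypothetical.

```python
import csv
import io


def field_counts(text, delimiter=","):
    """Return the set of per-row field counts when parsing with `delimiter`."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return {len(row) for row in reader if row}  # blank lines are ignored


def looks_like_delimited(text, delimiter=","):
    """True if every row has the same number of fields and more than one field."""
    counts = field_counts(text, delimiter)
    return len(counts) == 1 and counts.pop() > 1


# Hypothetical sample that is separated by semicolons, not commas.
sample = "fixed acidity;quality\n7.0;6\n6.3;6\n"

print("comma:", looks_like_delimited(sample, ","))
print("semicolon:", looks_like_delimited(sample, ";"))
```

If the check only passes for a separator other than a comma, or the rows have differing field counts, the file likely needs cleaning before Weka's CSV loader will accept it.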
Partha Sarathi MishraMay 19, 2020 at 6:45 pm#
One of the best authors in machine learning
Reply
- Jason BrownleeMay 20, 2020 at 6:23 am#
  Thanks!
  Reply
Partha Sarathi Mishra, Jyotirekha Mishra, Amrutanshu MishraMay 19, 2020 at 6:46 pm#
How can I find your learning instructions for Weka?
Reply
- Jason BrownleeMay 20, 2020 at 6:23 am#
  This will help:
  https://machinelearningmastery.com/start-here/#weka
  Reply
KeshavJune 10, 2020 at 12:31 pm#
Firstly, thanks for your courses; they are really helpful for learning ML.
I have a doubt regarding the ML diabetes model:
1. As per the Lesson 14,point 4:
Change the “Test options” to “Use training set” and click the “Start” button.
Does that mean we need to consider whole diabetes data as training data before saving the model?
2. Same lesson point 2:
Change the “Test options” to “Supplied test set” and choose data/diabetes.arff (this could be a new file for which you do not have predictions)
When you say this could be a new file, must the new file have the same attributes as in training, i.e. 9 including “class”?
If yes, then how do we get to know which instances are predicted correctly, as the test set already has that attribute?
Could you kindly shed some light on the final testing of the saved model?
And once again, thanks a lot for your hard work.
Reply
- Jason BrownleeJune 10, 2020 at 1:25 pm#
  Perhaps try splitting the data into two CSV files prior to loading it.
  Reply
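The suggestion to split the data into two files before loading it can be sketched as follows. This is a minimal Python sketch, assuming a CSV file with a header row and the class in one of its columns. The function names `split_rows` and `split_csv`, the file names in the usage comment, and the default 33% test fraction are illustrative choices, not something prescribed in the course. Both output files keep the class column, so predictions on the test file can be compared against the known values.

```python
import csv
import random


def split_rows(rows, test_fraction=0.33, seed=1):
    """Shuffle a copy of the data rows and return (train_rows, test_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def split_csv(source_path, train_path, test_path, test_fraction=0.33, seed=1):
    """Write train and test CSV files that both keep the original header row."""
    with open(source_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = [row for row in reader if row]  # skip blank lines
    train, test = split_rows(rows, test_fraction, seed)
    for path, subset in ((train_path, train), (test_path, test)):
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(subset)
    return len(train), len(test)


# Example call with hypothetical file names:
# split_csv("diabetes.csv", "diabetes-train.csv", "diabetes-test.csv")
```

The training file would then be used to build and save the model, and the test file supplied as the “Supplied test set”.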
Skankarappa SridharaJune 19, 2020 at 7:34 pm#
Lesson No 1
It is very easy to install Weka from the link provided. I installed it in a few minutes.
Reply
- Jason BrownleeJune 20, 2020 at 6:09 am#
  Yes.
  Reply
George OhikhatemenJuly 7, 2020 at 1:29 am#
Sir, I have downloaded Weka; it was very easy to do.
But there are so many features and split screens.
Thank you for the new skills you are about to impart to us.
Reply
- Jason BrownleeJuly 7, 2020 at 6:41 am#
  These tutorials may also help:
  https://machinelearningmastery.com/start-here/#weka
  Reply
LynnSeptember 21, 2020 at 11:57 am#
Thank you, it is very useful.
Can Weka be used for deep learning?
Reply
- Jason BrownleeSeptember 21, 2020 at 2:36 pm#
  Perhaps, I have not tried sorry.
  Reply
lauren CNovember 23, 2020 at 12:55 am#
Hi Jason,
I am not sure how to open the raw CSV data file once it is downloaded from the UCI repository. Guidance needed. I am familiar with opening .arff files. I opted to try the wine data; the file shows as wine.data, but Weka does not open it. Please help.
Reply
- Jason BrownleeNovember 23, 2020 at 6:17 am#
  This tutorial will help:
  https://machinelearningmastery.com/load-csv-machine-learning-data-weka/
  Reply
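As a complement to the linked tutorial, here is a minimal Python sketch of converting a headerless, comma-separated .data file into ARFF text that Weka can open. It assumes every column except the class is numeric, that there are no missing values, and that attribute names contain no spaces. The function name `to_arff`, the attribute names and the sample rows are made up for illustration; real names would come from the dataset's .names file.

```python
import csv
import io


def to_arff(rows, attribute_names, class_index=-1, relation="dataset"):
    """Build ARFF text with numeric attributes and one nominal class attribute."""
    class_index = class_index % len(attribute_names)  # allow -1 for "last column"
    labels = ",".join(sorted({row[class_index] for row in rows}))
    lines = ["@relation " + relation, ""]
    for index, name in enumerate(attribute_names):
        if index == class_index:
            lines.append("@attribute %s {%s}" % (name, labels))
        else:
            lines.append("@attribute %s numeric" % name)
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


# Hypothetical headerless data: two numeric inputs, class label in the last column.
raw = "14.23,1.71,1\n12.37,0.94,2\n13.20,1.78,1\n"
rows = [row for row in csv.reader(io.StringIO(raw)) if row]

print(to_arff(rows, ["attr_a", "attr_b", "class"], relation="example"))
```

Saving the printed text with an .arff extension gives a file that can be opened from the Preprocess tab. The sketch keeps the class as the last attribute because, as noted earlier in these comments, Weka treats the last attribute as the class by default.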
JozefJanuary 25, 2021 at 10:04 pm#
Hi Jason.
Thank you for your course.
Please, where can I find the “Capabilities” button (Lesson 6/7)? It does not seem to be in the Classifier window.
Thanks.
Reply
NadiaJanuary 31, 2021 at 11:12 pm#
Hello Jason.
If you could please answer me: I have a medical dataset that I use to detect drug-related problems.
If I perform a feature selection step (a wrapper based on decision tree training) in Weka,
then apply decision tree classification to the subset of features obtained from the feature
selection step, the accuracy increases.
My question is: can we say that those features are the most important features for
physicians to consider for their patients, as part of the diagnosis?
Reply
- Jason BrownleeFebruary 1, 2021 at 6:26 am#
  You can say that they might be the most important features for predicting the target variable, that you have some evidence for the hypothesis…
  Reply
MarkFebruary 2, 2021 at 8:50 am#
Hello Jason.
Thanks for building this great page. I decided to give switching from dev to ML engineer another shot, and I picked your page over any course.
Apparently your link to the Weka download page is not redirecting properly. I think it is missing the : after https.
Keep up the good work.
Reply
- Jason BrownleeFebruary 2, 2021 at 1:19 pm#
  Thanks, fixed!
  Reply
JC ChouinardFebruary 25, 2021 at 11:41 am#
Weka is a horrible tool.
I tried, I tried, but I will not learn it. It is faster and far more interesting and useful to learn Python instead of learning Weka.
Sorry for the rant Jason. You are doing fantastic work.
But, I thought it would be good to let others know that they are better off learning Python than Weka. I will move to the next chapter on Python Algorithms instead.
Reply
- Jason BrownleeFebruary 26, 2021 at 4:52 am#
  Thank you for sharing your thoughts.
  Reply
ADEMarch 7, 2021 at 4:42 am#
Hi Jason! you rock…
History will not forget about you.
The knowledge you have imparted to me will continue to germinate.
Your explanation models are top-notch.
Reply
- Jason BrownleeMarch 7, 2021 at 5:15 am#
  Thank you!
  Reply
Anila KousarMarch 20, 2021 at 5:20 am#
Weka is downloaded and installed. It installs two versions, (i) Weka 3.8.5 (with console) and (ii) Weka 3.8.5.
Reply
- Jason BrownleeMarch 20, 2021 at 5:30 am#
  Perhaps use the one without console?
  Reply
hosseinNovember 30, 2021 at 3:37 pm#
hi my great teacher
I have learned a great deal from your knowledge and your manner.
Now that I have worked with Weka a little and can analyse some algorithms and datasets and compare algorithms, how should I develop my skills and go deeper in this field? I admire you because of your positive effect on me.
Yours sincerely, Hossein
Reply
- Adrian TamDecember 2, 2021 at 1:59 am#
  Thanks for the appreciation.
  Reply
MarkDecember 16, 2021 at 5:59 am#
First of all thank you for this tutorial!
For the last step where we “Re-evaluate model on current test set.” The results show an error prediction for each instance. It appears to be between 0 and 1. Does 1 indicate a high confidence and 0 a low confidence or vice versa?
Thank you!
Reply
- Adrian TamDecember 17, 2021 at 7:15 am#
  Both can work! The model just produces a value from 0 to 1; the interpretation of what it means is up to us.
  Reply
AlausaApril 7, 2022 at 5:27 pm#
Thank you for this great tutorial,
I want to ask if WEKA can accommodate more than one target (classes). If yes, please explain.
Reply
- Adrian TamApril 8, 2022 at 5:28 am#
  Do you think this helps? https://machinelearningmastery.com/multi-class-classification-tutorial-weka/
  Reply
Oluyemi AdeMay 5, 2022 at 1:44 am#
Lesson two completed, waiting for the remaining lessons. Thanks
Reply
- James CarmichaelMay 5, 2022 at 6:26 am#
  Thank you for the feedback Oluyemi! Keep up the great work!
  Reply
AditOctober 17, 2023 at 5:22 pm#
Lesson 5 Weka: Should we drop attributes with higher correlation ranking or how do we decide which attributes to keep or remove as per their ranking?
Reply
- James CarmichaelOctober 18, 2023 at 10:17 am#
  Hi Adit…The following resource may be of interest:
  https://machinelearningmastery.com/perform-feature-selection-machine-learning-data-weka/
  Reply
Princess LejaMay 25, 2024 at 6:32 pm#
Hi Jason
Lesson 13 – In step 6, when I click the “Add new” button in the “Algorithms” pane and add the KStar copy, it does not show the KNN parameters. What do I do?
Jason I am bought into Weka because of your comments in one of your sites where you stated “Make the tool a subject of your study”. This has made Weka become my tool of study and I am studying it so keenly. Thanks for all these nuggets of gold you provide all the time!
Reply
Manal RiadJanuary 9, 2025 at 1:07 am#
Hi all,
In Lesson 02, I tried to open the file pointed to by your link, to no avail. I get the following error message:
“Couldn’t read from URL…”
“No suitable URLSourcedLoader found for URL…”
Reply
- James CarmichaelJanuary 9, 2025 at 9:07 am#
  Hi Manal…This error in Weka is related to the way the software is attempting to load the file from the provided URL. Here are some reasons why this might happen and how to resolve the issue:
  —
  ### **Reasons for the Error**
  1. **Internet Connectivity**:
  – Weka relies on an active internet connection to access files via URLs. If your internet is disconnected or unstable, the file cannot be loaded.
  2. **Invalid or Outdated URL**:
  – The URL provided in the lesson may no longer be valid or accessible. Web pages or file hosting services sometimes move or remove files.
  3. **File Format Issue**:
  – Weka requires datasets in a specific format, such as .arff, .csv, or .xrff. If the file at the URL is not in a compatible format, Weka cannot process it.
  4. **Weka’s Configuration**:
  – Weka uses URLSourcedLoader to fetch files from online locations. If this loader is misconfigured or not available, the process will fail.
  —
  ### **How to Resolve the Issue**
  Here’s what you can do to fix this:
  #### **1. Verify the URL**
  – Double-check the URL provided in the lesson to ensure it is correct.
  – Copy and paste the URL into a browser to see if the file downloads correctly. If the URL is invalid, try to find an updated link.
  #### **2. Download the File Locally**
  – Instead of loading the file directly from the URL, download it to your computer manually:
  1. Visit the URL in your browser.
  2. Download the dataset file (e.g., .arff or .csv).
  3. Save it in a folder on your computer.
  #### **3. Open the File in Weka**
  – In Weka, go to the **Preprocess tab**.
  – Click on the **Open file…** button.
  – Navigate to the location of the downloaded dataset and select the file.
  #### **4. Use a Different Loader**
  – If you must load the file via URL, ensure you’re using the correct loader:
  – Go to the **Preprocess tab**.
  – Click the **Open URL…** button (next to **Open file…**).
  – Paste the URL and confirm the dialog.
  #### **5. Update Weka**
  – Ensure you are using the latest version of Weka. Older versions may not support certain features or URL access methods.
  #### **6. Troubleshoot Weka’s URL Access**
  – If none of the above works, the issue might be related to Weka’s ability to handle URLs. In that case:
  1. Open Weka’s settings or preferences and check for URL-related options.
  2. Consider switching to a local file approach instead of using URLs.
  —
  Reply
Manal RiadJanuary 9, 2025 at 1:27 am#
Hi again,
I am somewhat puzzled by the fact that all the datasets I opened have a ‘class’ attribute.
Reply
- James CarmichaelJanuary 9, 2025 at 9:06 am#
  Hi Manal…
  The presence of a **class attribute** in the datasets you encounter in the Weka Machine Learning Mini-Course is intentional and essential for most machine learning tasks covered in the course. Let me explain why:
  —
  ### Why Do Datasets Have a Class Attribute?
  1. **Supervised Learning Focus**:
  – The Weka course primarily teaches **supervised learning** techniques, where the goal is to train a model to predict a target variable (often called the **class**).
  – The **class attribute** represents the target variable or label that the model is designed to predict.
  2. **Role of the Class Attribute**:
  – For classification tasks, the class attribute is usually **categorical** (e.g., Yes/No, Setosa/Versicolor/Virginica).
  – For regression tasks, the class attribute is typically **numerical** (e.g., house prices or temperatures).
  3. **Consistency Across Examples**:
  – To ensure clarity and uniformity, the datasets in beginner-level tutorials often include a class attribute as a standard feature. This makes it easier to understand how to train and evaluate models.
  —
  ### What If There’s No Class Attribute?
  If a dataset doesn’t have a class attribute, it’s usually used for:
  1. **Unsupervised Learning**:
  – Tasks like clustering or association rule mining don’t require a class attribute since they focus on discovering patterns in the data without predefined labels.
  – Examples: Customer segmentation or market basket analysis.
  2. **Feature Engineering or Preprocessing**:
  – Sometimes, datasets may be processed to add a class attribute later for supervised learning tasks.
  —
  ### How to Work with or Add a Class Attribute
  If you’re working with a dataset that lacks a class attribute and you want to explore supervised learning:
  1. **Define the Class Attribute**:
  – Identify what you want to predict (e.g., adding a binary label like purchased: yes/no for customer behavior).
  – This could involve adding a new column or transforming existing data.
  2. **Switch to Unsupervised Learning**:
  – Explore clustering or association tasks instead of classification or regression.
  3. **Modify the Dataset in Weka**:
  – You can edit datasets directly in Weka’s **Preprocess tab** by adding or manipulating attributes.
  —
  Reply