AI News, How to Run Your First Classifier in Weka

How to Run Your First Classifier in Weka

Weka makes learning applied machine learning easy, efficient, and fun.

It is a GUI tool that allows you to load datasets, run algorithms and design and run experiments with results statistically robust enough to publish.

I recommend Weka to beginners in machine learning because it lets them focus on learning the process of applied machine learning rather than getting bogged down by the mathematics and the programming.

It also provides other features, like data filtering, clustering, association rule extraction, and visualization, but we won’t be using these features right now.

The Iris flower dataset contains 150 instances (rows), 4 attributes (columns), and a class attribute for the species of iris flower (one of setosa, versicolor, and virginica).

The ZeroR algorithm selects the majority class in the dataset (all three species of iris are equally present in the data, so it picks the first one: setosa) and uses that to make all predictions.

The result is 33%, as expected: with 3 equally represented classes, always predicting the same one is correct for one third of the instances.
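
As a rough illustration of the ZeroR baseline described above, here is a minimal Python sketch. It is not Weka's implementation: the function name `zero_r` is made up for this example, and the accuracy is measured on the full dataset rather than with cross-validation.

```python
from collections import Counter

def zero_r(train_labels):
    """Return the most frequent class label; ties go to the label seen first."""
    counts = Counter(train_labels)
    # most_common(1) keeps first-seen order among equally frequent labels
    return counts.most_common(1)[0][0]

# The iris class distribution: 50 instances of each species
labels = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50

prediction = zero_r(labels)  # the same prediction is used for every instance
accuracy = sum(1 for y in labels if y == prediction) / len(labels)
print(prediction, round(accuracy * 100), "%")
```

Because every class has 50 instances, the tie is broken in favor of the first class, and exactly 50 of the 150 predictions are correct.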

With 10-fold cross-validation, the dataset is split into 10 parts: 9 are used to train the algorithm and the remaining part is used to assess it, and this is repeated so that each part is used for assessment once.

The algorithm was run with 10-fold cross-validation: this means it was given an opportunity to make a prediction for each instance of the dataset (with different training folds) and the presented result is a summary of those predictions.
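
The fold rotation can be sketched in a few lines of Python. This is only an assumption-laden illustration: `k_fold_indices` is a hypothetical helper, and it splits the instances in order, whereas tools usually shuffle (and often stratify) the data first.

```python
def k_fold_indices(n_instances, k=10):
    """Yield (train_idx, test_idx) pairs; each instance is tested exactly once."""
    indices = list(range(n_instances))
    fold_size, remainder = divmod(n_instances, k)
    start = 0
    for fold in range(k):
        # Spread any leftover instances over the first few folds
        stop = start + fold_size + (1 if fold < remainder else 0)
        test_idx = indices[start:stop]
        train_idx = indices[:start] + indices[stop:]
        yield train_idx, test_idx
        start = stop

folds = list(k_fold_indices(150, k=10))
for train_idx, test_idx in folds:
    print(len(train_idx), "training instances,", len(test_idx), "test instances")
```

With 150 instances, each of the 10 rounds trains on 135 instances and tests on the other 15, so every instance receives exactly one prediction.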

The confusion matrix is a table of actual classes compared to predicted classes. It shows 1 error where an Iris-setosa was classified as an Iris-versicolor, 2 cases where an Iris-virginica was classified as an Iris-versicolor, and 3 cases where an Iris-versicolor was classified as an Iris-setosa (a total of 6 errors).
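
To see how those errors translate into an accuracy figure, the table can be rebuilt by hand. The sketch below takes the off-diagonal counts from the description above and assumes 50 instances per class to fill in the diagonal; the accuracy it prints is derived arithmetic, not a number quoted from Weka's output.

```python
classes = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# Rows are actual classes, columns are predicted classes.
# Diagonal entries assume 50 instances per class minus the stated errors.
confusion = [
    [49, 1, 0],   # 1 setosa predicted as versicolor
    [3, 47, 0],   # 3 versicolor predicted as setosa
    [0, 2, 48],   # 2 virginica predicted as versicolor
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(classes)))
errors = total - correct
accuracy = correct / total
print(f"{errors} errors out of {total}: accuracy {accuracy:.0%}")
```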

Design and Run your First Experiment in Weka

Weka is the perfect platform for learning machine learning.

It provides a graphical user interface for exploring and experimenting with machine learning algorithms on datasets, without you having to worry about the mathematics or the programming.

If you follow the step-by-step instructions, you will design and run your first machine learning experiment in under five minutes.

If you are interested in machine learning, then I know you can figure out how to download and install software on your own computer.

The Weka Experimenter allows you to design your own experiments of running algorithms on datasets, run the experiments and analyze the results.

The experiment is a classification-type problem, and each algorithm and dataset combination is run 10 times (iteration control).

The Iris flower dataset is a famous dataset from statistics that is widely used by researchers in machine learning.

It contains 150 instances (rows) and 4 attributes (columns) and a class attribute for the species of iris flower (one of setosa, versicolor, virginica).

Given that all three class values have an equal share (50 instances each), ZeroR picks the first class value, “setosa”.

OneR picks the one attribute that best correlates with the class value and splits it up to get the best prediction accuracy it can.

Like the ZeroR algorithm, the algorithm is so simple that you could implement it by hand, and we would expect more sophisticated algorithms to outperform it.
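
To make the "implement it by hand" point concrete, here is a minimal sketch of the one-attribute idea in Python. It is not Weka's OneR: it assumes the attributes are already categorical (the splitting of numeric attributes into ranges is skipped), and the function name and the toy data are invented for illustration.

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """Return (attribute index, value->class rule, training errors) for the best attribute."""
    best = None
    for attr in range(len(rows[0])):
        # Count class labels for each value of this attribute
        by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[attr]][label] += 1
        # Each attribute value predicts its most frequent class
        rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        errors = sum(1 for row, label in zip(rows, labels) if rule[row[attr]] != label)
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best

# Hypothetical data: (petal length category, flower color)
rows = [("short", "blue"), ("short", "white"), ("medium", "blue"),
        ("medium", "white"), ("long", "blue"), ("long", "white")]
labels = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]

attr, rule, errors = one_r(rows, labels)
print(attr, rule, errors)
```

In this toy data the first attribute separates the classes perfectly, so it is selected with zero training errors, while the second attribute carries no information about the class.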

The ranking table shows the number of statistically significant wins each algorithm has had against all other algorithms on the dataset.

Each algorithm was run 10 times on the dataset, and the accuracy reported is the mean of those 10 runs, with the standard deviation in brackets.
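
The "mean (standard deviation)" summary is simple to reproduce. In the sketch below the ten accuracy values are hypothetical, not results from Weka, and the sample standard deviation is an assumption about how the spread is computed.

```python
from statistics import mean, stdev

# Hypothetical accuracies (%) from 10 runs of one algorithm on one dataset
run_accuracies = [92.0, 94.7, 90.7, 93.3, 96.0, 89.3, 92.0, 94.7, 91.3, 93.3]

avg = mean(run_accuracies)
spread = stdev(run_accuracies)  # sample standard deviation (assumption)
summary = f"{avg:.2f}% (+/- {spread:.2f}%)"
print(summary)
```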

We can also see that the accuracy for these algorithms compared to ZeroR is high, so we can say that these two algorithms achieved a statistically significantly better result than the ZeroR baseline.

If we wanted to report the results, we would say that the OneR algorithm achieved a classification accuracy of 92.53% (+/- 5.47%) which is statistically significantly better than ZeroR at 33.33% (+/- 5.47%).

You now have the skill to design and run experiments with any algorithms provided by Weka on datasets of your choosing and meaningfully and confidently report results that you achieve.

How To Use Regression Machine Learning Algorithms in Weka

Weka has a large number of regression algorithms available on the platform.

For each algorithm that we cover, we will briefly describe how it works, highlight its key parameters, and demonstrate it in the Weka Explorer interface.

Linear regression is a very simple regression algorithm that is fast to train and can perform very well if the output variable for your data is a linear combination of your inputs.

The performance of linear regression can be reduced if your training data has input attributes that are highly correlated.

Regularization reduces the complexity of the learned model by penalizing the sum of the squared learned coefficients, which prevents any specific coefficient from becoming too large (a sign of complexity in regression models).
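
The shrinking effect of such a penalty is easiest to see with a single input attribute. The following sketch assumes the common ridge (L2) formulation, in which the penalty is the squared coefficient times a `ridge` weight and the intercept is not penalized. The function name and data are made up, and this is not how Weka's LinearRegression is implemented internally.

```python
def fit_ridge_line(xs, ys, ridge=0.0):
    """Fit y = intercept + slope * x, penalizing slope**2 with weight `ridge`."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    # The ridge term inflates the denominator, pulling the slope toward zero
    slope = s_xy / (s_xx + ridge)
    intercept = y_mean - slope * x_mean
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # exactly y = 1 + 2x

plain = fit_ridge_line(xs, ys, ridge=0.0)   # ordinary least squares
shrunk = fit_ridge_line(xs, ys, ridge=5.0)  # penalized fit
print(plain, shrunk)
```

With no penalty the fit recovers the slope of 2 exactly; with a ridge weight of 5 the slope is halved, which shows how the penalty keeps a coefficient from growing large.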

The k-nearest neighbors (KNN) algorithm works by storing the entire training dataset and querying it to locate the k most similar training patterns when making a prediction.

When making predictions on regression problems, KNN will take the mean of the output values of the k most similar instances in the training dataset. In Weka, KNN is called IBk, which stands for Instance-Based k.

For example, if k is set to 1, then predictions are made using the single most similar training instance to a given new pattern for which a prediction is requested.
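
A small Python sketch shows how the neighborhood size changes a regression prediction. It assumes Euclidean distance as the similarity measure and uses invented data; `knn_predict` is a hypothetical helper, not part of Weka.

```python
import math

def knn_predict(train_x, train_y, query, k=1):
    """Predict by averaging the outputs of the k training instances closest to `query`."""
    # Pair each distance with its output value and sort by distance
    distances = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)

train_x = [(1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [10.0, 20.0, 30.0, 100.0]

one_neighbor = knn_predict(train_x, train_y, (2.6,), k=1)
three_neighbors = knn_predict(train_x, train_y, (2.6,), k=3)
print(one_neighbor, three_neighbors)
```

With k set to 1 the prediction is simply the output of the closest instance (30.0), while k set to 3 averages the three closest outputs (20.0).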

Decision trees work by creating a tree to evaluate an instance of data, starting at the root of the tree and moving down to the leaves (the root is at the top because the tree is drawn with an inverted perspective) until a prediction can be made.

The process of creating a decision tree works by greedily selecting the best split point in order to make predictions and repeating the process until the tree is a fixed depth.

The minNum parameter defines the minimum number of instances supported by the tree in a leaf node when constructing the tree from the training data.
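
One greedy split step for a regression tree can be sketched as follows. The code assumes a single numeric attribute and uses the sum of squared errors as the split criterion, which is an assumption rather than something the text specifies. The `min_num` argument mirrors the idea of a minimum number of instances per leaf; all names and data here are illustrative.

```python
def best_split(xs, ys, min_num=2):
    """Return (threshold, cost) for the split that minimizes total squared error."""
    def sse(values):
        # Squared error when predicting the mean of the values in a leaf
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = None
    # Only consider splits that leave at least min_num instances on each side
    for i in range(min_num, len(pairs) - min_num + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal attribute values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best

xs = [1, 2, 3, 10, 11, 12]
ys = [5, 6, 7, 20, 21, 22]
threshold, cost = best_split(xs, ys, min_num=2)
print(threshold, cost)
```

A full tree would apply the same search recursively to each side of the chosen threshold until a depth limit or the minimum leaf size stops it.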

Support Vector Machines were developed for binary classification problems, although extensions to the technique have been made to support multi-class classification and regression problems.

Unlike SVM, which finds a line that best separates the training data into classes, SVR works by finding a line of best fit that minimizes the error of a cost function.

In almost all problems of interest, a line cannot be drawn that fits the data perfectly, so a margin is added around the line to relax the constraint, allowing some bad predictions to be tolerated in exchange for a better result overall.
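
One common way to express such a margin is the epsilon-insensitive loss, in which errors smaller than a tolerance `epsilon` cost nothing. The text does not name the loss function, so treat this sketch as an assumption about the standard SVR formulation rather than a description of Weka's implementation.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5):
    """Per-instance loss: zero inside the margin, linear in the error outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]

y_true = [1.0, 2.0, 3.0]
y_pred = [1.2, 2.0, 4.5]

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5)
print(losses)
```

The first two predictions fall inside the margin and incur no loss; only the third, which misses by 1.5, is penalized, and only for the part of the error that exceeds the margin.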

The Multilayer Perceptron is an algorithm inspired by a model of biological neural networks in the brain, where small processing units called neurons are organized into layers that, if configured well, are capable of approximating any function.

This can be fun, but it is recommended that you use the GUI with a simple train and test split of your training data, otherwise you will be asked to design a network for each of the 10 folds of cross validation.

The learning process can be further tuned with a momentum (set to 0.2 by default) to continue updating the weights even when no changes need to be made, and a decay (set decay to True) which will reduce the learning rate over time to perform more learning at the beginning of training and less at the end.
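
The interaction of momentum and decay can be illustrated on a single weight. This is a hedged sketch, not Weka's training loop: the learning rate of 0.3 and the rule of dividing it by the epoch number are assumptions made for this example, and `train_weight` and the toy gradient are invented.

```python
def train_weight(gradient_fn, weight=0.0, learning_rate=0.3,
                 momentum=0.2, decay=True, epochs=3):
    """Gradient descent on one weight with momentum and optional learning-rate decay."""
    velocity = 0.0
    history = []
    for epoch in range(1, epochs + 1):
        # Assumed decay rule: shrink the starting rate as epochs go by
        rate = learning_rate / epoch if decay else learning_rate
        # Momentum carries over a fraction of the previous update
        velocity = momentum * velocity - rate * gradient_fn(weight)
        weight += velocity
        history.append((rate, weight))
    return history

# Toy objective (w - 1)^2, whose gradient is 2 * (w - 1)
history = train_weight(lambda w: 2 * (w - 1))
for rate, weight in history:
    print(f"rate={rate:.3f} weight={weight:.3f}")
```

The updates are largest in the first epoch and shrink afterwards, which matches the idea of learning more at the beginning of training and less at the end.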

Weka Data Mining Tutorial for First Time & Beginner Users

23-minute beginner-friendly introduction to data mining with WEKA. Examples of algorithms to get you started with WEKA: logistic regression, decision tree, ...

Weka Tutorial 35: Creating Training, Validation and Test Sets (Data Preprocessing)

The tutorial that demonstrates how to create training, test and cross validation sets from a given dataset.

Getting Started with Weka - Machine Learning Recipes #10

Hey everyone! In this video, I'll walk you through using Weka - The very first machine learning library I've ever tried. What's great is that Weka comes with a GUI ...

Weka Tutorial 19: Outliers and Extreme Values (Data Preprocessing)

This tutorial shows how to detect and remove outliers and extreme values from datasets using WEKA.

More Data Mining with Weka (1.6: Working with big data)

More Data Mining with Weka: online course from the University of Waikato Class 1 - Lesson 6: Working with big data Slides (PDF): ..

Weka Tutorial 02: Data Preprocessing 101 (Data Preprocessing)

This tutorial demonstrates various preprocessing options in Weka. However, details about data preprocessing will be covered in the upcoming tutorials.

More Data Mining with Weka (5.2: Multilayer Perceptrons)

More Data Mining with Weka: online course from the University of Waikato Class 5 - Lesson 2: Multilayer Perceptrons Slides (PDF): ..

More Data Mining with Weka (1.5: The Command Line interface)

More Data Mining with Weka: online course from the University of Waikato Class 1 - Lesson 5: The Command Line interface Slides ..

convert data and names file to arff

How to Make an Image Classifier - Intro to Deep Learning #6

We're going to make our own Image Classifier for cats & dogs in 40 lines of Python! First we'll go over the history of image classification, then we'll dive into the ...