We cannot know which algorithm will be best for a given problem.
Therefore, we need to design a test harness that we can use to evaluate different machine learning algorithms.
In this tutorial, you will discover how to develop a machine learning algorithm test harness from scratch in Python.
After completing this tutorial, you will know:

- How to implement a train-test algorithm test harness.
- How to implement a k-fold cross-validation algorithm test harness.
Kick-start your project with my new book Machine Learning Algorithms From Scratch, including step-by-step tutorials and the Python source code files for all examples.
Let’s get started.

How To Create an Algorithm Test Harness From Scratch With Python
Photo by Chris Meller, some rights reserved.
A test harness provides a consistent way to evaluate machine learning algorithms on a dataset.
It involves 3 elements:

1. The resampling method used to split up the dataset.
2. The machine learning algorithm to evaluate.
3. The performance measure by which to evaluate predictions.
The loading and preparation of a dataset is a prerequisite step that must have been completed prior to using the test harness.
The test harness must allow for different machine learning algorithms to be evaluated, whilst the dataset, resampling method and performance measures are kept constant.
In this tutorial, we are going to demonstrate the test harnesses with a real dataset.
The dataset used is the Pima Indians diabetes dataset. It contains 768 rows and 9 columns. All of the values in the file are numeric, specifically floating point values.
The Zero Rule algorithm will be evaluated as part of the tutorial. The Zero Rule algorithm always predicts the class that has the most observations in the training dataset.
This tutorial is broken down into two main sections:

1. Train-Test Algorithm Test Harness.
2. Cross-Validation Algorithm Test Harness.
These test harnesses will give you the foundation that you need to evaluate a suite of machine learning algorithms on a given predictive modeling problem.
The train-test split is a simple resampling method that can be used to evaluate a machine learning algorithm.
As such, it is a good starting point for developing a test harness.
We can assume the prior development of a function to split a dataset into train and test sets and a function to evaluate the accuracy of a set of predictions.
We need a function that can take a dataset and an algorithm and return a performance score.
Below is a function named evaluate_algorithm() that achieves this. It takes three fixed arguments: the dataset, the algorithm function, and the split percentage for the train-test split.
First, the dataset is split into train and test elements. Next, a copy of the test set is made and each output value is cleared by setting it to the None value, to prevent the algorithm from cheating accidentally.
The algorithm provided as a parameter is a function that expects the train and test datasets on which to prepare and then make predictions. The algorithm may require additional configuration parameters. This is handled by using the variable arguments *args in the evaluate_algorithm() function and passing them on to the algorithm function.
The algorithm function is expected to return a list of predictions, one for each row in the test dataset. These are compared to the actual output values from the unmodified test dataset by the accuracy_metric() function.
Finally, the accuracy is returned.
```python
# Evaluate an algorithm using a train/test split
def evaluate_algorithm(dataset, algorithm, split, *args):
	train, test = train_test_split(dataset, split)
	test_set = list()
	for row in test:
		row_copy = list(row)
		row_copy[-1] = None
		test_set.append(row_copy)
	predicted = algorithm(train, test_set, *args)
	actual = [row[-1] for row in test]
	accuracy = accuracy_metric(actual, predicted)
	return accuracy
```
The evaluation function does make some strong assumptions, but they can easily be changed if needed.
Specifically, it assumes that the last column in each row is always the output value. A different column could be used. The use of the accuracy_metric() function assumes that the problem is a classification problem, but this could be changed to mean squared error for regression problems.
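For example, below is a minimal sketch of such a swap, using a hypothetical rmse_metric() function (introduced here for illustration, not part of the tutorial code) that could stand in for accuracy_metric() on a regression problem:

```python
from math import sqrt

# Calculate root mean squared error (a hypothetical drop-in for accuracy_metric())
def rmse_metric(actual, predicted):
	sum_error = 0.0
	for i in range(len(actual)):
		prediction_error = predicted[i] - actual[i]
		sum_error += (prediction_error ** 2)
	return sqrt(sum_error / float(len(actual)))
```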
Let’s piece this together with a worked example.
We will use the Pima Indians diabetes dataset and evaluate the Zero Rule algorithm.
```python
# Train-Test Test Harness
from random import seed
from random import randrange
from csv import reader

# Load a CSV file
def load_csv(filename):
	file = open(filename, 'r')
	lines = reader(file)
	dataset = list(lines)
	return dataset

# Convert string column to float
def str_column_to_float(dataset, column):
	for row in dataset:
		row[column] = float(row[column].strip())

# Split a dataset into a train and test set
def train_test_split(dataset, split):
	train = list()
	train_size = split * len(dataset)
	dataset_copy = list(dataset)
	while len(train) < train_size:
		index = randrange(len(dataset_copy))
		train.append(dataset_copy.pop(index))
	return train, dataset_copy

# Calculate accuracy percentage
def accuracy_metric(actual, predicted):
	correct = 0
	for i in range(len(actual)):
		if actual[i] == predicted[i]:
			correct += 1
	return correct / float(len(actual)) * 100.0

# Evaluate an algorithm using a train/test split
def evaluate_algorithm(dataset, algorithm, split, *args):
	train, test = train_test_split(dataset, split)
	test_set = list()
	for row in test:
		row_copy = list(row)
		row_copy[-1] = None
		test_set.append(row_copy)
	predicted = algorithm(train, test_set, *args)
	actual = [row[-1] for row in test]
	accuracy = accuracy_metric(actual, predicted)
	return accuracy

# zero rule algorithm for classification
def zero_rule_algorithm_classification(train, test):
	output_values = [row[-1] for row in train]
	prediction = max(set(output_values), key=output_values.count)
	predicted = [prediction for i in range(len(test))]
	return predicted

# Test the zero rule algorithm on the diabetes dataset
seed(1)
# load and prepare data
filename = 'pima-indians-diabetes.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0])):
	str_column_to_float(dataset, i)
# evaluate algorithm
split = 0.6
accuracy = evaluate_algorithm(dataset, zero_rule_algorithm_classification, split)
print('Accuracy: %.3f%%' % (accuracy))
```
The dataset was split into 60% for training the model and 40% for evaluating it.
Notice how the name of the Zero Rule algorithm, zero_rule_algorithm_classification, was passed as an argument to the evaluate_algorithm() function. You can see how this test harness may be used again and again with different algorithms.
Running the example above prints out the accuracy of the model.
```
Accuracy: 67.427%
```
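To make that reuse concrete, below is a minimal sketch of a second baseline, a hypothetical random_algorithm() introduced here for illustration only, evaluated with the exact same harness (assuming the dataset, split and evaluate_algorithm() from the complete example above are already defined):

```python
from random import randrange

# Hypothetical baseline: predict a randomly selected output value from the training data
def random_algorithm(train, test):
	output_values = [row[-1] for row in train]
	return [output_values[randrange(len(output_values))] for row in test]

# Same harness, same dataset and split; only the algorithm function changes
accuracy = evaluate_algorithm(dataset, random_algorithm, split)
print('Accuracy: %.3f%%' % (accuracy))
```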
Cross-validation is a resampling technique that provides more reliable estimates of algorithm performance on unseen data.
It requires the creation and evaluation of k models on different subsets of your data, and as such is more computationally expensive. Nevertheless, it is the gold standard for evaluating machine learning algorithms.
As in the previous section, we need to create a function that ties together the resampling method, the evaluation of the algorithm on the dataset and the performance calculation method.
Unlike above, the algorithm must be evaluated on different subsets of the dataset many times. This means we need additional loops within our evaluate_algorithm() function.
Below is a function that implements algorithm evaluation with cross-validation.
First, the dataset is split into n_folds groups called folds.
Next, we loop, giving each fold an opportunity to be held out of training and used to evaluate the algorithm. A copy of the list of folds is created, and the held-out fold is removed from this list. Then the list of folds is flattened into one long list of rows to match the algorithm's expectation of a training dataset. This is done using the sum() function, as shown in the short illustration below.
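As a quick illustration of that flattening step (the values below are made up):

```python
folds = [[[1, 0], [2, 1]], [[3, 0]], [[4, 1]]]
train_set = list(folds)
train_set.remove(folds[0])      # hold out the first fold
train_set = sum(train_set, [])  # flatten the remaining folds into one list of rows
print(train_set)                # [[3, 0], [4, 1]]
```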
Once the training dataset is prepared, the rest of the function within this loop is as above. A copy of the test dataset (the fold) is made and the output values are cleared to avoid accidental cheating by algorithms. The algorithm is prepared on the train dataset and makes predictions on the test dataset. The predictions are evaluated and stored in a list.
Unlike the train-test algorithm test harness, a list of scores is returned, one for each cross-validation fold.
```python
# Evaluate an algorithm using a cross validation split
def evaluate_algorithm(dataset, algorithm, n_folds, *args):
	folds = cross_validation_split(dataset, n_folds)
	scores = list()
	for fold in folds:
		train_set = list(folds)
		train_set.remove(fold)
		train_set = sum(train_set, [])
		test_set = list()
		for row in fold:
			row_copy = list(row)
			row_copy[-1] = None
			test_set.append(row_copy)
		predicted = algorithm(train_set, test_set, *args)
		actual = [row[-1] for row in fold]
		accuracy = accuracy_metric(actual, predicted)
		scores.append(accuracy)
	return scores
```
Although slightly more complex in code and slower to run, this function provides a more robust estimate of algorithm performance.
We can tie all of this together with a complete example on the diabetes dataset with the Zero Rule algorithm.
```python
# Cross Validation Test Harness
from random import seed
from random import randrange
from csv import reader

# Load a CSV file
def load_csv(filename):
	file = open(filename, 'r')
	lines = reader(file)
	dataset = list(lines)
	return dataset

# Convert string column to float
def str_column_to_float(dataset, column):
	for row in dataset:
		row[column] = float(row[column].strip())

# Split a dataset into k folds
def cross_validation_split(dataset, n_folds):
	dataset_split = list()
	dataset_copy = list(dataset)
	fold_size = int(len(dataset) / n_folds)
	for i in range(n_folds):
		fold = list()
		while len(fold) < fold_size:
			index = randrange(len(dataset_copy))
			fold.append(dataset_copy.pop(index))
		dataset_split.append(fold)
	return dataset_split

# Calculate accuracy percentage
def accuracy_metric(actual, predicted):
	correct = 0
	for i in range(len(actual)):
		if actual[i] == predicted[i]:
			correct += 1
	return correct / float(len(actual)) * 100.0

# Evaluate an algorithm using a cross validation split
def evaluate_algorithm(dataset, algorithm, n_folds, *args):
	folds = cross_validation_split(dataset, n_folds)
	scores = list()
	for fold in folds:
		train_set = list(folds)
		train_set.remove(fold)
		train_set = sum(train_set, [])
		test_set = list()
		for row in fold:
			row_copy = list(row)
			row_copy[-1] = None
			test_set.append(row_copy)
		predicted = algorithm(train_set, test_set, *args)
		actual = [row[-1] for row in fold]
		accuracy = accuracy_metric(actual, predicted)
		scores.append(accuracy)
	return scores

# zero rule algorithm for classification
def zero_rule_algorithm_classification(train, test):
	output_values = [row[-1] for row in train]
	prediction = max(set(output_values), key=output_values.count)
	predicted = [prediction for i in range(len(test))]
	return predicted

# Test the zero rule algorithm on the diabetes dataset
seed(1)
# load and prepare data
filename = 'pima-indians-diabetes.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0])):
	str_column_to_float(dataset, i)
# evaluate algorithm
n_folds = 5
scores = evaluate_algorithm(dataset, zero_rule_algorithm_classification, n_folds)
print('Scores: %s' % scores)
print('Mean Accuracy: %.3f%%' % (sum(scores)/len(scores)))
```
A total of 5 cross-validation folds were used to evaluate the Zero Rule algorithm. As such, 5 scores were returned from the evaluate_algorithm() function.
Running this example prints both the list of scores calculated and the mean score.
```
Scores: [62.091503267973856, 64.70588235294117, 64.70588235294117, 64.70588235294117, 69.28104575163398]
Mean Accuracy: 65.098%
```
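If you also want a sense of the spread of these scores, below is a small sketch, assuming the scores list from the example above, that reports the standard deviation alongside the mean:

```python
from math import sqrt

# Summarize cross-validation scores (assumes `scores` from the example above)
mean = sum(scores) / len(scores)
std_dev = sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
print('Mean Accuracy: %.3f%% (+/- %.3f)' % (mean, std_dev))
```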
You now have two different test harnesses that you can use to evaluate your own machine learning algorithms.
This section lists extensions to this tutorial that you may wish to consider.
Did you try any of these extensions?
Share your experiences in the comments below.
In this tutorial, you discovered how to create a test harness from scratch to evaluate your machine learning algorithms.
Specifically, you now know:

- How to implement and use a train-test algorithm test harness.
- How to implement and use a cross-validation algorithm test harness.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.

...with step-by-step tutorials on real-world datasets
Discover how in my new Ebook:
Machine Learning Algorithms From Scratch
It covers 18 tutorials with all the code for 12 top algorithms, like:
Linear Regression, k-Nearest Neighbors, Stochastic Gradient Descent and much more...
Skip the Academics. Just Results.
Thank you for this, it will definitely help me get my start in machine learning.
What’s the difference between this & using the built-in cross_val_score in Python sklearn?
Great question.
Use sklearn in practice.
If you want to learn how all these methods work from first principles, try implementing them yourself.
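For comparison, here is a rough sketch of the sklearn route, assuming the inputs and class values have already been separated into X and y arrays (DummyClassifier with strategy='most_frequent' mirrors the Zero Rule algorithm):

```python
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Zero Rule equivalent evaluated with 5-fold cross-validation
model = DummyClassifier(strategy='most_frequent')
scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
print('Mean Accuracy: %.3f' % scores.mean())
```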
How about using the Weka tool as the test harness for an algorithm? As you said above, is it the same as using the built-in functions in sklearn, which stops at practice and does not teach the principles?
In my implementation I get the following error:

```
line 163, in k_cross_validate
    train_set.remove(fold)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
Sorry, I’ve not seen this error before. Are you using Python 2?