So-called Artificial Neural Networks (ANN) are a family of popular Machine Learning algorithms that has contributed to advances in data science, e.g. in processing speech, vision and text. In essence, a Neural Network can be seen as a computational system that provides predictions based on existing data. Neural Networks are comparable to non-linear regression models (such as logit regression); their potential strength lies in the ability to process a large number of model parameters. Neural Networks are good at learning non-linear functions, and multiple outputs can be modelled. Artificial Neural Networks are generically inspired by the biological neural networks within animal and human brains. They consist of a few key components: neurons organised in an input layer, one or more hidden layers and an output layer, the weighted connections between these neurons, and activation functions that transform each neuron's inputs into its output.
For the simplified application example below, we produced an example dataset with some 140,000 records. Imagine that we start with a relatively large dataset of sporadic donors and have come up with a straightforward definition of the dependent churn variable, e.g. a definition based on the recency of the last respective donation (a sketch of such a definition is shown below). The features (variables) we included comprise donor attributes such as Age at entry and Estimated Income.
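As an illustration only, a recency-based churn flag could be derived along the following lines, assuming the base data has been read into a data frame called `donors`; the column name `months_since_last_donation` and the 24-month cut-off are assumptions for this sketch, not the definition used for the original dataset.

```r
# Hypothetical recency-based churn flag: donors whose last gift lies more than
# 24 months back are labelled as churned (1), all others as active (0).
# Column name and cut-off are illustrative assumptions.
donors$churn <- ifelse(donors$months_since_last_donation > 24, 1, 0)
```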
We start by loading the relevant R packages, reading in our base dataset and performing some data pre-processing (illustrative sketches of what each code snippet could look like are shown after this walkthrough).

Code Snippet #1: Loading packages and data

An essential step in setting up Neural Networks is data normalization. This implies the scaling of the data. See for instance this link for some brief conceptual considerations and information on the scale function in R.

Code Snippet #2: Scaling

We then split the dataset into a training and a test dataset using a 70% split.

Code Snippet #3: Training and test set

Now we are ready to fit the model. We use the package nnet with one hidden layer containing 4 neurons and run a maximum of 5,000 iterations using the code shown in code snippet number 4:

Code Snippet #4: Fitting Neural Net Model

After fitting the model, we plot our neural net object. The neuron B1 in the illustration below is a so-called bias unit. This is an additional neuron added to each pre-output layer (in our case one). Bias units are not connected to any previous layer and therefore do not represent an "activity". Bias units can still have outgoing connections and might contribute to the outputs in doing so. There is a compact post on Quora with a more detailed discussion.

When it comes to modelling in a data science context, it is quite common to look at the variable importance within the respective model. For neural nets, there is a convenient way to do this using the function olden from the package NeuralNetTools. For our readers interested in the conceptual foundations of this function, we can recommend this paper.

Code Snippet #5: Function olden for variable importance

In the resulting chart, it stands out that the variable Age at entry has a high negative importance on the output, whereas Estimated Income shows some degree of positive variable importance.

We finally turn to running the neural net model for predictive purposes on our test dataset and show our results in a confusion-matrix-like manner:

Code Snippet #6: Run prediction and show results

The resulting table cross-tabulates the actual and predicted outcomes of churned and non-churned donors. Let's now evaluate the predictive power of our example neural net. In doing so, we can recommend this nice guide to interpreting confusion matrices, which can be found here.
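A minimal sketch of what Code Snippet #1 (loading packages and data) could look like; the file name donor_churn_base.csv is an assumption.

```r
# Load the required packages and the base dataset; the file name is an assumption.
library(nnet)            # single-hidden-layer feed-forward neural networks
library(NeuralNetTools)  # plotnet() and olden() for model inspection

donors <- read.csv("donor_churn_base.csv")
str(donors)              # quick structural check of the ~140,000 records
```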
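Code Snippet #2 (scaling) could rely on base R's scale() function, which centres each numeric column and divides it by its standard deviation; the column handling below is a sketch and assumes the target column is called churn.

```r
# Centre and scale all numeric predictors with scale();
# the 0/1 target column "churn" is deliberately left untouched.
predictors <- setdiff(names(donors), "churn")
num_cols   <- predictors[sapply(donors[predictors], is.numeric)]
donors[num_cols] <- scale(donors[num_cols])
```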
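For Code Snippet #3, the 70% random split into a training and a test set could be done along these lines (the seed value is arbitrary).

```r
set.seed(123)                                          # reproducible split
train_idx <- sample(nrow(donors), round(0.7 * nrow(donors)))
train_set <- donors[train_idx, ]
test_set  <- donors[-train_idx, ]
```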
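Code Snippet #4 fits the network with nnet, which always uses a single hidden layer; size = 4 gives the four neurons and maxit = 5000 the iteration limit mentioned above. Recoding churn as a factor so that nnet fits a classification model is a choice made for this sketch.

```r
# Recode the target as a factor so that nnet() fits a classification model,
# then fit one hidden layer with 4 neurons and at most 5,000 iterations.
train_set$churn <- factor(train_set$churn)
test_set$churn  <- factor(test_set$churn)

nn_model <- nnet(churn ~ ., data = train_set,
                 size  = 4,      # one hidden layer with 4 neurons
                 maxit = 5000)   # maximum of 5,000 iterations
```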
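The plot of the fitted network (with the bias units labelled B1 and B2) and the Olden variable importance discussed above could, for instance, be obtained directly from NeuralNetTools.

```r
# Visualise the fitted network; the bias units show up as B1 and B2.
plotnet(nn_model)

# Olden variable importance: signed sums of products of the connection weights
# from each input through the hidden layer to the output.
olden(nn_model)
```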
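Finally, a sketch of Code Snippet #6: scoring the test set and cross-tabulating actual versus predicted churn, with a simple accuracy figure as a first summary.

```r
# Score the test set and cross-tabulate actual versus predicted churn.
pred_class <- predict(nn_model, newdata = test_set, type = "class")
conf_mat   <- table(Actual = test_set$churn, Predicted = pred_class)
conf_mat

# Overall accuracy as a first, simple indication of predictive power.
sum(diag(conf_mat)) / sum(conf_mat)
```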
In the light of our data and the example model described above, we can conclude that further model tuning would definitely be needed. Tuning would focus on the hyperparameters used, such as the number of hidden neurons and the number of iterations. At the same time, we would recommend running a "benchmark model" such as a logit regression to compare the neural net's performance against. As further reading we can recommend:
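For the benchmark model mentioned above, a plain logistic regression could be fitted on the same training data and scored on the same test set; the variable names follow the sketches above and the 0.5 cut-off is an assumption.

```r
# Benchmark: a logistic regression on the same training data,
# scored on the same test set with a 0.5 probability cut-off.
logit_model <- glm(churn ~ ., data = train_set, family = binomial)
logit_prob  <- predict(logit_model, newdata = test_set, type = "response")
logit_class <- ifelse(logit_prob > 0.5, 1, 0)
table(Actual = test_set$churn, Predicted = logit_class)
```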