Random forests
A random forest (RF) is an ensemble of decision trees in which each decision tree is trained with a specific random noise. Random forests are the most popular form of decision tree ensemble. This unit discusses several techniques for creating independent decision trees to improve the odds of building an effective random forest.
Bagging
Bagging (bootstrap aggregating) means training each decision tree on a random subset of the examples in the training set. In other words, each decision tree in the random forest is trained on a different subset of examples.
Bagging is peculiar. Each decision tree is trained on the same number of examples as in the original training set. For example, if the original training set contains 60 examples, then each decision tree is trained on 60 examples. However, bagging only trains each decision tree on a subset (typically 67%, or about 40) of those examples. So, some of those ~40 examples in the subset must be reused while training a given decision tree. This reuse is called training "with replacement."
For example, Table 6 shows how bagging could distribute six examples across three decision trees. Notice the following:
- Each decision tree trains on a total of six examples.
- Each decision tree trains on a different set of examples.
- Each decision tree reuses certain examples. For example, example #4 is used twice in training decision tree 1; therefore, the learned weight of example #4 is effectively doubled in decision tree 1.
Table 6. Bagging six training examples across three decision trees. Each number represents the number of times a given training example (#1-6) is repeated in the training dataset of a given decision tree (1-3).
| | #1 | #2 | #3 | #4 | #5 | #6 |
|---|---|---|---|---|---|---|
| original dataset | 1 | 1 | 1 | 1 | 1 | 1 |
| decision tree 1 | 1 | 1 | 0 | 2 | 1 | 1 |
| decision tree 2 | 3 | 0 | 1 | 0 | 2 | 0 |
| decision tree 3 | 0 | 1 | 3 | 1 | 0 | 1 |
In bagging, each decision tree is almost always trained on the total number of examples in the original training set. Training each decision tree on more examples or fewer examples tends to degrade the quality of the random forest.
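As a rough illustration (a minimal NumPy sketch, not TF-DF's actual implementation), the kind of count table shown in Table 6 can be produced by sampling example indices with replacement:

```python
import numpy as np

def bagging_counts(num_examples: int, num_trees: int, seed: int = 0) -> np.ndarray:
    """Returns a (num_trees, num_examples) matrix of per-tree example counts."""
    rng = np.random.default_rng(seed)
    counts = np.zeros((num_trees, num_examples), dtype=int)
    for tree in range(num_trees):
        # Draw num_examples indices with replacement (bootstrap sampling).
        # The "without replacement" variant would instead draw a smaller
        # number of indices with replace=False.
        sampled = rng.choice(num_examples, size=num_examples, replace=True)
        counts[tree] = np.bincount(sampled, minlength=num_examples)
    return counts

# Each row sums to 6: every tree sees 6 examples, some repeated, some missing.
print(bagging_counts(num_examples=6, num_trees=3))
```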
While not present in the original random forest paper, the sampling of examples is sometimes done "without replacement"; that is, a training example cannot be present more than once in a decision tree's training set. For example, in the preceding table, all values would be either 0 or 1.
In TF-DF, sampling without replacement corresponds to bootstrap_training_dataset=False.

Attribute sampling
Attribute sampling means that instead of looking for the best condition over all available features, only a random subset of features is tested at each node. The set of tested features is sampled randomly at each node of the decision tree.
The following decision tree illustrates attribute (feature) sampling. Here, a decision tree is trained on 5 features (f1-f5). The blue nodes represent the tested features while the white ones are not tested. The condition is built from the best of the tested features (represented with a red outline).

Figure 21. Attribute sampling.
The ratio of attribute sampling is an important regularization hyperparameter. The preceding figure used a ratio of ~⅗. Many random forest implementations test, by default, 1/3 of the features for regression and sqrt(number of features) for classification.
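As a sketch of these common defaults (not the code of any particular library), the number of features tested at a node could be chosen like this:

```python
import numpy as np

def sample_candidate_features(num_features: int, task: str,
                              rng: np.random.Generator) -> np.ndarray:
    """Returns the indices of the features tested at one node."""
    if task == "classification":
        num_candidates = max(1, int(np.sqrt(num_features)))  # sqrt(number of features)
    else:  # regression
        num_candidates = max(1, num_features // 3)           # 1/3 of the features
    # A different random subset is drawn at every node of every tree.
    return rng.choice(num_features, size=num_candidates, replace=False)

rng = np.random.default_rng(0)
print(sample_candidate_features(num_features=5, task="classification", rng=rng))
```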
In TF-DF, the following hyperparameters control attribute sampling:
- num_candidate_attributes
- num_candidate_attributes_ratio
For example, if num_candidate_attributes_ratio=0.5, half of the features will be tested at each node.
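With the TF-DF Keras API, that configuration would look roughly like the following (a sketch; check the TF-DF documentation for the exact constructor arguments):

```python
import tensorflow_decision_forests as tfdf

# Test half of the features at each node.
model = tfdf.keras.RandomForestModel(num_candidate_attributes_ratio=0.5)
```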
Disabling decision tree regularization
Individual decision trees in a random forest are trained without pruning (see Overfitting and pruning). This produces overly complex trees with poor predictive quality. Instead of regularizing individual trees, the trees are ensembled, producing more accurate overall predictions.
Note: Often, aggressively reducing the variance of individual decision trees (for example, by limiting their depth) improves the predictive quality of the individual trees but degrades the predictive accuracy of the random forest.

We expect the training and test accuracy of a random forest to differ. The training accuracy of a random forest is generally much higher (sometimes equal to 100%). However, a very high training accuracy in a random forest is normal and does not indicate that the random forest is overfitted.
The two sources of randomness (bagging and attribute sampling) ensure the relative independence between the decision trees. This independence corrects the overfitting of the individual decision trees. Consequently, the ensemble is not overfitted. We'll illustrate this non-intuitive effect in the next unit.
Pure random forests train without a maximum depth or a minimum number of observations per leaf. In practice, limiting the maximum depth and the minimum number of observations per leaf is beneficial. Many random forest implementations use the following defaults:
- maximum depth of ~16
- minimum number of observations per leaf of ~5.
You can tune these hyperparameters.
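For example, in TF-DF these limits could be set roughly as follows (a sketch; max_depth and min_examples are the hyperparameter names assumed here — verify them against the TF-DF documentation):

```python
import tensorflow_decision_forests as tfdf

model = tfdf.keras.RandomForestModel(
    max_depth=16,    # maximum depth of the trees
    min_examples=5,  # minimum number of observations (examples) per leaf
)
```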
The clarity of noise
Why would random noise improve the quality of a random forest? To illustrate the benefits of random noise, Figure 22 shows the predictions of a classical (pruned) decision tree and a random forest trained on a few examples of a simple two-dimensional problem with an ellipse pattern.
Ellipse patterns are notoriously hard for decision tree and decision forest algorithms to learn with axis-aligned conditions, so they make a good example. Notice that the pruned decision tree can't get the same quality of prediction as the random forest.

Figure 22. Ground truth vs. predictions generated by a single pruned decision tree and predictions generated by a random forest.
The next plot shows the predictions of the first three unpruned decision trees of the random forest; that is, the decision trees are all trained with a combination of:
- bagging
- attribute sampling
- disabling pruning
Notice that the individual predictions of these three decision trees are worse than the predictions of the pruned decision tree in the preceding figure. However, since the errors of the individual decision trees are only weakly correlated, the three decision trees combine in an ensemble to create effective predictions.

Figure 23. Three unpruned decision trees that will build an effective ensemble.
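To see why combining weakly correlated errors helps, here is a toy NumPy simulation (illustrative numbers only, unrelated to the figure): each simulated tree is wrong on an independent random 40% of the examples, yet a majority vote over the three trees is noticeably more accurate than any single tree.

```python
import numpy as np

rng = np.random.default_rng(7)
num_examples, num_trees = 10_000, 3
labels = rng.integers(0, 2, size=num_examples)

# Each simulated "tree" flips the true label on an independent random 40% of examples.
tree_predictions = np.array([
    np.where(rng.random(num_examples) < 0.4, 1 - labels, labels)
    for _ in range(num_trees)
])

# Majority vote across the three trees.
ensemble_predictions = (tree_predictions.sum(axis=0) >= 2).astype(int)

print("individual accuracies:", (tree_predictions == labels).mean(axis=1))  # ~0.60 each
print("ensemble accuracy:", (ensemble_predictions == labels).mean())        # ~0.65
```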
Because the decision trees of a random forest are not pruned, training a random forest does not require a validation dataset. In practice, and especially on small datasets, models should be trained on all the available data.
When training a random forest, as more decision trees are added, the error almost always decreases; that is, the quality of the model almost always improves. In other words, adding more decision trees cannot cause the random forest to overfit. At some point, the model just stops improving. Leo Breiman famously said, "Random Forests do not overfit, as more trees are added".
For example, the following plot shows the test evaluation of a random forest model as more decision trees are added. The accuracy rapidly improves until it plateaus around 0.865. However, adding more decision trees does not make accuracy decrease; in other words, the model does not overfit. This behavior is (mostly) always true and independent of the hyperparameters.

Figure 24. Accuracy stays constant as more decision trees are added to therandom forest.