Random Forest is a powerful ensemble learning method in machine learning. Unlike using a single decision tree, Random Forest combines multiple decision trees to create a more robust and accurate predictor. This approach follows the wisdom of crowds principle, where many weak learners working together can form a strong predictor that performs better than any individual tree.
Bootstrap sampling is a key component of Random Forest. It involves sampling with replacement from the original dataset to create multiple different training sets. Each tree in the forest is trained on a different bootstrap sample. Some data points may appear multiple times in a sample, while others may not appear at all. This randomness creates diversity among the trees, which helps reduce overfitting and improves the overall performance of the ensemble.
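A minimal sketch of one bootstrap draw, using NumPy; the toy dataset and variable names here are illustrative and not part of the original text:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

X = np.arange(10).reshape(10, 1)   # toy dataset with 10 rows
y = np.arange(10)
n_samples = X.shape[0]

# Sample row indices with replacement to build one bootstrap sample.
boot_idx = rng.integers(0, n_samples, size=n_samples)
X_boot, y_boot = X[boot_idx], y[boot_idx]

# Rows that were never drawn are "out-of-bag" for this tree.
oob_mask = ~np.isin(np.arange(n_samples), boot_idx)
print("bootstrap indices:", boot_idx)
print("out-of-bag rows:  ", np.arange(n_samples)[oob_mask])
```

Running this shows some row indices repeated and others absent, which is exactly the diversity each tree is trained on.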
Feature randomness is the second key component of Random Forest. At each node split during tree construction, the algorithm considers only a random subset of features instead of all available features. For classification problems, typically the square root of the total number of features is used, while for regression, about one-third of the features are considered. This randomness reduces correlation between trees and prevents the forest from overfitting to dominant features, leading to better generalization.
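The snippet below sketches how a random feature subset might be drawn at a single split, assuming the square-root and one-third heuristics mentioned above; the function name and seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def features_to_consider(n_features: int, task: str) -> np.ndarray:
    """Return the indices of features considered at one node split."""
    if task == "classification":
        k = max(1, int(np.sqrt(n_features)))   # sqrt heuristic
    else:  # regression
        k = max(1, n_features // 3)            # one-third heuristic
    return rng.choice(n_features, size=k, replace=False)

print(features_to_consider(16, "classification"))  # 4 of 16 features
print(features_to_consider(16, "regression"))      # 5 of 16 features
```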
The tree construction process combines both bootstrap sampling and feature randomness. For each tree, the algorithm first creates a bootstrap sample from the original dataset. Then, at each node during tree construction, it selects a random subset of features and finds the best split among only those selected features. This process repeats until stopping criteria are met. Multiple trees are built in parallel, each using different bootstrap samples and random feature subsets. This dual randomness ensures that each tree in the forest becomes unique, creating the diversity needed for effective ensemble learning.
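As a rough sketch of this loop, the code below trains a handful of trees, each on its own bootstrap sample, and relies on scikit-learn's DecisionTreeClassifier with max_features="sqrt" to handle the per-split feature subset. The toy data and helper name are assumptions for illustration, not the canonical implementation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(seed=1)
X, y = make_classification(n_samples=200, n_features=10, random_state=1)

def build_forest(X, y, n_trees=25):
    trees = []
    n = X.shape[0]
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)                     # bootstrap sample
        tree = DecisionTreeClassifier(max_features="sqrt")   # random feature subset at each split
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

forest = build_forest(X, y)
print(len(forest), "trees trained on different bootstrap samples")
```

In practice scikit-learn's RandomForestClassifier wraps this whole loop, but writing it out makes the dual randomness explicit.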
Random Forest makes final predictions by aggregating the outputs of all individual trees. For classification problems, it uses majority voting: each tree votes for a class, and the class with the most votes becomes the final prediction. For regression problems, it takes the average of all tree predictions. For example, suppose three trees make predictions for a new data point: two trees predict Class A and one predicts Class B, so the final prediction is Class A by majority vote. This aggregation reduces the impact of individual tree errors and improves overall accuracy.
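A minimal sketch of both aggregation rules; the three hard-coded votes mirror the Class A / Class B example above, and the regression numbers are made up for illustration:

```python
import numpy as np
from collections import Counter

# Classification: majority vote over the trees' predicted classes.
tree_votes = ["Class A", "Class A", "Class B"]
final_class = Counter(tree_votes).most_common(1)[0][0]
print(final_class)        # -> "Class A"

# Regression: average the trees' numeric predictions.
tree_preds = np.array([3.1, 2.8, 3.4])
print(tree_preds.mean())  # -> 3.1
```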