Chapters

Tree Based Algorithms

Posted by: Jaspreet

Last Updated on: 17 Nov, 2021


Random Forest


1) Building Random Forest

For a set of n independent observations Z1, . . . , Zn, each with variance σ², the variance of the mean Z̄ of the observations is σ²/n. In other words, averaging a set of observations reduces variance; this is the idea behind bootstrap aggregating (bagging): train many trees, each on a bootstrap sample of the data, and average their predictions.
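The σ²/n effect can be checked with a quick simulation (pure Python; σ² = 4 and n = 25 are illustrative values, not from the text):

```python
import random
import statistics

random.seed(0)
sigma2 = 4.0  # variance of each observation (illustrative)
n = 25        # observations averaged per "bag"

# Empirical variance of the mean of n i.i.d. draws; theory says sigma2 / n.
means = [statistics.fmean(random.gauss(0, sigma2 ** 0.5) for _ in range(n))
         for _ in range(10_000)]
print(statistics.variance(means))  # close to sigma2 / n = 0.16
```

The printed value hovers near 0.16 rather than the raw σ² = 4, which is exactly why averaging many trees is worthwhile.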
However, if one predictor in the predictor space is much more powerful than the moderately significant rest, most bagged trees will pick it for their top split, so the trees, and hence their predictions, end up highly correlated with each other.
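Why correlation hurts: a standard result (not stated in the text above, but it is the usual analysis) is that the variance of the average of n predictions with pairwise correlation ρ and variance σ² is ρσ² + (1 − ρ)σ²/n, so for large ρ the σ²/n benefit mostly disappears. A small simulation, modelling each tree's prediction as a shared component plus its own noise:

```python
import random
import statistics

random.seed(1)
sigma2, n, rho = 4.0, 25, 0.8  # illustrative: 25 strongly correlated "trees"

# Each tree's prediction shares a common component (correlation rho)
# and adds independent noise for the rest of its variance.
means = []
for _ in range(10_000):
    shared = random.gauss(0, (rho * sigma2) ** 0.5)
    preds = [shared + random.gauss(0, ((1 - rho) * sigma2) ** 0.5)
             for _ in range(n)]
    means.append(statistics.fmean(preds))

# Theory: rho*sigma2 + (1 - rho)*sigma2/n = 3.232, far from sigma2/n = 0.16.
print(statistics.variance(means))
```

With ρ = 0.8 the averaged variance stays near 3.2, barely below the single-tree σ² = 4, which motivates the decorrelation trick described next.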

In order to avoid this, Random Forest allows only m predictors, drawn at random from the full set of p predictors, to be considered at each split. A typical choice is m ≈ square root of p: with 13 predictors present, each split considers only about 4 of them, and about 6 if 39 predictors are present.
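The square-root rule above can be checked directly (in scikit-learn it corresponds to `RandomForestClassifier(max_features="sqrt")`; the helper name here is just for illustration):

```python
import math

# Number of candidate predictors m considered at each split under the
# sqrt rule: m = round(sqrt(p)). Helper name is illustrative only.
def candidates_per_split(p):
    return round(math.sqrt(p))

print(candidates_per_split(13))  # 4
print(candidates_per_split(39))  # 6
```

Because each split sees a different random subset of predictors, the dominant predictor is often unavailable, which decorrelates the trees.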

Differences between RandomForest and XGBoost
RandomForest is like cultivating a lot of Instagram followers and making decisions based on their 'votes.' That is, a lot of 'weak learners' (or, in the case of Instagram, superficial advice in the form of likes and short comments), when pooled together, can give you a strong answer.

XGBoost is like taking the time to cultivate one adviser, probably a psychotherapist, who learns more and more about you and revises past misunderstandings with every session.

Other differences include:

  • XGBoost automatically deals with missing values, whereas the main RandomForest implementations in Python and R don't handle them automatically
  • While each RandomForest tree is built in parallel with the others, XGBoost builds its trees sequentially and parallelizes within a tree, i.e. candidate splits across features are evaluated in parallel

2) Building Boosting Based Models