Random Forest

Steps

  1. Create B different datasets from the raw dataset, each the same size as the original
    1. They are slightly different because rows are sampled at random with replacement (bootstrap sampling)
    2. For each dataset, take a random subset of features rather than all features
  2. Learn B different decision trees, one per dataset
  3. Combine them for prediction
    1. For regression, take the average
    2. For classification, take the majority vote
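The steps above can be sketched in plain Python. This is an illustrative sketch, not a production implementation: `Stump`, `RandomForest`, and the toy dataset are made-up names, each tree here uses one fixed random feature subset (libraries such as scikit-learn re-sample features at every split), and depth-1 stumps stand in for full decision trees.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label (used both for leaves and for the final vote)."""
    return Counter(labels).most_common(1)[0][0]

class Stump:
    """A one-level decision tree restricted to a given feature subset."""
    def fit(self, rows, labels, feat_subset):
        best = None
        for f in feat_subset:
            for t in sorted({r[f] for r in rows}):
                left = [y for r, y in zip(rows, labels) if r[f] <= t]
                right = [y for r, y in zip(rows, labels) if r[f] > t]
                if not left or not right:
                    continue
                lp, rp = majority(left), majority(right)
                errors = sum(y != lp for y in left) + sum(y != rp for y in right)
                if best is None or errors < best[0]:
                    best = (errors, f, t, lp, rp)
        if best is None:  # degenerate bootstrap sample: fall back to majority class
            self.f, self.default = None, majority(labels)
        else:
            _, self.f, self.t, self.lp, self.rp = best
        return self

    def predict(self, row):
        if self.f is None:
            return self.default
        return self.lp if row[self.f] <= self.t else self.rp

class RandomForest:
    def __init__(self, n_trees=25, n_features=1, seed=0):
        self.n_trees, self.n_features = n_trees, n_features
        self.rng = random.Random(seed)

    def fit(self, rows, labels):
        n = len(rows)
        self.trees = []
        for _ in range(self.n_trees):
            # Step 1: bootstrap sample -- same size as the raw data, drawn with replacement
            idx = [self.rng.randrange(n) for _ in range(n)]
            boot_rows = [rows[i] for i in idx]
            boot_labels = [labels[i] for i in idx]
            # Step 1.2: this tree only sees a random subset of the features
            feats = self.rng.sample(range(len(rows[0])), self.n_features)
            self.trees.append(Stump().fit(boot_rows, boot_labels, feats))
        return self

    def predict(self, row):
        # Step 3.2: classification -> majority vote across the trees
        return majority(t.predict(row) for t in self.trees)

# Toy data: class 0 clusters near the origin, class 1 near (1, 1)
rows = [(i / 20, i / 20 + 0.02) for i in range(6)] + \
       [(0.8 + i / 30, 0.8 + i / 30) for i in range(6)]
labels = [0] * 6 + [1] * 6
forest = RandomForest(n_trees=25, n_features=1, seed=42).fit(rows, labels)
print(forest.predict((0.0, 0.0)), forest.predict((1.0, 1.0)))
```

For regression (step 3.1), the only change is that `predict` would return the mean of the trees' outputs instead of the majority vote.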

Advantages of Random Forest

  1. Because each tree is trained on a random sample of rows and features, the effect of noise and outliers is reduced
  2. Fewer hyperparameters to tune (mainly the number of trees and the feature-subset size)
  3. No feature scaling is needed, since it uses decision trees internally and splits depend only on the ordering of feature values
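These properties mean a forest works nearly off the shelf. A minimal sketch, assuming scikit-learn is available, fitting unscaled iris features with defaults apart from the number of trees:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# No feature scaling applied; n_estimators (number of trees) is the main knob.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```

For regression, `RandomForestRegressor` has the same interface and averages the trees' predictions.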

Disadvantages of Random Forest

  1. Harder to interpret than a single decision tree, since predictions are spread across many trees


Related Notes