Random Forest
Steps
- Create S different datasets from the raw dataset, each of length B (len(s) = B)
- They will be slightly different from each other, since rows are sampled randomly with replacement (bootstrapping)
- In each tree, consider only a random subset of features at each split rather than all features
- Learn S different decision trees, one per dataset
- Combine them for prediction (see the sketch after this list)
- For regression, take the average
- For classification, take the majority vote
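A minimal from-scratch sketch of these steps, assuming scikit-learn and NumPy are available; the synthetic dataset and the values of S and B are illustrative choices, not from the notes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

S = 25      # number of bootstrapped datasets / trees (illustrative)
B = len(X)  # size of each bootstrapped dataset, len(s) = B

trees = []
for _ in range(S):
    # Sample B rows randomly WITH replacement (bootstrapping)
    idx = rng.integers(0, len(X), size=B)
    # Each tree considers a random subset of features at every split
    tree = DecisionTreeClassifier(max_features="sqrt")
    trees.append(tree.fit(X[idx], y[idx]))

# Combine the S trees: majority vote (for regression, take the mean instead)
votes = np.stack([t.predict(X) for t in trees])  # shape (S, n_samples)
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("ensemble training accuracy:", (majority == y).mean())
```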
Advantages of Random Forest
- Because each tree is trained on a random sample, the effect of noise, outliers, and an imbalanced dataset is reduced
- Fewer hyperparameters to tune (see the sketch after this list)
- No feature scaling is needed, since it uses Decision Trees internally
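A small sketch of the last two points, assuming scikit-learn; the wine dataset and the two hyperparameter values are illustrative. The features are left unscaled, and everything beyond two knobs stays at its default.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)  # features on very different scales, no scaling applied

# In practice, n_estimators and max_features cover most of the tuning
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
print("CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```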
Disadvantages of Random Forest
- Hard to interpret
Implementation
- Classification: since Random Forest uses Decision Trees internally, the implementation is the same as for Decision Tree (Classification); a sketch follows
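A hedged sketch of the classification case, assuming scikit-learn; the breast-cancer dataset and default settings are illustrative. The forest is a drop-in replacement for the single tree, with the same fit/predict interface.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Same interface for both models - only the class name changes
for model in (DecisionTreeClassifier(random_state=0),
              RandomForestClassifier(random_state=0)):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, "test accuracy:", model.score(X_te, y_te))
```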
- Regression: likewise, the implementation is the same as for Decision Tree (Regression); a sketch follows
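A similar sketch for regression, assuming scikit-learn; the diabetes dataset is illustrative. It also checks that the forest's prediction is the average of the individual trees, as stated in the Steps section.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("test R^2:", forest.score(X_te, y_te))

# The forest's prediction is the mean of its trees' predictions
per_tree = np.stack([t.predict(X_te) for t in forest.estimators_])
assert np.allclose(per_tree.mean(axis=0), forest.predict(X_te))
```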
- So Random Forest is good at prediction: averaging many randomized trees reduces variance and improves accuracy
- But bad for interpretability, which is far better in a single Decision Tree (see the sketch below)
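A sketch of this interpretability gap, assuming scikit-learn; the iris dataset and the depth limit are illustrative. A single tree can be printed as human-readable rules, while the forest only exposes aggregate feature importances.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, y = data.data, data.target

# A single shallow tree prints as readable if/else rules
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(data.feature_names)))

# The forest only offers aggregate importances, not a readable rule set
forest = RandomForestClassifier(random_state=0).fit(X, y)
print(dict(zip(data.feature_names, forest.feature_importances_.round(3))))
```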