Random Forest

Steps

  1. Create B different datasets from the raw dataset, each the same size as the original
    1. They are slightly different because rows are sampled at random with replacement (bootstrap sampling)
    2. For each dataset, take a random subset of features rather than all features
  2. Learn B different decision trees, one per dataset
  3. Combine them for prediction
    1. For regression, take the average
    2. For classification, take the majority vote
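The steps above can be sketched in plain Python. This is an illustrative sketch, not a production implementation: `Stump`, `RandomForest`, and the toy dataset are made-up names, each tree here uses one fixed random feature subset (libraries such as scikit-learn re-sample features at every split), and depth-1 stumps stand in for full decision trees.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label (used both for leaves and for the final vote)."""
    return Counter(labels).most_common(1)[0][0]

class Stump:
    """A one-level decision tree restricted to a given feature subset."""
    def fit(self, rows, labels, feat_subset):
        best = None
        for f in feat_subset:
            for t in sorted({r[f] for r in rows}):
                left = [y for r, y in zip(rows, labels) if r[f] <= t]
                right = [y for r, y in zip(rows, labels) if r[f] > t]
                if not left or not right:
                    continue
                lp, rp = majority(left), majority(right)
                errors = sum(y != lp for y in left) + sum(y != rp for y in right)
                if best is None or errors < best[0]:
                    best = (errors, f, t, lp, rp)
        if best is None:  # degenerate bootstrap sample: fall back to majority class
            self.f, self.default = None, majority(labels)
        else:
            _, self.f, self.t, self.lp, self.rp = best
        return self

    def predict(self, row):
        if self.f is None:
            return self.default
        return self.lp if row[self.f] <= self.t else self.rp

class RandomForest:
    def __init__(self, n_trees=25, n_features=1, seed=0):
        self.n_trees, self.n_features = n_trees, n_features
        self.rng = random.Random(seed)

    def fit(self, rows, labels):
        n = len(rows)
        self.trees = []
        for _ in range(self.n_trees):
            # Step 1: bootstrap sample -- same size as the raw data, drawn with replacement
            idx = [self.rng.randrange(n) for _ in range(n)]
            boot_rows = [rows[i] for i in idx]
            boot_labels = [labels[i] for i in idx]
            # Step 1.2: this tree only sees a random subset of the features
            feats = self.rng.sample(range(len(rows[0])), self.n_features)
            self.trees.append(Stump().fit(boot_rows, boot_labels, feats))
        return self

    def predict(self, row):
        # Step 3.2: classification -> majority vote across the trees
        return majority(t.predict(row) for t in self.trees)

# Toy data: class 0 clusters near the origin, class 1 near (1, 1)
rows = [(i / 20, i / 20 + 0.02) for i in range(6)] + \
       [(0.8 + i / 30, 0.8 + i / 30) for i in range(6)]
labels = [0] * 6 + [1] * 6
forest = RandomForest(n_trees=25, n_features=1, seed=42).fit(rows, labels)
print(forest.predict((0.0, 0.0)), forest.predict((1.0, 1.0)))
```

For regression (step 3.1), the only change is that `predict` would return the mean of the trees' outputs instead of the majority vote.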

Advantages of Random Forest

  1. Because each tree is trained on a random sample of rows and features, the effect of noise and outliers is reduced
  2. Fewer hyperparameters to tune (mainly the number of trees and the feature-subset size)
  3. No feature scaling is needed, since it uses decision trees internally and splits depend only on the ordering of feature values
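These properties mean a forest works nearly off the shelf. A minimal sketch, assuming scikit-learn is available, fitting unscaled iris features with defaults apart from the number of trees:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# No feature scaling applied; n_estimators (number of trees) is the main knob.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```

For regression, `RandomForestRegressor` has the same interface and averages the trees' predictions.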

Disadvantages of Random Forest

  1. Harder to interpret than a single decision tree, since predictions are spread across many trees


Related Notes