Random Forest

Steps

  1. Create S different datasets from the raw dataset, each of size B
    1. They will be slightly different, as rows are sampled randomly with replacement (bootstrapping)
    2. Within each tree, consider a random subset of features rather than all features
  2. Learn S different decision trees
  3. Combine them for prediction
    1. For regression, take the average
    2. For classification, take the majority vote

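The steps above can be sketched from scratch in Python; this is a minimal illustration (names like `S`, `B`, and `predict` follow the notes, and scikit-learn's `DecisionTreeClassifier` stands in for the per-tree learner):

```python
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

S = 25       # number of bootstrap datasets / trees
B = len(X)   # each bootstrap sample matches the raw dataset's size

trees = []
for _ in range(S):
    # Step 1: sample B rows with replacement (bootstrap)
    idx = rng.integers(0, len(X), size=B)
    # Step 2: learn a decision tree; max_features="sqrt" restricts
    # the features considered at each split to a random subset
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

def predict(x):
    # Step 3: combine the S trees by majority vote (classification)
    votes = [t.predict(x.reshape(1, -1))[0] for t in trees]
    return Counter(votes).most_common(1)[0][0]

preds = np.array([predict(x) for x in X])
print((preds == y).mean())  # ensemble accuracy on the training set
```

For regression, step 3 would instead average the trees' predicted values.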
Advantages of Random Forest

  1. Because each tree sees a random sample, the effect of noise, outliers, and class imbalance is reduced
  2. Fewer hyperparameters to tune
  3. No feature scaling is needed, as it uses Decision Trees internally
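These advantages show up in practice: scikit-learn's `RandomForestClassifier` (assumed available here) typically works well on raw, unscaled features with mostly default hyperparameters, `n_estimators` often being the only one worth tuning:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)  # raw, unscaled features

# n_estimators = number of trees; defaults are a reasonable start
clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())  # mean 5-fold cross-validation accuracy
```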

Disadvantages of Random Forest

  1. Hard to interpret


Related Notes