XGBoost
- XGBoost stands for eXtreme Gradient Boosting
Steps:
- Get the initial prediction; unlike Gradient Boosting, it is always 0.5
- Start with one node which has all the residuals of the datapoints
- Get the similarity score for that node: Similarity = (sum of residuals)² / (number of residuals + λ)
- Split the node
- Get the similarity score for each leaf
- Calculate the gain for that split: Gain = Left similarity + Right similarity − Root similarity
- Go to step 4 and continue splitting until a predetermined depth is reached (usually 6)
- Prune the tree
- Calculate Gain − γ for the lowest branch; if it is negative, remove the branch
- Continue moving up until a branch has a positive value
- Calculate the output value for each remaining leaf: Output = (sum of residuals) / (number of residuals + λ)
- Get the new predictions: new prediction = previous prediction + learning rate (η) × output value
- Go to step 2, until a predetermined number of estimators is reached
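The steps above can be sketched in plain Python. This is a toy example with made-up data, depth-1 trees (single split per round) for brevity, and assumed hyperparameters λ = 1, γ = 0, η = 0.3:

```python
lam, gamma, eta, n_trees = 1.0, 0.0, 0.3, 10   # assumed toy hyperparameters

x = [1.0, 2.0, 3.0, 4.0, 5.0]                  # made-up training data
y = [1.1, 1.9, 3.2, 4.1, 4.8]

def similarity(residuals):
    # Similarity score = (sum of residuals)^2 / (count + lambda)
    return sum(residuals) ** 2 / (len(residuals) + lam)

pred = [0.5] * len(x)                          # initial prediction is always 0.5
trees = []
for _ in range(n_trees):
    res = [yi - pi for yi, pi in zip(y, pred)]  # residuals of the current fit
    root_sim = similarity(res)
    # try every candidate split point, keep the one with the highest gain
    best = None
    for t in sorted(set(x))[1:]:
        left = [r for xi, r in zip(x, res) if xi < t]
        right = [r for xi, r in zip(x, res) if xi >= t]
        gain = similarity(left) + similarity(right) - root_sim
        if best is None or gain > best[0]:
            best = (gain, t, left, right)
    gain, t, left, right = best
    if gain - gamma < 0:                       # pruning: negative gain - gamma drops the split
        break
    # output value of a leaf = sum of residuals / (count + lambda)
    out_l = sum(left) / (len(left) + lam)
    out_r = sum(right) / (len(right) + lam)
    trees.append((t, out_l, out_r))
    # new prediction = previous prediction + eta * leaf output
    pred = [p + eta * (out_l if xi < t else out_r) for p, xi in zip(pred, x)]

# after boosting, pred has moved from 0.5 toward y
```

Each round fits a stump to the current residuals, so the predictions move toward the targets a fraction (η) at a time; λ shrinks leaf outputs and γ discourages weak splits.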
Pros:
- Good at Handling Missing Data
- Performs well on datasets ranging from small to large and complex
Cons:
- Bad at Handling Outliers
How XGBoost Handles Missing Data
For each missing value, XGBoost pushes it in a default direction at each split of the decision tree, and it learns the best default direction during training.
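A minimal sketch of how that default direction could be learned, using the similarity scores from the steps above (this is a hypothetical helper for illustration, not XGBoost's actual source): for a candidate split, send the rows with missing values left, then right, and keep whichever direction yields the higher combined similarity (the root similarity is the same either way, so it can be ignored when comparing).

```python
import math

lam = 1.0  # assumed regularization parameter

def similarity(residuals):
    # Similarity score = (sum of residuals)^2 / (count + lambda)
    return sum(residuals) ** 2 / (len(residuals) + lam)

def best_default_direction(xs, residuals, threshold):
    """Pick the default direction for missing values at one split."""
    known = [(xi, r) for xi, r in zip(xs, residuals) if not math.isnan(xi)]
    missing = [r for xi, r in zip(xs, residuals) if math.isnan(xi)]
    left = [r for xi, r in known if xi < threshold]
    right = [r for xi, r in known if xi >= threshold]
    # score the split with missing rows sent left vs sent right
    score_left = similarity(left + missing) + similarity(right)
    score_right = similarity(left) + similarity(right + missing)
    return "left" if score_left >= score_right else "right"

xs = [1.0, 2.0, float("nan"), 4.0, 5.0]
res = [-0.9, -0.5, -0.7, 0.8, 1.2]
print(best_default_direction(xs, res, 3.0))  # → "left"
```

Here the missing row's residual (−0.7) resembles the left leaf's residuals, so grouping it left makes that leaf's residual sum larger in magnitude and the similarity score higher; the learned default direction is "left".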