Decision Tree (Classification)
Steps:
1. For each remaining feature, build a candidate split and compute its impurity (there are multiple impurity measures, e.g. Gini impurity or entropy).
   - If the feature is categorical, branch on its category labels.
   - If the feature is continuous:
     - Sort the data by that column.
     - For each pair of consecutive values, compute their mean.
     - Use each mean as a candidate threshold, e.g. `age <= 7`, where 7 is the mean of two consecutive sorted values.
2. Keep the candidate split with the lowest impurity.
3. If no impure node remains, or the pre-defined depth limit is reached, STOP.
4. Otherwise, go to step 1 for each remaining impure node.
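The threshold search for a continuous feature can be sketched as below; `gini` and `best_split` are illustrative helper names, not part of the original notes:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """For one continuous feature: sort, take the mean of each pair of
    consecutive values as a candidate threshold, and return the one
    with the lowest weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("inf"), None)  # (impurity, threshold)
    for i in range(n - 1):
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # equal values: no threshold fits between them
        threshold = (pairs[i][0] + pairs[i + 1][0]) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[0]:
            best = (weighted, threshold)
    return best

# Example: ages vs. a binary label; the best cut is between 9 and 12.
impurity, threshold = best_split([3, 5, 9, 12, 20], [0, 0, 0, 1, 1])
print(threshold)  # 10.5, a perfectly pure split (impurity 0.0)
```

The same loop would be repeated for every remaining feature, and the tree keeps the feature/threshold pair with the overall lowest impurity.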
Problem:
- Overfitting: a fully grown tree memorizes the training data and generalizes poorly; a depth limit or pruning mitigates this.
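A minimal sketch of the overfitting issue, assuming scikit-learn is available (the library is an assumption, not named in the notes): an unconstrained tree scores perfectly on its training data, while `max_depth` caps its complexity.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data for illustration only.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print(full.score(X_tr, y_tr))   # the fully grown tree fits training data perfectly
print(full.get_depth(), shallow.get_depth())  # the capped tree is much shallower
```

Comparing `score(X_te, y_te)` for the two trees on held-out data is the usual way to see whether the extra depth actually helped.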
References
- Decision and Classification Trees, Clearly Explained!!! (StatQuest with Josh Starmer, YouTube)