Entropy and Information Gain

Entropy and information gain are used to score candidate splits when growing a decision tree: each split is rated by how much it reduces the entropy of the node being split into branches.

Entropy & Information Gain Formula

$$\text{Entropy} = -\sum_i p_i \log_2 p_i$$

$$IG = E(\text{parent}) - P(\text{child}_1)\,E(\text{child}_1) - P(\text{child}_2)\,E(\text{child}_2)$$

where $P(\text{child}_k)$ is the fraction of the parent's samples that fall into child $k$.
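As a concrete illustration, here is a minimal Python sketch of both formulas. The function names `entropy` and `information_gain` and the class-count-list representation are my own choices for this sketch, not from the source.

```python
import math

def entropy(counts):
    """Entropy (in bits) of a node, given its per-class sample counts."""
    total = sum(counts)
    # -sum(p_i * log2(p_i)); skip empty classes, since 0 * log(0) -> 0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """IG = E(parent) - sum over children of P(child) * E(child),
    where P(child) is the fraction of parent samples in that child."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted
```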

Advantages of Information Gain

  1. Works better when the classes are imbalanced
  2. Less sensitive to noise

Disadvantages of Information Gain

  1. Complex to calculate
  2. Less interpretable

Example

For the above example, consider the left tree first, which splits on Humidity. The full set has 9 positive and 5 negative samples:

$$P(+) = \frac{9}{14}, \quad P(-) = \frac{5}{14}$$

$$E(S) = -P(+)\log_2 P(+) - P(-)\log_2 P(-)$$

The High branch holds 3 positive and 4 negative samples:

$$P(+) = \frac{3}{7}, \quad P(-) = \frac{4}{7}$$

$$E(\text{Humidity} = \text{High}) = -P(+)\log_2 P(+) - P(-)\log_2 P(-)$$

The Normal branch holds 6 positive and 1 negative sample:

$$P(+) = \frac{6}{7}, \quad P(-) = \frac{1}{7}$$

$$E(\text{Humidity} = \text{Normal}) = -P(+)\log_2 P(+) - P(-)\log_2 P(-)$$

$$\text{Information Gain} = E(S) - \frac{7}{14}\,E(\text{Humidity} = \text{High}) - \frac{7}{14}\,E(\text{Humidity} = \text{Normal})$$

Compute the information gain for the right tree in the same way, then take the split with the higher information gain.
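Plugging the counts from this example into the sketch above (9 positive / 5 negative overall, 3/4 in the High branch, 6/1 in the Normal branch) reproduces the calculation; this assumes the earlier code cell has already been run in the same session.

```python
parent = [9, 5]                # 9 positive, 5 negative samples overall
high, normal = [3, 4], [6, 1]  # class counts in the High and Normal branches

print(f"E(S) = {entropy(parent):.3f}")                          # E(S) = 0.940
print(f"IG   = {information_gain(parent, [high, normal]):.3f}") # IG   = 0.152
```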

