TF-IDF


$$\text{TF-IDF}(w) = \frac{\text{count of word } w \text{ in a doc}}{\text{total \# of words in a doc}} \times \log\frac{\text{total \# of docs}}{\text{\# of docs containing word } w}$$
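A minimal Python sketch of the formula above. It assumes each document is pre-tokenized into a list of words by splitting on whitespace, and that log is the natural logarithm (the formula does not fix a base). The helper names `tf`, `idf`, and `tf_idf` and the three-sentence corpus are illustrative only, not taken from any library.

```python
import math

def tf(word, doc):
    # Term frequency: occurrences of `word` divided by total words in `doc`.
    return doc.count(word) / len(doc)

def idf(word, docs):
    # Inverse document frequency: log(total docs / docs containing `word`).
    # Assumes `word` appears in at least one document (else division by zero).
    n_containing = sum(1 for d in docs if word in d)
    return math.log(len(docs) / n_containing)

def tf_idf(word, doc, docs):
    return tf(word, doc) * idf(word, docs)

# Hypothetical corpus, tokenized by splitting on whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]

print(round(tf_idf("cat", docs[0], docs), 4))  # → 0.1831
print(round(tf_idf("the", docs[0], docs), 4))  # → 0.1352
```

"the" occurs twice in the first document but still scores lower than "cat", because it appears in two of the three documents and so its IDF factor is small.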

Pros:

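  1. Simple: needs only word counts per document and document counts across the corpus
  2. Efficient: scores come directly from counts, with no training step
  3. Effective in document retrieval: words that appear in many documents get low weights, so matches on distinctive words dominate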
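To illustrate the retrieval point, here is a hypothetical ranking function that scores each document by summing the TF-IDF weights of the query words it contains. It assumes whitespace tokenization and the natural log; the names `tf_idf` and `rank` are illustrative, and practical retrieval systems more commonly compare TF-IDF vectors with a similarity measure such as cosine similarity rather than this plain sum.

```python
import math

def tf_idf(word, doc, docs):
    # TF-IDF as defined in this section: (count / doc length) * log(N / df).
    n_containing = sum(1 for d in docs if word in d)
    if n_containing == 0:
        return 0.0  # a word absent from the whole corpus contributes nothing
    return (doc.count(word) / len(doc)) * math.log(len(docs) / n_containing)

def rank(query, docs):
    # Score each doc as the sum of the TF-IDF weights of the query words,
    # then return doc indices from best to worst match (ties keep doc order).
    terms = query.split()
    scores = [sum(tf_idf(t, d, docs) for t in terms) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

# Hypothetical corpus.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "dogs chase cats in the park".split(),
]

print(rank("dog log", docs))  # → [1, 0, 2]
print(rank("the cat", docs))  # → [0, 1, 2]
```

Because "the" occurs in every document, its IDF is log(1) = 0, so it has no effect on the ranking for "the cat"; only the distinctive word "cat" decides the order.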

Limitations:

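  1. All limitations of count-based word embeddings
  2. Bias towards rare tokens: the IDF term gives the largest weights to words that occur in very few documents, including typos and noise
  3. Sparse representation: vectors are high-dimensional (one dimension per vocabulary word) and mostly zeros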
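The last two limitations can be seen on a toy corpus. This sketch assumes a hypothetical three-document corpus containing a deliberate typo, whitespace tokenization, and the natural log. It builds one TF-IDF vector per document over the full vocabulary, reports the share of zero entries, and compares the weight of the misspelled "grate" with that of the correctly spelled "great".

```python
import math

# Hypothetical toy corpus; "grate" is a deliberate typo for "great".
docs = [
    "the movie was great fun".split(),
    "the movie was great".split(),
    "the movie was grate".split(),
]

# One vector dimension per distinct word in the corpus.
vocab = sorted({w for d in docs for w in d})

def tf_idf_vector(doc):
    # Every vocab word occurs in at least one doc, so n_containing is never 0.
    vec = []
    for w in vocab:
        n_containing = sum(1 for d in docs if w in d)
        vec.append(doc.count(w) / len(doc) * math.log(len(docs) / n_containing))
    return vec

vectors = [tf_idf_vector(d) for d in docs]

# Sparsity: count zero entries across all document vectors.
zeros = sum(1 for v in vectors for x in v if x == 0.0)
print(f"{len(vocab)} dims, {zeros / (len(vocab) * len(docs)):.0%} of entries are zero")
# → 6 dims, 78% of entries are zero

# Rare-token bias: the typo (in 1 doc) outweighs the real word (in 2 docs).
grate = vectors[2][vocab.index("grate")]
great = vectors[1][vocab.index("great")]
print(round(grate, 3), round(great, 3))  # → 0.275 0.101
```

Some zeros come from words missing from a document and some from words present in every document (IDF = log 1 = 0). With a realistic vocabulary of many thousands of words, each document uses only a tiny fraction of the dimensions, so the vectors are typically far sparser than in this toy example.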