TF-IDF
- TF-IDF measures how important a word is to a document within a collection of documents (corpus)
- TF = Term Frequency: the number of times the word appears in a document
- IDF = Inverse Document Frequency: how rare the term is across all documents, so terms found in fewer documents get a higher weight
- TF-IDF is the product of TF and IDF
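As a rough illustration of the definitions above, here is a minimal sketch in plain Python. It assumes raw counts for TF and the unsmoothed IDF form log(N / df), where N is the number of documents and df is the number of documents containing the term. The notes do not specify which variant is meant, and libraries often use smoothed or normalized versions, so treat the function name `tf_idf` and the toy documents as illustrative only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Assumes TF = raw count and IDF = log(N / df); other variants exist.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this document
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores

# Toy corpus, made up for illustration.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
scores = tf_idf(docs)
```

With this variant, a term that occurs in every document (here "the") scores zero, while a term unique to one document (here "mat") gets the highest weight in that document.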
Pros:
- Simple
- Efficient
- Effective in document retrieval
Limitations:
- All limitations of count-based word embeddings
- Bias towards rare tokens
- Sparse representation (high-dimensional, sparse data)
- TF-IDF can also be used as word embeddings, by replacing the 1 in the one-hot vector with the TF-IDF score.
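One possible reading of the one-hot idea, sketched in Python: build the usual one-hot vector over the vocabulary, but store the word's TF-IDF score for a given document where the 1 would go. The helper name `tfidf_word_vector`, the tiny vocabulary, and the score values are all hypothetical and chosen only to show the shape of the result.

```python
def tfidf_word_vector(term, vocab, doc_scores):
    """One-hot-style vector for `term`, with the 1 replaced by its TF-IDF score."""
    vec = [0.0] * len(vocab)
    # Position of the term in the vocabulary is the only non-zero slot.
    vec[vocab.index(term)] = doc_scores.get(term, 0.0)
    return vec

# Hypothetical vocabulary and made-up TF-IDF scores for a single document.
vocab = ["cat", "mat", "sat", "the"]
doc_scores = {"cat": 0.41, "mat": 1.10, "sat": 0.41, "the": 0.0}

vec = tfidf_word_vector("mat", vocab, doc_scores)
```

The vector still has one slot per vocabulary word, so it stays as high-dimensional and sparse as a one-hot vector. Only the magnitude of the single non-zero entry changes.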