TF-IDF
- TF-IDF measures how important a word is to a document within a collection of documents (a corpus)
- TF = Term Frequency
- the number of times the word appears in a document
- IDF = Inverse Document Frequency
- a measure of how rare the term is across all documents: $\mathrm{IDF}(t) = \log\frac{N}{\mathrm{df}(t)}$, where $N$ is the total number of documents and $\mathrm{df}(t)$ is the number of documents containing $t$
- TF-IDF is the product of the two: $\text{TF-IDF}(t, d) = \mathrm{TF}(t, d) \times \mathrm{IDF}(t)$ (see the sketch below)
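To make these definitions concrete, here is a minimal from-scratch sketch; the toy corpus and the helper function names are illustrative, not from any library:

```python
import math
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]
docs = [doc.split() for doc in corpus]
N = len(docs)  # total number of documents

def tf(term, doc):
    # Term Frequency: raw count of the term in one document
    return Counter(doc)[term]

def idf(term):
    # Inverse Document Frequency: log(N / number of docs containing term)
    df = sum(1 for doc in docs if term in doc)
    return math.log(N / df) if df else 0.0

def tf_idf(term, doc):
    return tf(term, doc) * idf(term)

print(tf_idf("cat", docs[0]))  # appears in 1 of 3 docs -> higher score
print(tf_idf("the", docs[0]))  # appears in 2 of 3 docs -> lower IDF weight
```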
Why can't we use just one?
- If we used TF alone, terms like "a" and "the" would receive very high importance, even though in reality those words carry no importance
- That is where the IDF part comes in to down-weight the TF. If a word appears in every document, it carries no discriminative importance, so its IDF becomes 0 ($\mathrm{IDF} = \log\frac{N}{N} = \log 1 = 0$); but if it appears in only 1 document, that term is highly distinctive across the corpus, and its IDF is large ($\mathrm{IDF} = \log\frac{N}{1} = \log N$).
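As a quick numeric check of the two extremes above (the corpus size $N = 1000$ is an assumed toy value):

```python
import math

N = 1000
print(math.log(N / N))  # term in every document -> IDF = log(1) = 0
print(math.log(N / 1))  # term in a single document -> IDF = log(1000) ~= 6.9
```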
Pros
- Simple
- Efficient
- Effective in document retrieval
Limitations:
- All the limitations of count-based word embeddings
- Bias towards rare tokens
- Sparse representation (high-dimensional, mostly-zero vectors)
Applications:
- Information Retrieval
- Text Mining
- Document Classification
- Search Engines
- Recommendation Systems
- TF-IDF can also be used as a word embedding, by replacing the 1s in the one-hot vector with the TF-IDF scores (see the sketch below).
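A sketch of this idea using scikit-learn's TfidfVectorizer (assuming scikit-learn is installed; the corpus is illustrative): each document becomes a vector over the vocabulary in which the positions that a one-hot/count encoding would fill with 1s hold TF-IDF scores instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)  # sparse matrix: documents x vocabulary

print(vectorizer.get_feature_names_out())  # vocabulary order of the columns
print(X.toarray().round(2))                # TF-IDF score in each position
```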