Intuition
- If two words appear with similar neighbors, they are likely to be similar words (the distributional hypothesis)
- For example, "intelligent" and "smart" tend to occur next to the same neighbors; since the model must predict the same neighbors for both, it is pushed to learn similar features (vectors) for the two words, as the sketch below illustrates
Issues:
- Single vector per word
- Homonyms and polysemous words (e.g. "bank" the institution vs. the "bank" of a river) are represented with the same vector
- Not contextualized
- Context window limitation
- Captures only local co-occurrence information within the window, not global corpus statistics
- Out-of-vocabulary (OOV) words cannot be embedded
- Phrase representation: multi-word expressions like "New York" are not captured by per-word vectors
- The softmax over the full vocabulary is expensive: the model has to compute a probability for every word in the vocabulary even though each training pair has only one target
- This cost is mitigated by hierarchical softmax (negative sampling is the other standard fix); see the sketch below
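To make the cost contrast concrete, here is a minimal NumPy sketch, not gensim's or the original C implementation. It assumes a balanced binary tree over a power-of-two vocabulary (word2vec actually builds a Huffman tree), and all names are illustrative. Full softmax scores all V output vectors, while hierarchical softmax walks a root-to-leaf path of only log2(V) nodes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

V, d = 8, 4  # toy vocab size (power of 2 for a balanced tree) and embedding dim
rng = np.random.default_rng(0)
hidden = rng.normal(size=d)              # hidden-layer vector for the center word
node_vecs = rng.normal(size=(V - 1, d))  # one vector per internal tree node
out_weights = rng.normal(size=(V, d))    # full output matrix, for comparison

def full_softmax_prob(word_idx):
    logits = out_weights @ hidden        # O(V * d): score every vocabulary word
    probs = np.exp(logits - logits.max())
    return (probs / probs.sum())[word_idx]

def hierarchical_prob(word_idx):
    prob, node = 1.0, 0
    for depth in reversed(range(int(np.log2(V)))):  # O(log V) binary decisions
        go_right = (word_idx >> depth) & 1          # each bit picks left/right
        s = sigmoid(node_vecs[node] @ hidden)
        prob *= s if go_right else 1.0 - s
        node = 2 * node + 1 + go_right              # child in an implicit heap
    return prob

# The path probabilities form a valid distribution over the vocabulary:
print(sum(hierarchical_prob(w) for w in range(V)))  # ~1.0
```

Since the two branch probabilities at every internal node sum to 1, the leaf probabilities are already normalized, so no O(V) normalization pass is needed; that is where the speedup comes from.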
How Instagram Trains Word Embeddings
- Within a single session, Instagram assumes the user has one specific intent in mind, so whatever pics/videos/reels they like or watch should be similar to each other
- So Instagram treats each session as a sentence and the photos/reels as words, and trains the embedding model the same way (see the sketch after the DoorDash section)
How Doordash Trains Word Embeddings
- Like Instagram, DoorDash assumes that in each session the user has something specific in mind to eat, so the stores they browse should be similar to each other
- So DoorDash treats each session as a sentence and the stores as the words
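Both setups reduce to the same session-as-sentence recipe (often called item2vec). The sketch below uses made-up session logs, IDs, and hyperparameters; it is not either company's actual pipeline, just the idea applied with gensim:

```python
from gensim.models import Word2Vec

# Hypothetical session logs: one session = one "sentence",
# items (store IDs here; photo/reel IDs for Instagram) = "words".
sessions = [
    ["store_12", "store_7", "store_31"],
    ["store_7", "store_31", "store_12"],
    ["store_88", "store_90", "store_7"],
    ["store_90", "store_88"],
]

# Train skip-gram exactly as for text; stores browsed in the same
# sessions share contexts and therefore end up with similar vectors.
model = Word2Vec(sessions, vector_size=64, window=3, min_count=1,
                 sg=1, epochs=50, seed=7)

# Stores that appear in similar sessions rank highest.
print(model.wv.most_similar("store_7"))
```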