Intuition
- If two words appear with similar neighbors, they are likely to be similar words (the distributional hypothesis)
- For example, "intelligent" and "smart" tend to occur next to the same neighbors; since the model must predict the same neighbors for both, it is pushed to learn similar features (vectors) for the two words, as the sketch below illustrates
Issues:
- Single vector per word
- Homonyms and polysemous words (e.g. "bank" the institution vs. the "bank" of a river) are represented with the same vector
- Not contextualized
- Context window limitation
- Captures only local co-occurrence information within the window, not global corpus statistics
- Out-of-vocabulary (OOV) words cannot be embedded
- Phrase representation: multi-word expressions like "New York" are not captured by per-word vectors
- The softmax over the full vocabulary is expensive: the model has to compute a probability for every word in the vocabulary even though each training pair has only one target
- This cost is mitigated by hierarchical softmax (negative sampling is the other standard fix); see the sketch below
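To make the cost contrast concrete, here is a minimal NumPy sketch, not gensim's or the original C implementation. It assumes a balanced binary tree over a power-of-two vocabulary (word2vec actually builds a Huffman tree), and all names are illustrative. Full softmax scores all V output vectors, while hierarchical softmax walks a root-to-leaf path of only log2(V) nodes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

V, d = 8, 4  # toy vocab size (power of 2 for a balanced tree) and embedding dim
rng = np.random.default_rng(0)
hidden = rng.normal(size=d)              # hidden-layer vector for the center word
node_vecs = rng.normal(size=(V - 1, d))  # one vector per internal tree node
out_weights = rng.normal(size=(V, d))    # full output matrix, for comparison

def full_softmax_prob(word_idx):
    logits = out_weights @ hidden        # O(V * d): score every vocabulary word
    probs = np.exp(logits - logits.max())
    return (probs / probs.sum())[word_idx]

def hierarchical_prob(word_idx):
    prob, node = 1.0, 0
    for depth in reversed(range(int(np.log2(V)))):  # O(log V) binary decisions
        go_right = (word_idx >> depth) & 1          # each bit picks left/right
        s = sigmoid(node_vecs[node] @ hidden)
        prob *= s if go_right else 1.0 - s
        node = 2 * node + 1 + go_right              # child in an implicit heap
    return prob

# The path probabilities form a valid distribution over the vocabulary:
print(sum(hierarchical_prob(w) for w in range(V)))  # ~1.0
```

Since the two branch probabilities at every internal node sum to 1, the leaf probabilities are already normalized, so no O(V) normalization pass is needed; that is where the speedup comes from.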
How Instagram Trains Word Embeddings
- Within a single session, Instagram assumes the user has one specific intent in mind, so whatever pics/videos/reels they like or watch should be similar to each other
- So Instagram treats each session as a sentence and the photos/reels as words, and trains the embedding model the same way (see the sketch after the DoorDash section)
How Doordash Trains Word Embeddings
- Like Instagram, DoorDash assumes that in each session the user has something specific in mind to eat, so the stores they browse should be similar to each other
- So DoorDash treats each session as a sentence and the stores as the words
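Both setups reduce to the same session-as-sentence recipe (often called item2vec). The sketch below uses made-up session logs, IDs, and hyperparameters; it is not either company's actual pipeline, just the idea applied with gensim:

```python
from gensim.models import Word2Vec

# Hypothetical session logs: one session = one "sentence",
# items (store IDs here; photo/reel IDs for Instagram) = "words".
sessions = [
    ["store_12", "store_7", "store_31"],
    ["store_7", "store_31", "store_12"],
    ["store_88", "store_90", "store_7"],
    ["store_90", "store_88"],
]

# Train skip-gram exactly as for text; stores browsed in the same
# sessions share contexts and therefore end up with similar vectors.
model = Word2Vec(sessions, vector_size=64, window=3, min_count=1,
                 sg=1, epochs=50, seed=7)

# Stores that appear in similar sessions rank highest.
print(model.wv.most_similar("store_7"))
```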