Word2Vec Embedding

#nlp #interview

Proposed in this paper
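The paper proposes 2 model architectures
1. Skip-gram model: predicts the surrounding context words from the center word
2. Continuous Bag of Words (CBOW): predicts the center word from its surrounding context words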
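
To make the difference between the two methods concrete, here is a minimal sketch of how training examples could be generated for each one. The function names and the `window` parameter are illustrative assumptions, not taken from the paper or from any library.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: one (center, context) pair per neighbor inside the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: one (context_words, center) example per position."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples


if __name__ == "__main__":
    sentence = "the quick brown fox jumps".split()
    print(skipgram_pairs(sentence, window=1))
    print(cbow_pairs(sentence, window=1))
```

Skip-gram produces several small examples per position (one per neighbor), while CBOW produces a single example per position with the whole context as input.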

Intuition

If two words have similar neighbors, they should be similar words.
For example, "intelligent" and "smart" tend to appear with similar neighbors. Since the model has to predict those same neighbors from either word, it is pushed to learn similar features (vectors) for both.
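
The intuition can be checked on a toy corpus. The sketch below does not train Word2Vec; it only counts neighbors inside a window and compares the count vectors with cosine similarity, as a rough stand-in for "similar neighbors imply similar vectors". The corpus, the helper names, and the window size are all made up for illustration.

```python
import math
from collections import Counter

# Toy corpus (invented for this example).
corpus = [
    "the smart student solved the problem",
    "the intelligent student solved the puzzle",
    "the green apple fell from the tree",
]


def context_counts(word: str, sentences, window: int = 2) -> Counter:
    """Count the words that appear within `window` positions of `word`."""
    counts = Counter()
    for sentence in sentences:
        toks = sentence.split()
        for i, tok in enumerate(toks):
            if tok == word:
                lo, hi = max(0, i - window), min(len(toks), i + window + 1)
                counts.update(toks[j] for j in range(lo, hi) if j != i)
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)  # Counter returns 0 for missing keys
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


smart = context_counts("smart", corpus)
intelligent = context_counts("intelligent", corpus)
apple = context_counts("apple", corpus)

sim_close = cosine(smart, intelligent)  # words sharing the same neighbors
sim_far = cosine(smart, apple)          # words with mostly different neighbors
print(sim_close, sim_far)
```

In this toy setup "smart" and "intelligent" share all of their neighbors, so their similarity comes out higher than that of "smart" and "apple". Word2Vec reaches a similar effect with dense learned vectors instead of raw counts.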

Issues:

Single vector per word
1. Homonyms and polysemous words get the same vector for all of their senses (e.g. "bank" as a river bank and as a financial institution)
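Not contextualized
1. A word's vector is the same in every sentence, regardless of the surrounding words
Context window limitation
1. Captures only local co-occurrence inside the window rather than global corpus statistics
OOV (out-of-vocabulary) words
1. A word that was not seen during training has no vector
Phrase representation
1. Each token gets its own vector, so multi-word phrases such as "New York" are not represented as a unit by default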
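Large vocabulary size in the softmax layer
1. The model has to compute a probability for every word in the vocabulary at each training step, even though there is only a single target word
2. Hierarchical Softmax addresses this by replacing the flat softmax with a binary tree over the vocabulary, which cuts the per-example cost from O(V) to roughly O(log V); negative sampling is another commonly used alternative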
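
A rough sketch of why hierarchical softmax is cheaper. Assumptions for simplicity: the vocabulary size is a power of two and the tree is a balanced binary tree in heap layout (the original implementation uses a Huffman tree, so frequent words get shorter paths). All names, sizes, and the random vectors below are illustrative.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))


def full_softmax_prob(h, output_vectors, target: int) -> float:
    """Flat softmax: needs one dot product per vocabulary word (V in total)."""
    scores = [dot(h, v) for v in output_vectors]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return exps[target] / sum(exps)


def hierarchical_softmax_prob(h, inner_vectors, target: int, vocab_size: int):
    """Balanced-tree hierarchical softmax (assumes vocab_size is a power of two).

    Heap layout: inner nodes are 1..V-1, leaves are V..2V-1, word w is leaf V + w.
    Returns (probability, number of inner nodes visited).
    """
    node = vocab_size + target
    prob, steps = 1.0, 0
    while node > 1:
        parent = node // 2
        score = dot(h, inner_vectors[parent])
        # Odd index = right child -> sigmoid(score); left child -> sigmoid(-score).
        prob *= sigmoid(score) if node % 2 else sigmoid(-score)
        node = parent
        steps += 1
    return prob, steps


random.seed(0)
V, DIM = 8, 4
h = [random.uniform(-1, 1) for _ in range(DIM)]                                # hidden vector
inner = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(V)]        # index 0 unused
outputs = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(V)]      # flat softmax weights

p, steps = hierarchical_softmax_prob(h, inner, target=5, vocab_size=V)
total = sum(hierarchical_softmax_prob(h, inner, w, V)[0] for w in range(V))
print(f"steps per word: {steps} (vs {V} dot products for flat softmax)")
print(f"probabilities sum to: {total:.6f}")
```

Because sigmoid(x) + sigmoid(-x) = 1 at every inner node, the leaf probabilities still form a valid distribution, while each word only touches log2(V) inner nodes instead of all V output vectors.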

References

Related Notes