Sub-sampling in Word2Vec

We have seen in the Word2Vec Embedding note that input-output pairs are formed from the words inside a context window. But consider the sentence "a quick fox died": for the word "quick" there will be an input-output pair (quick, a), yet "a" gives almost no information about the context of "quick". The same goes for any very frequent word.

So the authors sub-sample words based on their frequency: the more frequent a word, the lower the probability that its occurrences are kept in the training corpus. In the original paper, each occurrence of a word w is discarded with probability 1 - sqrt(t / f(w)), where f(w) is the word's relative frequency and t is a small threshold (around 1e-5).

In this way, the authors reduce the size of the training corpus and speed up training without decreasing accuracy.
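The sub-sampling step above can be sketched as follows. This is a minimal illustration, assuming the discard formula from the original paper; the function name, threshold default, and seeding are my own choices, not part of any library:

```python
import math
import random
from collections import Counter

def subsample(tokens, t=1e-5, seed=0):
    """Drop occurrences of frequent words.

    Each occurrence of word w is kept with probability
    min(1, sqrt(t / f(w))), where f(w) is w's relative frequency
    in the corpus and t is a tunable threshold.
    """
    rng = random.Random(seed)
    counts = Counter(tokens)
    total = len(tokens)
    kept = []
    for w in tokens:
        f = counts[w] / total          # relative frequency of w
        p_keep = min(1.0, math.sqrt(t / f))
        if rng.random() < p_keep:
            kept.append(w)
    return kept

# A word rarer than the threshold t is always kept, while a very
# frequent word loses most of its occurrences.
tokens = ["the"] * 1000 + ["fox"]
kept = subsample(tokens, t=1e-3, seed=42)
```

Note that for a word with f(w) <= t the keep probability is 1, so rare words survive untouched; only the frequent words are thinned out.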


References

  1. https://mccormickml.com/2017/01/11/word2vec-tutorial-part-2-negative-sampling/

Related Notes