Negative Sampling

Negative sampling was proposed by the word2vec authors in their second paper. From the Word2Vec Embedding note, we can see that even for a vocabulary of only 10,000 words (with 300-dimensional embeddings), each weight matrix holds 3M weights, all of which would need updating on every training sample, which is very inefficient.

So the authors proposed that instead of updating the output weights for every negative word in the vocabulary, they sample K negative words and update only K + 1 output word vectors (1 positive + K negatives).

The authors found that for small datasets the optimal value of K is 5-20, while for large datasets it is 2-5.


References

  1. https://mccormickml.com/2017/01/11/word2vec-tutorial-part-2-negative-sampling/

Related Notes