Jaccard Similarity

Binary Vector

For two binary vectors of the same length, the formula is:

$$\text{Jaccard Similarity} = \frac{\#\,\text{elements where both are 1}}{\#\,\text{elements where both are 1} + \#\,\text{elements where only the first is 1} + \#\,\text{elements where only the second is 1}}$$
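
The binary-vector formula can be sketched in a few lines of Python. The function name `jaccard_binary` and the sample vectors are illustrative, and returning 0.0 when both vectors are all zeros is an assumed convention, since the formula itself is undefined in that case.

```python
def jaccard_binary(u, v):
    """Jaccard similarity of two equal-length binary (0/1) vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")

    # Count the three kinds of positions that appear in the formula.
    both = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    only_first = sum(1 for a, b in zip(u, v) if a == 1 and b == 0)
    only_second = sum(1 for a, b in zip(u, v) if a == 0 and b == 1)

    denominator = both + only_first + only_second
    if denominator == 0:
        # Assumed convention: two all-zero vectors get similarity 0.0.
        return 0.0
    return both / denominator


x = [1, 0, 1, 1, 0]
y = [1, 1, 0, 1, 0]
print(jaccard_binary(x, y))  # 2 / (2 + 1 + 1) = 0.5
```

Positions where both vectors are 0 do not enter the formula, so long sparse vectors are compared only on the features that at least one of them has.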

Set

For two sets A and B, the formula is:

$$\text{Jaccard Similarity} = \frac{|A \cap B|}{|A \cup B|}$$
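
A minimal sketch of the set version using Python's built-in `set` operators. The name `jaccard_set` and the example data are illustrative, and returning 0.0 for two empty sets is an assumed convention.

```python
def jaccard_set(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets get similarity 0.0.
        return 0.0
    return len(a & b) / len(union)


basket_1 = {"apple", "banana", "cherry"}
basket_2 = {"banana", "cherry", "date"}
print(jaccard_set(basket_1, basket_2))  # 2 shared / 4 distinct = 0.5
```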

Multiset

For two multisets A and B, the formula is:

$$\text{Jaccard Similarity} = \frac{|A \cap B|}{|A| + |B|}$$
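
The multiset version can be sketched with `collections.Counter`, whose `&` operator keeps the smaller of the two counts for each element. The sketch assumes that the intersection counts each element with its minimum multiplicity and that |A| and |B| count elements with repetition. The name `jaccard_multiset` is illustrative.

```python
from collections import Counter


def jaccard_multiset(a, b):
    """Multiset Jaccard similarity: |A ∩ B| / (|A| + |B|)."""
    count_a, count_b = Counter(a), Counter(b)

    # Counter & Counter keeps min(count_a[x], count_b[x]) for each element x.
    intersection_size = sum((count_a & count_b).values())
    total_size = sum(count_a.values()) + sum(count_b.values())

    if total_size == 0:
        # Assumed convention: two empty multisets get similarity 0.0.
        return 0.0
    return intersection_size / total_size


a = ["x", "x", "y"]
b = ["x", "y", "y", "z"]
print(jaccard_multiset(a, b))             # 2 / (3 + 4) = 0.2857...
print(jaccard_multiset(["a", "a"], ["a", "a"]))  # 2 / (2 + 2) = 0.5
```

With this denominator the score for two identical multisets is 0.5 rather than 1, so values are best compared against each other rather than against the set-based score.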

Uses:

  1. Text Mining: Measure the similarity between two text documents from the terms used in each.
  2. E-Commerce: In a dataset with millions of customers, find similar customers from their purchase histories.
  3. Recommendation System: In a movie recommendation system, the Jaccard index can be used to find similar users from their watch histories.
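
As a rough illustration of the text-mining use, each document can be reduced to its set of terms and the two sets compared. The whitespace tokenizer and the function names here are simplifying assumptions. A real pipeline would likely also handle punctuation, stop words, and stemming.

```python
def terms(text):
    """Hypothetical tokenizer: lowercase the text and split on whitespace."""
    return set(text.lower().split())


def document_similarity(doc_1, doc_2):
    """Jaccard similarity between the term sets of two documents."""
    terms_1, terms_2 = terms(doc_1), terms(doc_2)
    union = terms_1 | terms_2
    if not union:
        # Assumed convention: two empty documents get similarity 0.0.
        return 0.0
    return len(terms_1 & terms_2) / len(union)


doc_a = "the cat sat on the mat"
doc_b = "the cat lay on the rug"
print(document_similarity(doc_a, doc_b))  # 3 shared terms / 7 distinct terms
```

The same pattern applies to the other two uses by swapping the term sets for sets of purchased product IDs or watched movie IDs.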
