BLEU Score

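$$\text{BLEU-}N = \text{Brevity-Penalty} \cdot \exp\left(\sum_{n=1}^{N} w_n \log \text{pre}_n\right)$$

$$\text{Brevity-Penalty} = \min\left(1,\ \exp\left(1 - \frac{\text{reference-length}}{\text{generation-length}}\right)\right)$$

$$\text{pre}_n = \frac{\text{\# of n-grams matched in both target and generation}}{\text{total \# of n-grams in the generation}}$$

The sum $\sum_{n=1}^{N} w_n \log \text{pre}_n$ is the log of the weighted geometric mean of the precisions $\text{pre}_1, \dots, \text{pre}_N$, so exponentiating it gives the geometric mean itself. $w_n$ is the weight given to n-gram order $n$ (commonly $1/N$).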
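The formula can be sketched in a few lines of Python. This is a minimal single-reference illustration, not a replacement for an established implementation. It makes several assumptions not stated in the note: inputs are already tokenized lists, weights default to uniform `1/N`, matched counts are clipped to the reference count (as in standard BLEU), and the score is 0 when any precision is 0. The function names are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(reference, generation, n):
    """pre_n: matched n-grams / total n-grams in the generation.

    Assumption: matches are clipped, so a generated n-gram counts at most
    as many times as it appears in the reference.
    """
    gen_counts = Counter(ngrams(generation, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(gen_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in gen_counts.items())
    return matched / total


def brevity_penalty(reference, generation):
    """min(1, exp(1 - reference_length / generation_length))."""
    if len(generation) == 0:
        return 0.0
    return min(1.0, math.exp(1 - len(reference) / len(generation)))


def bleu(reference, generation, max_n=4, weights=None):
    """BLEU-N = brevity penalty * exp(sum_n w_n * log pre_n)."""
    weights = weights or [1.0 / max_n] * max_n  # assumption: uniform weights
    precisions = [ngram_precision(reference, generation, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # log(0) is undefined; assumption: treat the score as 0
    log_mean = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return brevity_penalty(reference, generation) * math.exp(log_mean)


reference = "the cat sat on the mat".split()
print(bleu(reference, reference))                        # identical text scores 1.0
print(bleu(reference, "the cat sat".split(), max_n=2))   # perfect precision, penalized for brevity
```

In the second call every generated n-gram appears in the reference, so both precisions are 1 and the score equals the brevity penalty alone, `exp(1 - 6/3)`.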

Problems with BLEU Score

  1. Doesn't consider semantic meaning
  2. Doesn't consider synonyms
  3. Struggles with non-English languages
  4. Scores are hard to compare across tokenizers
    1. Different tokenizers split a sentence into different units, so the n-grams produced by tokenizer A are not the same as those produced by tokenizer B
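The tokenizer problem can be seen with a small example. The sketch below scores the same reference and generation pair under two toy tokenizers and gets two different unigram precisions. Both tokenizers are hypothetical stand-ins: `whitespace_tokenize` splits on spaces, and `chunk_tokenize` cuts each word into 3-character pieces as a crude imitation of subword tokenization. Neither is taken from a real library.

```python
from collections import Counter


def whitespace_tokenize(text):
    """Toy tokenizer A: split on whitespace."""
    return text.split()


def chunk_tokenize(text, size=3):
    """Toy tokenizer B (hypothetical): cut each word into fixed-size chunks."""
    return [word[i:i + size] for word in text.split() for i in range(0, len(word), size)]


def unigram_precision(reference, generation):
    """Clipped matched unigrams / total unigrams in the generation."""
    ref_counts = Counter(reference)
    gen_counts = Counter(generation)
    matched = sum(min(count, ref_counts[tok]) for tok, count in gen_counts.items())
    return matched / sum(gen_counts.values())


reference = "the model translates sentences"
generation = "the model translated sentences"

precision_a = unigram_precision(whitespace_tokenize(reference), whitespace_tokenize(generation))
precision_b = unigram_precision(chunk_tokenize(reference), chunk_tokenize(generation))

print(precision_a)  # 0.75: one of four words differs
print(precision_b)  # 0.9: only one of ten chunks differs
```

The texts are identical in both runs; only the tokenization changed. This is why a BLEU score is only comparable to another BLEU score computed with the same tokenizer.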
