Tokenizer

A tokenizer splits text into tokens and builds a vocabulary from the training corpus.

There are three common types of tokenizer (compared in the sketch after this list):

  1. Word Tokenizer: splits text on whitespace and punctuation, so each word is a token; the vocabulary grows large and unseen words fall out of vocabulary.
  2. Sub-word Tokenizer: splits rare words into frequent fragments (e.g. BPE, WordPiece); the usual choice for modern language models.
  3. Character Tokenizer: treats each character as a token; the vocabulary is tiny but sequences get long.
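
Below is a minimal, self-contained Python sketch of all three. Everything in it is illustrative: the sentence, the hand-picked sub-word vocabulary, and the greedy longest-match loop, which mimics WordPiece-style inference rather than a trained tokenizer.

```python
import re

sentence = "Tokenization turns sentences into tokens."

# 1. Word tokenizer: split on word boundaries; each word is one token.
word_tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
# -> ['tokenization', 'turns', 'sentences', 'into', 'tokens', '.']

# A word-level vocabulary is just the set of tokens seen in the
# training corpus, mapped to integer ids.
word_vocab = {tok: i for i, tok in enumerate(sorted(set(word_tokens)))}

# 2. Sub-word tokenizer: greedy longest-match against a fixed sub-word
# vocabulary ('##' marks word-internal pieces, as in WordPiece).
# This toy vocabulary is hand-picked for the example sentence.
subword_vocab = {"token", "##ization", "turn", "sentence", "into", "##s", "."}

def subword_tokenize(word, vocab):
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:  # shrink the candidate until it is in the vocab
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in vocab:
                break
            end -= 1
        if end == start:    # no known piece matches: emit an unknown token
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces

subword_tokens = [p for w in word_tokens
                  for p in subword_tokenize(w, subword_vocab)]
# -> ['token', '##ization', 'turn', '##s', 'sentence', '##s',
#     'into', 'token', '##s', '.']

# 3. Character tokenizer: each character is one token.
char_tokens = list(sentence)
```

In practice the sub-word vocabulary is not hand-written but learned from the training corpus by an algorithm such as BPE or WordPiece.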

References


Related Notes