Word Tokenizer
A word tokenizer splits each sentence (or any other piece of text) into tokens at the spaces. It is also known as a space tokenizer.
Example:
Sentence: the low you go, the lower you find yourself.
Tokens (punctuation removed): [the, low, you, go, the, lower, you, find, yourself]
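The example can be reproduced with a few lines of Python. This is only a minimal sketch: the function name word_tokenize is made up for illustration, and stripping punctuation from the ends of each piece is an assumption added so that the output matches the token list shown here (a plain split on spaces would keep "go," and "yourself." as they are).

```python
import string

def word_tokenize(sentence):
    # Hypothetical helper: split on whitespace only.
    pieces = sentence.split()
    # Assumption: drop leading/trailing punctuation so "go," becomes "go".
    tokens = [piece.strip(string.punctuation) for piece in pieces]
    # Discard pieces that were nothing but punctuation.
    return [token for token in tokens if token]

tokens = word_tokenize("the low you go, the lower you find yourself.")
print(tokens)
```

Note that "low" and "lower" come out as two separate, unrelated tokens.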
Cons:
- In the previous example, "low" and "lower" are treated as completely different words, even though they are clearly related.
- At test time, if words such as "high" and "mid" appear that were never seen during training, both are mapped to the same "OOV" (out-of-vocabulary) token, even though they have different meanings.
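Both drawbacks can be seen in a small sketch. The helpers build_vocab and encode are hypothetical names, and reserving id 0 for the "OOV" token is an assumption made for illustration, not a fixed convention.

```python
def build_vocab(tokens):
    # Assumption: id 0 is reserved for every unseen word.
    vocab = {"OOV": 0}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab

def encode(tokens, vocab):
    # Any word missing from the vocabulary falls back to the "OOV" id.
    return [vocab.get(token, vocab["OOV"]) for token in tokens]

train_tokens = ["the", "low", "you", "go", "the", "lower", "you", "find", "yourself"]
vocab = build_vocab(train_tokens)

# "low" and "lower" get unrelated ids; nothing links them.
print(encode(["low", "lower"], vocab))
# "high" and "mid" were never seen, so both collapse to the same id.
print(encode(["high", "mid"], vocab))
```

After encoding, the model has no way to tell "high" from "mid", and no way to know that "low" and "lower" share a root.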