- BERT = Bidirectional Encoder Representations from Transformers
- BERT embeddings are called contextualized embeddings because they take the sentence context into account when generating the embedding of each word in that sentence
- This means the same word in different sentences will have different embeddings (see the sketch after this list)
- because it will usually appear in a different context
- It can capture global, long-range context across the whole sentence (thanks to self-attention)
- It takes word position into consideration when generating the embedding
- because the same word can carry a different meaning at the start or the end of a sentence
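
To make the contextualized-embedding point concrete, here is a minimal sketch. It assumes the Hugging Face `transformers` library, PyTorch, and the `bert-base-uncased` checkpoint (none of these are named in the notes): the same word "bank" gets a noticeably different vector in a river sentence than in a money sentence.

```python
# Minimal sketch: same word, different contexts -> different BERT embeddings.
# Assumes: pip install torch transformers
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_embedding(sentence: str, word: str) -> torch.Tensor:
    """Return the last-hidden-state vector of the first WordPiece of `word`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # shape: (seq_len, 768)
    # Locate the word's first subword among the tokenized input
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    idx = tokens.index(tokenizer.tokenize(word)[0])
    return hidden[idx]

river = word_embedding("He sat on the bank of the river.", "bank")
money = word_embedding("She deposited cash at the bank.", "bank")

# Cosine similarity is well below 1.0, showing the two "bank" vectors differ
print(torch.cosine_similarity(river, money, dim=0).item())
```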
Pros
- Contextualized embedding
- Can disambiguate homonyms and polysemous words
- Can handle rare or even OOV words (Thanks to WordPiece Tokenization), as shown below
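
A minimal sketch of the WordPiece point, under the same assumptions as above (Hugging Face `transformers` with `bert-base-uncased`): a rare or out-of-vocabulary word is split into known subword pieces instead of being collapsed into a single `[UNK]` token, so BERT can still build an embedding for it.

```python
# Minimal sketch: WordPiece splits rare/OOV words into known subword units.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# A long, rare word is broken into subword pieces (prefixed with "##")
print(tokenizer.tokenize("electroencephalography"))
# exact split depends on the vocabulary, e.g. ['electro', '##ence', ...]

# A made-up word still gets subword pieces rather than a single [UNK]
print(tokenizer.tokenize("covfefe"))
```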