- BERT = Bidirectional Encoder Representations from Transformers
- BERT embeddings are called contextualized embeddings because they take the sentence context into account when generating the embedding of each word in that sentence
- This means the same word in different sentences will have different embeddings (see the sketch after this list)
- because it will usually appear in a different context
- It can capture global, long-range context across the whole sentence (thanks to self-attention)
- It takes word position into consideration when generating the embedding
- because the same word can carry a different meaning at the start or the end of a sentence
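
To make the contextualized-embedding point concrete, here is a minimal sketch. It assumes the Hugging Face `transformers` library, PyTorch, and the `bert-base-uncased` checkpoint (none of these are named in the notes): the same word "bank" gets a noticeably different vector in a river sentence than in a money sentence.

```python
# Minimal sketch: same word, different contexts -> different BERT embeddings.
# Assumes: pip install torch transformers
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_embedding(sentence: str, word: str) -> torch.Tensor:
    """Return the last-hidden-state vector of the first WordPiece of `word`."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # shape: (seq_len, 768)
    # Locate the word's first subword among the tokenized input
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    idx = tokens.index(tokenizer.tokenize(word)[0])
    return hidden[idx]

river = word_embedding("He sat on the bank of the river.", "bank")
money = word_embedding("She deposited cash at the bank.", "bank")

# Cosine similarity is well below 1.0, showing the two "bank" vectors differ
print(torch.cosine_similarity(river, money, dim=0).item())
```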
Pros
- Contextualized embedding
- Can disambiguate homonyms and polysemous words
- Can handle rare or even OOV words (Thanks to WordPiece Tokenization), as shown below
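
A minimal sketch of the WordPiece point, under the same assumptions as above (Hugging Face `transformers` with `bert-base-uncased`): a rare or out-of-vocabulary word is split into known subword pieces instead of being collapsed into a single `[UNK]` token, so BERT can still build an embedding for it.

```python
# Minimal sketch: WordPiece splits rare/OOV words into known subword units.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# A long, rare word is broken into subword pieces (prefixed with "##")
print(tokenizer.tokenize("electroencephalography"))
# exact split depends on the vocabulary, e.g. ['electro', '##ence', ...]

# A made-up word still gets subword pieces rather than a single [UNK]
print(tokenizer.tokenize("covfefe"))
```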