- ELMo = Embeddings from Language Models
- Unlike most word embeddings, ELMo builds its word representations from the character level
- Structure:
- A character-level CNN produces a context-independent (raw) vector for each word
- These raw vectors are fed through two stacked bidirectional LSTM layers (the biLM)
- The final ELMo embedding is then a weighted sum of the raw vector and the outputs of the two LSTM layers, with task-specific weights (a minimal sketch follows this list)
- Like BERT embeddings, ELMo embeddings are contextualized: the vector for a word depends on the sentence it appears in
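
A minimal PyTorch sketch of the pipeline above. The class name `ToyELMo`, the layer sizes, the 262-id character vocabulary, and the max-pooling over characters are illustrative assumptions for this sketch, not the original AllenNLP implementation.

```python
# Toy sketch of the ELMo pipeline: char-CNN -> 2x biLSTM -> scalar mix.
# All names and sizes below are assumptions, not the original implementation.
import torch
import torch.nn as nn

class ToyELMo(nn.Module):
    def __init__(self, n_chars=262, char_dim=16, word_dim=64, max_word_len=20):
        super().__init__()
        # 1) Character-level CNN: a context-independent vector per word.
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_cnn = nn.Conv1d(char_dim, word_dim, kernel_size=3, padding=1)
        # 2) Two stacked bidirectional LSTM layers (the biLM).
        self.lstm1 = nn.LSTM(word_dim, word_dim // 2, bidirectional=True, batch_first=True)
        self.lstm2 = nn.LSTM(word_dim, word_dim // 2, bidirectional=True, batch_first=True)
        # 3) Scalar mix: softmax-normalised weights over the raw vector and
        #    the two LSTM outputs, plus a global scale gamma.
        self.mix_weights = nn.Parameter(torch.zeros(3))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, char_ids):
        # char_ids: (batch, n_words, max_word_len) of character indices
        b, w, c = char_ids.shape
        x = self.char_emb(char_ids)                 # (b, w, c, char_dim)
        x = x.view(b * w, c, -1).transpose(1, 2)    # (b*w, char_dim, c)
        raw = self.char_cnn(x).max(dim=2).values    # max-pool over characters
        raw = raw.view(b, w, -1)                    # (b, w, word_dim) raw word vectors
        h1, _ = self.lstm1(raw)                     # first biLSTM layer
        h2, _ = self.lstm2(h1)                      # second biLSTM layer
        s = torch.softmax(self.mix_weights, dim=0)  # per-layer weights
        # Final ELMo embedding: weighted sum of raw vector and both layer outputs.
        return self.gamma * (s[0] * raw + s[1] * h1 + s[2] * h2)

# Usage: 2 sentences of 5 words, each word up to 20 characters.
elmo = ToyELMo()
char_ids = torch.randint(0, 262, (2, 5, 20))
print(elmo(char_ids).shape)   # torch.Size([2, 5, 64])
```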
- Pros:
- ELMo can disambiguate homonyms and polysemous words (the same word gets different vectors in different contexts)
- ELMo captures the context of the whole sentence
- ELMo can also handle out-of-vocabulary (OOV) words, thanks to the character-level CNN (see the usage sketch after this list)
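
A small usage sketch, meant to be run after the `ToyELMo` block above, illustrating two of the points in this list: the same surface word gets different vectors in different sentences, and a made-up word still receives an embedding because it is built from characters. The `word_to_char_ids` helper and the clamped byte-value character ids are assumptions of this sketch.

```python
# Run after the ToyELMo sketch above; character ids are just clamped byte values.
import torch
import torch.nn.functional as F

def word_to_char_ids(word, max_len=20, pad_id=0):
    ids = [min(ord(ch), 261) for ch in word[:max_len]]   # clamp into the toy char vocab
    return ids + [pad_id] * (max_len - len(ids))

def encode(sentence, model):
    char_ids = torch.tensor([[word_to_char_ids(w) for w in sentence]])
    return model(char_ids)[0]                 # (n_words, dim)

elmo = ToyELMo()                              # defined in the sketch above
s1 = ["i", "sat", "by", "the", "river", "bank"]
s2 = ["i", "deposited", "cash", "at", "the", "bank"]
v1 = encode(s1, elmo)[5]                      # "bank" near a river
v2 = encode(s2, elmo)[5]                      # "bank" as an institution
print(F.cosine_similarity(v1, v2, dim=0))     # != 1.0: vectors differ with context

# OOV words work the same way: "blorptastic" was never seen during training,
# but the character CNN still produces a vector for it.
v_oov = encode(["a", "blorptastic", "day"], elmo)[1]
print(v_oov.shape)                            # torch.Size([64])
```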