Encoder-Only Transformers
Encoder-only models use only the encoder stack of the transformer. Pre-training typically consists of masked language modeling: some words in the input are masked out, and the model has to predict the masked words.
The attention layers in these models have access to all the words in the input, both to the left and to the right of each position, which is why they are often called bidirectional encoders.
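A minimal sketch of masked-word prediction with an encoder-only checkpoint, using the Hugging Face transformers library (assumes `transformers` and `torch` are installed; the model name is just an illustrative choice):

```python
from transformers import pipeline

# bert-base-uncased is a standard encoder-only checkpoint pretrained
# with masked language modeling.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model sees tokens on both sides of [MASK], which is what makes
# the encoder "bidirectional".
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```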
Encoder-only models are well suited to tasks that require understanding the whole input and its semantics, such as sentence classification, sentence embedding, and so on.
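A minimal sketch of turning an input into a sentence embedding with an encoder-only model; the model name and the mean-pooling step are illustrative assumptions, not the only way to do it:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer(
    "Encoder-only models build rich sentence representations.",
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the per-token hidden states into a single sentence vector.
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)  # torch.Size([1, 768])
```

The resulting vector can then be fed to a classifier or compared against other sentence vectors, which is how these models are typically used for classification and similarity tasks.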
Some examples of encoder-only transformers:
- BERT
- ALBERT
- DistilBERT
- RoBERTa
- ELECTRA