Encoder-Only Transformers

Encoder-only models use only the encoder part of the transformer. Pre-training typically consists of masking out some words in a sentence and training the model to predict the masked words (masked language modeling).
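The masking objective can be sketched in a few lines. This is a simplified illustration, not any specific tokenizer or model: the `mask_tokens` helper, the word-level tokens, and the 15% masking rate (a common choice, e.g. in BERT) are all assumptions for the demo.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=1):
    """Replace roughly mask_prob of the tokens with [MASK]; return the
    corrupted input plus the labels the model must predict."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            labels.append(tok)    # loss is computed at masked positions
        else:
            corrupted.append(tok)
            labels.append(None)   # no loss at unmasked positions
    return corrupted, labels

tokens = "the cat sat on the mat".split()
corrupted, labels = mask_tokens(tokens)
```

Real models mask subword tokens rather than whole words, and BERT additionally replaces some selected tokens with random tokens or leaves them unchanged; the sketch keeps only the core idea.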

These models use self-attention layers in which every token can attend to all tokens in the input, which is why they are sometimes called bi-directional encoders.
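A minimal single-head self-attention sketch makes the bi-directionality concrete: there is no causal mask, so the attention matrix is dense and every position can attend to every other position. The toy dimensions and random weights below are assumptions for illustration only.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """One unmasked (bi-directional) self-attention head."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # (seq, seq); no causal mask
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)      # softmax over ALL tokens
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d = 4, 8
X = rng.normal(size=(seq_len, d))                  # toy token embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
```

In a decoder, `scores` would be masked so position `i` cannot see positions `> i`; the absence of that mask is exactly what distinguishes the encoder.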

Encoder-only models are good at understanding sentences and their semantics, so they suit tasks like sentence classification, sentence embedding, and so on.

The output of an encoder-only model is usually the final hidden state for each input token.
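Downstream heads are built on top of these per-token hidden states. The sketch below uses made-up random states and untrained weights purely to show the shapes; mean-pooling is one common choice for sentence embeddings (BERT-style models often use the first `[CLS]` token's vector instead).

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical final hidden states for a 5-token input, hidden size 8
# (real encoders like BERT produce (seq_len, 768) or larger).
hidden_states = rng.normal(size=(5, 8))            # one vector per token

# Sentence-level task: pool token states into a single sentence embedding.
sentence_embedding = hidden_states.mean(axis=0)    # shape (8,)

# Token-level task (e.g. tagging): apply a linear head to every token.
num_labels = 3
W = rng.normal(size=(8, num_labels))               # illustrative, not trained
b = np.zeros(num_labels)
token_logits = hidden_states @ W + b               # shape (5, 3)
```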

Some examples of encoder-only transformers:

  1. BERT
  2. ALBERT
  3. DistilBERT
  4. RoBERTa
  5. ELECTRA

References


Related Notes