Encoder-Only Transformers
Encoder-only models use only the encoder stack of the transformer. Pre-training typically consists of masked language modeling: some words in the input are masked out, and the model has to predict the masked words.
The attention layers in these models have access to all the words in the input, both to the left and to the right of each position, which is why they are often called bidirectional encoders.
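A minimal sketch of masked-word prediction with an encoder-only checkpoint, using the Hugging Face transformers library (assumes `transformers` and `torch` are installed; the model name is just an illustrative choice):

```python
from transformers import pipeline

# bert-base-uncased is a standard encoder-only checkpoint pretrained
# with masked language modeling.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model sees tokens on both sides of [MASK], which is what makes
# the encoder "bidirectional".
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```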
Encoder-only models are well suited to tasks that require understanding the whole input and its semantics, such as sentence classification, sentence embedding, and so on.
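A minimal sketch of turning an input into a sentence embedding with an encoder-only model; the model name and the mean-pooling step are illustrative assumptions, not the only way to do it:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer(
    "Encoder-only models build rich sentence representations.",
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the per-token hidden states into a single sentence vector.
embedding = outputs.last_hidden_state.mean(dim=1)
print(embedding.shape)  # torch.Size([1, 768])
```

The resulting vector can then be fed to a classifier or compared against other sentence vectors, which is how these models are typically used for classification and similarity tasks.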
Some examples of encoder-only transformers:
- BERT
- ALBERT
- DistilBERT
- RoBERTa
- ELECTRA