Encoder-Decoder Transformer

The encoder-decoder transformer combines an encoder and a decoder. The main addition is in the decoder: besides masked self-attention, each decoder layer also uses cross-attention, where the queries come from the decoder and the keys and values come from the encoder's output.
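As a concrete illustration, here is a minimal sketch of a single decoder layer in PyTorch, showing masked self-attention followed by cross-attention over the encoder output. It uses pre-norm residual connections (a common modern variant rather than the original paper's post-norm), and names like `DecoderLayer` and `memory` are illustrative, not from any specific library.

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, tgt, memory):
        # Masked self-attention: each target position may only attend
        # to itself and earlier target positions (causal mask).
        T = tgt.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        x = self.norm1(tgt)
        tgt = tgt + self.self_attn(x, x, x, attn_mask=causal)[0]
        # Cross-attention: queries come from the decoder, while keys
        # and values come from the encoder output (`memory`).
        x = self.norm2(tgt)
        tgt = tgt + self.cross_attn(x, memory, memory)[0]
        # Position-wise feed-forward network.
        x = self.norm3(tgt)
        return tgt + self.ff(x)

# Toy shapes: batch of 2, source length 7, target length 5.
layer = DecoderLayer()
memory = torch.randn(2, 7, 512)   # encoder output
tgt = torch.randn(2, 5, 512)      # decoder input embeddings
print(layer(tgt, memory).shape)   # torch.Size([2, 5, 512])
```

Note that the causal mask applies only to the self-attention step; in cross-attention the decoder is free to attend to every encoder position, since the full source sequence is already available.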

This was the architecture originally introduced in the transformer paper ("Attention Is All You Need"); BERT later adapted it into an encoder-only model, and GPT-2 into a decoder-only model.

This model is also called a sequence-to-sequence (seq2seq) model because it maps an input sequence to an output sequence, as in machine translation, where a source-language sentence is fed in and a target-language sentence is generated. A short usage sketch follows below.
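As a hedged example of this seq2seq usage, the sketch below runs machine translation with the Hugging Face `transformers` library (assumed installed); the checkpoint "Helsinki-NLP/opus-mt-en-de" is one publicly available English-to-German encoder-decoder model, chosen purely for illustration.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-de")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-de")

# The encoder reads the full source sentence once; the decoder then
# generates the target sequence token by token, attending to the
# encoder output via cross-attention at every step.
inputs = tokenizer("The weather is nice today.", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```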


References


Related Notes