Decoder-Only Transformer

A decoder-only transformer uses only the decoder half of the original transformer architecture, so each position can attend only to the words that come before it in the sequence.

The decoder-only transformer uses masked self-attention, which can attend only to tokens that have already been generated. This masking keeps the decoder from seeing future tokens before it predicts the next one.
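
Below is a minimal sketch of this causal masking in PyTorch; the tensor shapes and the `causal_self_attention` helper are illustrative, not taken from any particular implementation.

```python
import torch
import torch.nn.functional as F

def causal_self_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_model); a single-head sketch for clarity
    seq_len, d_model = q.size(1), q.size(2)
    scores = q @ k.transpose(-2, -1) / d_model ** 0.5  # (batch, seq, seq)
    # Upper-triangular mask: position i may only attend to positions <= i,
    # so future positions are set to -inf before the softmax.
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v

x = torch.randn(1, 4, 8)           # batch of 1, 4 tokens, model dim 8
out = causal_self_attention(x, x, x)
print(out.shape)                   # torch.Size([1, 4, 8])
```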

The decoder is usually trained with a multi-class cross-entropy loss at each token position, i.e. next-token prediction.
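
A short sketch of that objective, assuming the model emits per-position logits over the vocabulary (the shapes and names here are illustrative):

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 100, 8, 2
tokens = torch.randint(0, vocab_size, (batch, seq_len))
logits = torch.randn(batch, seq_len, vocab_size)  # stand-in for model output

# Predict token t+1 from positions <= t: shift logits and targets by one.
inputs, targets = logits[:, :-1], tokens[:, 1:]

# cross_entropy expects (N, C) logits and (N,) class indices, so flatten
# the batch and sequence dimensions; the loss averages over all positions.
loss = F.cross_entropy(inputs.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())
```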

These models are best suited for text generation.
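
As a rough illustration, greedy autoregressive decoding repeatedly feeds the sequence back into the model and appends the highest-probability next token. The `model` below is a hypothetical stand-in for a trained decoder, used only to keep the sketch self-contained.

```python
import torch

vocab_size = 100

def model(tokens):  # hypothetical stand-in for a trained decoder-only model
    return torch.randn(tokens.size(0), tokens.size(1), vocab_size)

@torch.no_grad()
def generate(prompt, max_new_tokens=5):
    tokens = prompt
    for _ in range(max_new_tokens):
        logits = model(tokens)                 # (batch, seq, vocab)
        next_token = logits[:, -1].argmax(-1)  # greedy pick at last position
        tokens = torch.cat([tokens, next_token.unsqueeze(1)], dim=1)
    return tokens

prompt = torch.randint(0, vocab_size, (1, 3))
print(generate(prompt))  # prompt plus 5 generated token ids
```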

Some examples of decoder-only models:

  1. GPT-2
  2. GPT-3
  3. LLaMA
