Self-Attention
This module is typically used in encoder-only Transformers (e.g., BERT). Each token can attend to every other token in the sequence, both before and after it, with no causal mask; this is why such an encoder is called bidirectional. Attending over the full context gives each token a richer contextual representation.
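A minimal sketch of single-head bidirectional self-attention, assuming PyTorch; the function name and the projection matrices w_q, w_k, w_v are illustrative, not from any particular library:

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model) token embeddings
    q = x @ w_q  # queries
    k = x @ w_k  # keys
    v = x @ w_v  # values
    d_k = q.size(-1)
    # scores between every pair of tokens; no causal mask,
    # so each token sees the full sequence (bidirectional)
    scores = (q @ k.T) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)  # rows sum to 1
    return weights @ v  # contextualized token representations

# Example usage with random weights
seq_len, d_model = 4, 8
x = torch.randn(seq_len, d_model)
out = self_attention(
    x,
    torch.randn(d_model, d_model),
    torch.randn(d_model, d_model),
    torch.randn(d_model, d_model),
)
print(out.shape)  # torch.Size([4, 8])
```

Because the score matrix is not masked, row i mixes information from all positions, which is the defining difference from the masked (causal) self-attention used in decoders.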
For the technical details, see Multi-Head Attention.