Self-Attention
This module is typically used in encoder-only Transformers (e.g., BERT). Each token can attend to every other token in the sequence, both before and after it, with no causal mask; this is why such an encoder is called bidirectional. Attending over the full context gives each token a richer contextual representation.
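A minimal sketch of single-head bidirectional self-attention, assuming PyTorch; the function name and the projection matrices w_q, w_k, w_v are illustrative, not from any particular library:

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model) token embeddings
    q = x @ w_q  # queries
    k = x @ w_k  # keys
    v = x @ w_v  # values
    d_k = q.size(-1)
    # scores between every pair of tokens; no causal mask,
    # so each token sees the full sequence (bidirectional)
    scores = (q @ k.T) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)  # rows sum to 1
    return weights @ v  # contextualized token representations

# Example usage with random weights
seq_len, d_model = 4, 8
x = torch.randn(seq_len, d_model)
out = self_attention(
    x,
    torch.randn(d_model, d_model),
    torch.randn(d_model, d_model),
    torch.randn(d_model, d_model),
)
print(out.shape)  # torch.Size([4, 8])
```

Because the score matrix is not masked, row i mixes information from all positions, which is the defining difference from the masked (causal) self-attention used in decoders.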
For the technical details, see Multi-Head Attention.