BERT

BERT is an encoder-only Transformer pre-trained with two objectives: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP).
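
A minimal sketch of the MLM objective at inference time, assuming the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint (both are assumptions, not part of this note):

```python
from transformers import pipeline

# fill-mask uses BERT's MLM head to predict the token hidden behind [MASK]
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT scores candidate tokens for the masked position using bidirectional context
for prediction in unmasker("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```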

BERT-base -- 110M parameters (12 layers, hidden size 768, 12 attention heads)

BERT-large -- 340M parameters (24 layers, hidden size 1024, 16 attention heads)
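
A rough way to check these counts, again assuming the Hugging Face `transformers` library; the exact totals shift slightly depending on whether the MLM/NSP heads are included:

```python
from transformers import AutoModel

# Count trainable parameters of the base and large encoders (heads excluded)
for name in ["bert-base-uncased", "bert-large-uncased"]:
    model = AutoModel.from_pretrained(name)
    total = sum(p.numel() for p in model.parameters())
    print(f"{name}: {total / 1e6:.0f}M parameters")
```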


Related Notes