DistilBERT

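DistilBERT is a distilled version of BERT: a smaller student model trained to reproduce the behavior of the larger BERT teacher model. It was created so that training and inference are practical on low-compute hardware such as a consumer GPU. The DistilBERT paper reports a model about 40% smaller and 60% faster than BERT while retaining about 97% of its language-understanding performance.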
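
To make "distilled" concrete, here is a minimal sketch of a generic soft-target distillation loss in plain Python. It is an illustration of the general idea, not DistilBERT's exact training objective: the function names, the example logits, and the temperature value of 2.0 are all assumptions made for this note.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's soft targets to the student's soft predictions.

    The temperature of 2.0 is a hypothetical default chosen for illustration.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(t * (math.log(t) - math.log(s))
               for t, s in zip(p_teacher, p_student))

# Hypothetical logits over three classes (or vocabulary entries).
teacher = [4.0, 1.0, 0.5]
student_close = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
student_far = [0.5, 1.0, 4.0]    # disagrees with the teacher

loss_close = distillation_loss(student_close, teacher)
loss_far = distillation_loss(student_far, teacher)
```

The loss is zero when the student's distribution matches the teacher's and grows as they diverge, so minimizing it pushes the student toward the teacher's full output distribution rather than only its top prediction.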

Changes:

Training Changes:


References


Related Notes