Investigating Continual Pretraining in Large Language Models - Insights and Implications

Summary

A study of continual pretraining in LLMs across sequences of domains. Per the annotated findings below: semantically similar domain sequences let the model specialize in the current domain better than stand-alone fine-tuning, diverse domain sequences improve both backward and forward knowledge transfer, and smaller models show the largest rates of both forgetting and learning.

Annotations

Annotation

« (i) when the sequence of domains shows semantic similarity, continual pretraining enables LLMs to better specialize in the current domain compared to stand-alone fine-tuning »()
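A minimal sketch of the two training regimes being contrasted here, not of the paper's implementation: continual pretraining carries a single model through the domain sequence, while stand-alone fine-tuning restarts from the base checkpoint for every domain. The `train_on` / `evaluate` helpers and the domain names are hypothetical stand-ins for a real causal-LM pretraining loop and its evaluation.

```python
import copy

def train_on(model, domain):
    """Hypothetical stand-in for a pretraining pass over one domain's corpus."""
    model["seen"].append(domain)
    return model

def evaluate(model, domain):
    """Hypothetical stand-in for perplexity / downstream evaluation on a domain."""
    return 1.0 if domain in model["seen"] else 0.0

base_model = {"seen": []}                      # stands in for the pretrained base checkpoint
domain_sequence = ["bio", "chem", "medicine"]  # a semantically related domain sequence

# Continual pretraining: one model is updated across the whole sequence.
continual = copy.deepcopy(base_model)
for domain in domain_sequence:
    continual = train_on(continual, domain)

# Stand-alone fine-tuning: each domain gets its own fresh copy of the base model.
standalone = {d: train_on(copy.deepcopy(base_model), d) for d in domain_sequence}

current = domain_sequence[-1]
print("continual, current domain:", evaluate(continual, current))
print("stand-alone, current domain:", evaluate(standalone[current], current))
```

The finding above is that, in this setup, the continually pretrained model ends up better specialized in the current domain than the stand-alone copy, provided the earlier domains are semantically related to it.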

Annotation

« (ii) training across a diverse range of domains enhances both backward and forward knowledge transfer »()
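"Backward" and "forward" transfer are the standard continual-learning metrics (in the style of Lopez-Paz & Ranzato's GEM formulation); the paper's exact variant may differ. A small sketch of how they are typically computed from an evaluation matrix, where R[i, j] is the score on domain j after training through the i-th domain and b[j] is the base model's score on domain j; the numbers below are purely illustrative.

```python
import numpy as np

# R[i, j]: score on domain j after sequentially training through domain i (higher is better).
# b[j]:    score of the base model on domain j before any continual pretraining.
R = np.array([
    [0.62, 0.40, 0.35],
    [0.58, 0.70, 0.42],
    [0.55, 0.66, 0.74],
])
b = np.array([0.30, 0.32, 0.31])
T = R.shape[0]

# Backward transfer: how later training changed performance on earlier domains
# (negative values indicate forgetting).
bwt = np.mean([R[T - 1, i] - R[i, i] for i in range(T - 1)])

# Forward transfer: how much earlier training helps a not-yet-seen domain,
# relative to the base model.
fwt = np.mean([R[i - 1, i] - b[i] for i in range(1, T)])

print(f"BWT = {bwt:+.3f}, FWT = {fwt:+.3f}")
```

Under this formulation, the "forgetting" in finding (iii) corresponds to negative backward transfer.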

Annotation

« (iii) smaller models are particularly sensitive to continual pretraining, showing the most significant rates of both forgetting and learning. »()


Related Notes