Mixed Precision
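Mixed precision aims for the best of both worlds: numerically sensitive layers (such as layer norm or softmax) are kept in full precision (FP32, 4 bytes per parameter), while the remaining layers run in half precision (FP16 or BF16, 2 bytes per parameter).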
As a rough average, this works out to about 3 bytes per parameter, so a 7B model, for example, takes around 21 GB for the model parameters.
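
The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a back-of-the-envelope helper: the function name `estimate_param_memory_gb` is made up for illustration, the 3 bytes per parameter is the rough average quoted above rather than a measured value, and GB is taken as 10^9 bytes.

```python
def estimate_param_memory_gb(num_params: float, bytes_per_param: float = 3.0) -> float:
    """Approximate memory for model parameters, in decimal GB (1 GB = 1e9 bytes).

    bytes_per_param defaults to the rough mixed-precision average of 3;
    pass 4.0 for full precision (FP32) or 2.0 for half precision (FP16/BF16).
    """
    return num_params * bytes_per_param / 1e9


num_params = 7e9  # a 7B-parameter model

# Compare the three cases: full, mixed (rough average), and half precision.
for label, bytes_per_param in [
    ("full precision", 4.0),
    ("mixed precision (rough average)", 3.0),
    ("half precision", 2.0),
]:
    gb = estimate_param_memory_gb(num_params, bytes_per_param)
    print(f"{label}: {gb:.1f} GB")
```

For the 7B case this gives 28 GB in full precision, 21 GB at the 3-byte average, and 14 GB in half precision. The figure covers model parameters only; the real average depends on how many layers are actually kept in full precision.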