L1 vs. L2 Regression

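The gradient of the L1 penalty |w| (Lasso Regression) is -1 or 1, depending on the sign of the weight, except at w = 0, where it is not defined and 0 can be used as a subgradient. So the penalty moves the weight closer to 0 by the same increment at every update, whatever the size of the weight, which is why weights can land on exactly 0.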

On the other hand, for the L2 penalty w^2 (Ridge Regression) the gradient is 2w, so it depends on the weight: the step toward 0 gets smaller as the weight gets closer to 0. Hence the weight keeps shrinking but typically never becomes exactly 0.
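
To make the difference concrete, here is a minimal sketch in Python that applies gradient steps on the penalty term alone, ignoring the data loss. The starting weight, learning rate, and function names are made up for illustration. One assumption to note: the L1 step is clipped at 0 so it cannot overshoot, which is how proximal (soft-thresholding) updates behave; a plain subgradient step would oscillate around 0 instead.

```python
def sign(w):
    """Subgradient of |w|: -1, 0, or +1 (0 is chosen at w == 0)."""
    return (w > 0) - (w < 0)


def l1_step(w, lr):
    """One step on the penalty |w|: a fixed-size move toward 0.

    Assumption: the step is clipped so it stops at exactly 0
    instead of crossing over to the other side.
    """
    if abs(w) <= lr:
        return 0.0
    return w - lr * sign(w)


def l2_step(w, lr):
    """One step on the penalty w**2, whose gradient is 2*w."""
    return w - lr * 2 * w


def run(step_fn, w0, lr, n_steps):
    """Apply step_fn repeatedly and record every intermediate weight."""
    history = [w0]
    for _ in range(n_steps):
        history.append(step_fn(history[-1], lr))
    return history


# Illustrative values only: start at w = 1.0 with learning rate 0.1.
l1_path = run(l1_step, w0=1.0, lr=0.1, n_steps=15)
l2_path = run(l2_step, w0=1.0, lr=0.1, n_steps=15)

print("L1 final weight:", l1_path[-1])
print("L2 final weight:", l2_path[-1])
```

In this sketch the L1 path drops by 0.1 per step until it hits 0 and stays there, while the L2 path is multiplied by 0.8 at every step, so it gets small but stays nonzero.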

So if we want to do Feature Selection or we need a sparse model, we use L1 or Lasso Regression. If we instead want to reduce the magnitude of the weights and spread them across the features without dropping any, we use L2 or Ridge Regression.
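
The effect on feature selection can be sketched with the closed-form solution of a one-weight problem. This assumes a simplified setting where each weight can be solved independently (for example, orthonormal features), with the loss 0.5 * (w - z)^2, where z is the unpenalized weight. The values of z and lam below are made up, and the function names are hypothetical.

```python
def lasso_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam*abs(w): soft-thresholding."""
    magnitude = max(abs(z) - lam, 0.0)
    if magnitude == 0.0:
        return 0.0  # small weights are set to exactly zero
    return magnitude if z > 0 else -magnitude


def ridge_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam*w**2: proportional shrinkage."""
    return z / (1.0 + 2.0 * lam)


# Made-up unpenalized weights and penalty strength, for illustration only.
unpenalized = [3.0, -0.4, 0.2, -2.0, 0.05]
lam = 0.5

lasso_w = [lasso_shrink(z, lam) for z in unpenalized]
ridge_w = [ridge_shrink(z, lam) for z in unpenalized]

print("Lasso:", lasso_w)
print("Ridge:", ridge_w)
```

Under these assumptions, L1 subtracts a fixed amount from every weight and zeroes out the ones smaller than lam, which is the sparse, feature-selecting behavior. L2 scales every weight by the same factor, so all of them shrink but none is removed.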


Related Notes