- Gradient descent computes the derivative (slope) of the loss function at the current point
- The minimum is where this derivative equals 0, so the algorithm keeps stepping toward points where the slope approaches 0
- Then update the weight with the standard update rule: w ← w − η · ∂L/∂w, where η is the learning rate
- When the point is far from the minimum, the loss (and typically its gradient) is large, so the updates take bigger steps
- When the point is closer to the minimum, the loss and gradient shrink, so the steps get smaller
- Gradient descent can get stuck in a local minimum
- One workaround is to re-run it from a different random initialization (a minimal sketch of both ideas follows this list)
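Below is a minimal Python sketch of the update rule and the random-restart idea. The 1-D loss function, its derivative, the learning rate, and the number of restarts are all illustrative assumptions, not part of the original notes.

```python
import random

# Hypothetical 1-D loss with two local minima (chosen only for illustration):
# f(w) = w^4 - 3w^2 + w, with derivative f'(w) = 4w^3 - 6w + 1.
def loss(w):
    return w**4 - 3*w**2 + w

def grad(w):
    return 4*w**3 - 6*w + 1

def gradient_descent(w, lr=0.01, steps=500):
    """Plain gradient descent: repeat w <- w - lr * dL/dw (the update rule above)."""
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Random restarts: run gradient descent from several random initializations
# and keep the result with the lowest loss, reducing the chance of settling
# in a poor local minimum.
random.seed(0)
best_w = None
for _ in range(5):
    w0 = random.uniform(-2.0, 2.0)   # random initialization
    w = gradient_descent(w0)
    if best_w is None or loss(w) < loss(best_w):
        best_w = w

print(f"best w = {best_w:.3f}, loss = {loss(best_w):.3f}")
```

Depending on where each run starts, a single run may end up in either basin; keeping the best of several restarts usually recovers the lower (global) minimum of this toy loss.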
Pros:
- Simple to implement
- Can work well with a well-tuned learning rate
Cons:
- Can be very slow, especially for complex models or large datasets
- Requires a lot of memory, since each update uses the full dataset
- Computationally expensive per update for the same reason
- Sensitive to the choice of learning rate (see the sketch after this list)
- May get trapped in local minima, as noted above
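A small sketch of the learning-rate sensitivity, using an assumed convex toy loss f(w) = w², so f'(w) = 2w; the specific rates and step counts are illustrative, not prescriptive.

```python
def quadratic_grad(w):
    # Derivative of the toy loss f(w) = w^2.
    return 2*w

def run(lr, steps=20, w=5.0):
    # Plain gradient descent with a fixed learning rate.
    for _ in range(steps):
        w = w - lr * quadratic_grad(w)
    return w

# A too-small rate barely moves, a reasonable rate converges,
# and a too-large rate makes the iterates diverge.
for lr in (0.001, 0.1, 1.1):
    print(f"lr={lr}: w after 20 steps = {run(lr):.4g}")
```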