Activation Function

A neural layer with a weight W and a bias b can simply be defined as,

y=Wx+b
Why do we use an activation function?

Without an activation function, no matter how many layers we add, the whole network is still just a linear regression model and fails to learn complex patterns. In deep learning, non-linear activation functions are used because, without the non-linearity, all the stacked layers collapse into a single linear combination of the parameters.
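
A minimal NumPy sketch of this collapse (layer sizes and random weights here are only illustrative): two linear layers applied back to back give exactly the same output as one merged linear layer, so depth adds no expressive power without a non-linearity in between.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "linear layers" y = Wx + b with no activation in between.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Forward pass through both layers.
h = W1 @ x + b1
y = W2 @ h + b2

# The same mapping collapses into a single linear layer y = Wx + b,
# where W = W2 @ W1 and b = W2 @ b1 + b2.
W = W2 @ W1
b = W2 @ b1 + b2

assert np.allclose(y, W @ x + b)  # identical output: no extra expressive power
```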

Non-Linear Activation Functions

  1. Sigmoid Function
  2. Tanh
  3. ReLU
  4. Leaky ReLU
  5. Softmax
  6. Softplus
  7. Softsign
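
Plain NumPy sketches of the activations listed above (the function names and the leaky-ReLU slope alpha=0.01 are illustrative defaults, not tied to any particular framework):

```python
import numpy as np

# Element-wise activations, except softmax, which normalizes over the last axis.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def softmax(x, axis=-1):
    z = x - np.max(x, axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def softplus(x):
    # Numerically stable form of log(1 + e^x).
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def softsign(x):
    return x / (1.0 + np.abs(x))
```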

How to choose one over the others?

  1. Zero-centeredness - Fast convergence
    1. An activation that outputs both positive and negative values keeps the layer's outputs roughly zero-centered, which helps gradient descent converge faster.
  2. Computational cost - Simple gradient
    1. Depends on how complex the function and its gradient are to evaluate.
  3. Gradient Anomalies - Vanishing Gradient, Exploding Gradient
    1. Saturating activations such as sigmoid and tanh squash gradients toward zero for large inputs, which can cause vanishing gradients in deep stacks (see the sketch below).
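
A small sketch (with arbitrarily chosen input values) comparing how the sigmoid gradient saturates while the ReLU gradient does not, which is the root of the vanishing-gradient issue mentioned above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])

# Sigmoid's gradient sigma'(x) = sigma(x) * (1 - sigma(x)) peaks at 0.25
# and shrinks toward 0 for large |x|; multiplied across many layers,
# these small factors make the gradient vanish.
sig_grad = sigmoid(x) * (1.0 - sigmoid(x))

# ReLU's gradient is 1 for x > 0 and 0 for x < 0, so it does not saturate
# on the positive side (though "dead" units get exactly zero gradient).
relu_grad = (x > 0).astype(float)

print("x:            ", x)
print("sigmoid grad: ", np.round(sig_grad, 5))
print("relu grad:    ", relu_grad)
```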


References


Related Notes