Activation Function

A neural layer with a weight W and a bias b can simply be defined as,

y=Wx+b
Why do we use an activation function?

Without an activation function, no matter how many layers we add, the whole network is still just a linear regression model and fails to learn complex patterns. In deep learning, non-linear activation functions are used because, without the non-linearity, all the stacked layers collapse into a single linear combination of the parameters.
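
A minimal NumPy sketch of this collapse (layer sizes and random weights here are only illustrative): two linear layers applied back to back give exactly the same output as one merged linear layer, so depth adds no expressive power without a non-linearity in between.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "linear layers" y = Wx + b with no activation in between.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Forward pass through both layers.
h = W1 @ x + b1
y = W2 @ h + b2

# The same mapping collapses into a single linear layer y = Wx + b,
# where W = W2 @ W1 and b = W2 @ b1 + b2.
W = W2 @ W1
b = W2 @ b1 + b2

assert np.allclose(y, W @ x + b)  # identical output: no extra expressive power
```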

Non-Linear Activation Functions

  1. Sigmoid Function
  2. Tanh
  3. ReLU
  4. Leaky ReLU
  5. Softmax
  6. Softplus
  7. Softsign
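
Plain NumPy sketches of the activations listed above (the function names and the leaky-ReLU slope alpha=0.01 are illustrative defaults, not tied to any particular framework):

```python
import numpy as np

# Element-wise activations, except softmax, which normalizes over the last axis.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def softmax(x, axis=-1):
    z = x - np.max(x, axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)

def softplus(x):
    # Numerically stable form of log(1 + e^x).
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def softsign(x):
    return x / (1.0 + np.abs(x))
```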

How to choose one over the others?

  1. Zero-centeredness - Fast convergence
    1. An activation that outputs both positive and negative values keeps the layer's outputs roughly zero-centered, which helps gradient descent converge faster.
  2. Computational cost - Simple gradient
    1. Depends on how complex the function and its gradient are to evaluate.
  3. Gradient Anomalies - Vanishing Gradient, Exploding Gradient
    1. Saturating activations such as sigmoid and tanh squash gradients toward zero for large inputs, which can cause vanishing gradients in deep stacks (see the sketch below).
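
A small sketch (with arbitrarily chosen input values) comparing how the sigmoid gradient saturates while the ReLU gradient does not, which is the root of the vanishing-gradient issue mentioned above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])

# Sigmoid's gradient sigma'(x) = sigma(x) * (1 - sigma(x)) peaks at 0.25
# and shrinks toward 0 for large |x|; multiplied across many layers,
# these small factors make the gradient vanish.
sig_grad = sigmoid(x) * (1.0 - sigmoid(x))

# ReLU's gradient is 1 for x > 0 and 0 for x < 0, so it does not saturate
# on the positive side (though "dead" units get exactly zero gradient).
relu_grad = (x > 0).astype(float)

print("x:            ", x)
print("sigmoid grad: ", np.round(sig_grad, 5))
print("relu grad:    ", relu_grad)
```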


References


Related Notes