Training a Deep Neural Network
- Data: Get as much data as possible
- Hidden Units: Better to have too many hidden units than too few; with too few, the model is prone to Underfitting
- Weights: Follow Weight Initialization (see the initialization sketch after this list)
- Activation Function: ReLU is the rule of thumb for hidden units; for the output, use a Sigmoid Function or Softmax, depending on the output type (see the model sketch after this list)
- Learning Rate: Use a Learning Rate Scheduler, or at least a low learning rate; never a high one, because that will overshoot the minimum and prevent convergence (see the scheduler sketch below)
- Debug: Follow Debugging Deep Learning (a single-batch sanity check is sketched below)
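
A minimal sketch of the hidden-unit and activation-function bullets, assuming PyTorch and placeholder dimensions (784 inputs, 256 hidden units, 10 classes):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),   # hidden layer 1
    nn.ReLU(),             # ReLU is the rule of thumb for hidden units
    nn.Linear(256, 256),   # hidden layer 2
    nn.ReLU(),
    nn.Linear(256, 10),    # output layer: one logit per class
)

# For multi-class outputs, nn.CrossEntropyLoss applies Softmax internally,
# so the model emits raw logits. For a binary output, end with a single
# unit and use nn.BCEWithLogitsLoss, which applies the Sigmoid internally.
loss_fn = nn.CrossEntropyLoss()
```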
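For the weight-initialization bullet, one common scheme for ReLU networks is He (Kaiming) initialization; this sketch applies it to the `model` defined above:

```python
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    """He-initialize Linear layers (suited to ReLU); zero the biases."""
    if isinstance(module, nn.Linear):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        nn.init.zeros_(module.bias)

model.apply(init_weights)  # recursively applies init_weights to every submodule
```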
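A sketch of the learning-rate bullet, reusing `model` from above: start with a modest learning rate and decay it on a schedule. StepLR and the values 0.01, step_size=30, gamma=0.1 are illustrative assumptions, not a prescription:

```python
import torch.optim as optim

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... one training epoch: forward pass, loss, backward pass ...
    optimizer.step()   # placeholder for the real training loop
    scheduler.step()   # multiply the learning rate by 0.1 every 30 epochs
```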
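One common sanity check from debugging practice (an assumption here, not a quote from the Debugging Deep Learning note): try to overfit a single small batch, reusing `model`, `loss_fn`, and `optimizer` from the sketches above. If the loss does not approach zero, the model, loss, or optimizer is miswired:

```python
import torch

xb = torch.randn(8, 784)          # one tiny batch (shapes assumed above)
yb = torch.randint(0, 10, (8,))   # random labels; 8 points are memorizable

for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    optimizer.step()

print(f"final single-batch loss: {loss.item():.4f}")  # should be near 0
```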