Training a Deep Neural Network
- Data: Get as much data as possible
- Hidden Units: Better to have too many hidden units than too few; with too few, the model is prone to Underfitting
- Weights: Follow Weight Initialization (see the initialization sketch after this list)
- Activation Function: ReLU is the rule of thumb for hidden units; for the output, use a Sigmoid Function or Softmax, depending on the output type (see the model sketch after this list)
- Learning Rate: Use a Learning Rate Scheduler, or at least a low learning rate; never a high one, because that will overshoot the minimum and prevent convergence (see the scheduler sketch below)
- Debug: Follow Debugging Deep Learning (a single-batch sanity check is sketched below)
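
A minimal sketch of the hidden-unit and activation-function bullets, assuming PyTorch and placeholder dimensions (784 inputs, 256 hidden units, 10 classes):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),   # hidden layer 1
    nn.ReLU(),             # ReLU is the rule of thumb for hidden units
    nn.Linear(256, 256),   # hidden layer 2
    nn.ReLU(),
    nn.Linear(256, 10),    # output layer: one logit per class
)

# For multi-class outputs, nn.CrossEntropyLoss applies Softmax internally,
# so the model emits raw logits. For a binary output, end with a single
# unit and use nn.BCEWithLogitsLoss, which applies the Sigmoid internally.
loss_fn = nn.CrossEntropyLoss()
```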
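For the weight-initialization bullet, one common scheme for ReLU networks is He (Kaiming) initialization; this sketch applies it to the `model` defined above:

```python
import torch.nn as nn

def init_weights(module: nn.Module) -> None:
    """He-initialize Linear layers (suited to ReLU); zero the biases."""
    if isinstance(module, nn.Linear):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        nn.init.zeros_(module.bias)

model.apply(init_weights)  # recursively applies init_weights to every submodule
```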
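A sketch of the learning-rate bullet, reusing `model` from above: start with a modest learning rate and decay it on a schedule. StepLR and the values 0.01, step_size=30, gamma=0.1 are illustrative assumptions, not a prescription:

```python
import torch.optim as optim

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    # ... one training epoch: forward pass, loss, backward pass ...
    optimizer.step()   # placeholder for the real training loop
    scheduler.step()   # multiply the learning rate by 0.1 every 30 epochs
```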
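One common sanity check from debugging practice (an assumption here, not a quote from the Debugging Deep Learning note): try to overfit a single small batch, reusing `model`, `loss_fn`, and `optimizer` from the sketches above. If the loss does not approach zero, the model, loss, or optimizer is miswired:

```python
import torch

xb = torch.randn(8, 784)          # one tiny batch (shapes assumed above)
yb = torch.randint(0, 10, (8,))   # random labels; 8 points are memorizable

for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    optimizer.step()

print(f"final single-batch loss: {loss.item():.4f}")  # should be near 0
```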