Gumbel Softmax
- Proposed in this paper
Gumbel Softmax is mostly used in models where we need to sample discrete variables and also back-propagate through the sampling step. In the usual decoding process we would use argmax, which is not differentiable, so we can't back-propagate through it.
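The idea of the Gumbel softmax is to add noise sampled from the Gumbel(0, 1) distribution to the logits, and then replace the argmax with a softmax that is controlled by a temperature parameter.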
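A minimal sketch of this idea in plain Python may help. It assumes the noise is standard Gumbel(0, 1), drawn as `-log(-log(u))` with `u` uniform on (0, 1), and that the perturbed logits are divided by a temperature before the softmax. The function names and the example logits are hypothetical, chosen only for illustration.

```python
import math
import random

def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample via inverse transform: g = -log(-log(u))."""
    u = rng.random()
    # rng.random() returns values in [0, 1); redraw on exactly 0 to avoid log(0).
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u))

def gumbel_softmax(logits, temperature, rng):
    """Return one relaxed sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = [(l + sample_gumbel(rng)) / temperature for l in logits]
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
# Hypothetical log-probabilities for three classes (assumption for this example).
logits = [math.log(0.1), math.log(0.6), math.log(0.3)]
sample = gumbel_softmax(logits, temperature=0.5, rng=rng)
print(sample)  # a probability vector of length 3 that sums to 1
```

Taking the argmax of such a sample picks each class with its original categorical probability, which is why the relaxation can stand in for discrete sampling.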
As we can see from the fig,
- For low temperature (close to 0), the Gumbel softmax distribution gives one-hot-like output
- On the other side, for high temperature, it approaches a uniform distribution
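The effect of the temperature can also be checked numerically. The sketch below is an illustration under assumptions, not taken from the paper: it uses three hypothetical class probabilities, and the temperatures 0.1 and 10.0 are arbitrary stand-ins for "low" and "high". It averages the largest component of many samples: a value near 1 means the samples look one-hot, and a value near 1/3 means they look uniform.

```python
import math
import random

def gumbel_softmax(logits, temperature, rng):
    """One relaxed sample: softmax((logits + Gumbel(0, 1) noise) / temperature)."""
    noisy = []
    for l in logits:
        u = max(rng.random(), 1e-12)       # clamp away from 0 to avoid log(0)
        g = -math.log(-math.log(u))        # Gumbel(0, 1) noise
        noisy.append((l + g) / temperature)
    m = max(noisy)                         # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]

def average_peak(logits, temperature, n_samples=2000, seed=0):
    """Average of the largest component over many samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += max(gumbel_softmax(logits, temperature, rng))
    return total / n_samples

# Hypothetical class probabilities, used only for this illustration.
logits = [math.log(0.1), math.log(0.6), math.log(0.3)]
peak_low = average_peak(logits, temperature=0.1)    # samples close to one-hot
peak_high = average_peak(logits, temperature=10.0)  # samples close to uniform
print("low temperature:", peak_low)
print("high temperature:", peak_high)
```

With three classes the largest component can never fall below 1/3, so the high-temperature average sits just above that floor while the low-temperature average sits close to 1.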
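Gumbel softmax is differentiable because, for any temperature > 0, it is a smooth function of the logits. Only in the limit of temperature 0 does it become the non-differentiable argmax.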
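One way to see the differentiability concretely is to hold the noise fixed and compare the closed-form derivative of the tempered softmax with a finite-difference estimate. This is a rough check under assumptions: the logits, the fixed noise values, and the temperature 0.7 are hypothetical, and the helper names are made up for this example. For output `y`, the derivative of `y[i]` with respect to logit `j` is `y[i] * ((i == j) - y[j]) / temperature`.

```python
import math

def relaxed_sample(logits, noise, temperature):
    """softmax((logits + noise) / temperature) with the noise held fixed."""
    scaled = [(l + g) / temperature for l, g in zip(logits, noise)]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def analytic_jacobian(y, temperature):
    """Closed-form d y[i] / d logits[j] for the tempered softmax."""
    n = len(y)
    return [[y[i] * ((1.0 if i == j else 0.0) - y[j]) / temperature
             for j in range(n)] for i in range(n)]

def numeric_jacobian(logits, noise, temperature, eps=1e-6):
    """Central finite-difference estimate of the same Jacobian."""
    n = len(logits)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up = list(logits)
        down = list(logits)
        up[j] += eps
        down[j] -= eps
        y_up = relaxed_sample(up, noise, temperature)
        y_down = relaxed_sample(down, noise, temperature)
        for i in range(n):
            jac[i][j] = (y_up[i] - y_down[i]) / (2 * eps)
    return jac

logits = [0.2, -1.0, 0.5]   # hypothetical logits
noise = [0.3, 1.1, -0.4]    # hypothetical fixed noise draws
temperature = 0.7

y = relaxed_sample(logits, noise, temperature)
exact = analytic_jacobian(y, temperature)
approx = numeric_jacobian(logits, noise, temperature)
max_error = max(abs(a - b)
                for row_a, row_b in zip(exact, approx)
                for a, b in zip(row_a, row_b))
print("largest gap between analytic and numeric gradient:", max_error)
```

The `1 / temperature` factor in the derivative also suggests why very low temperatures tend to produce large, noisy gradients in practice.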